ISO/IEC JTC1 SC22 WG21, Core Working Group
DxxxxR0
Robert Haberlach (rh633{at}cam{dot}ac.uk)
2017-04-21

Assignment categories of glvalues

This paper proposes to introduce, amongst other things, the notion of an empty lvalue as discussed in core issue 232 [CWG232]. The wording is partly adapted from the proposed wording in that core issue and in core issue 453 [CWG453].

This proposal resolves both of these core issues as well as core issue 504 [CWG504].

Motivation and proposed resolutions

The standard is unclear about its treatment of indirection through null pointers and pointers past the end of an object: typeid is specified to handle lvalues created by dereferencing a null pointer, yet the common idiom &arr[len] is currently undefined. The standard is also unclear about the distinction between lvalues that refer to objects and lvalues that refer merely to storage. This proposal introduces rigorous classifications of references and expressions and uses them to give these expressions well-defined semantics.
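
To make the &arr[len] idiom concrete, here is a minimal sketch. The names sum_range and sum_demo are assumptions made for this illustration and do not come from the proposal. The end pointer is formed with pointer arithmetic, which is well-defined today, and the debated spelling appears only in a comment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration: sums the half-open range [first, last).
int sum_range(const int* first, const int* last) {
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;  // *p designates an int object inside the range
    }
    return total;
}

int sum_demo() {
    const std::size_t len = 4;
    int arr[len] = {1, 2, 3, 4};

    // Well-defined today: pointer arithmetic yields the past-the-end pointer
    // without ever forming the lvalue arr[len].
    const int* end = arr + len;

    // The common spelling `&arr[len]` means `&*(arr + len)`: it first forms an
    // lvalue past the end of the array and only then takes its address.
    // It is kept in a comment because its status is what the paper discusses.
    return sum_range(arr, end);
}
```

Both spellings are meant to denote the same pointer; the difference lies only in whether an lvalue past the end of the array is formed along the way.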

As of the rework of value categories, non-void expressions fall into two classes: glvalues and prvalues. Prvalues are said to have a value, and glvalues' “evaluation determines the identity of an object, bitfield or function”. However, while prvalues indeed always designate values, glvalues do not necessarily evaluate to any member of that list: they can be invalid, or refer to memory locations that do not hold objects of the specified type. Some wording makes silent assumptions about what glvalues refer to, which can make reasoning about well-formedness difficult; other wording is outright inconsistent.

Assignment categories of expressions

…solve this issue by classifying the different states of a glvalue. More specifically, a glvalue can be

This covers all different possible states of a glvalue. Certain uses of unassigned lvalues are now defined, in particular Lvalue Transformations and the unary & operator.

References

Finally, in conjunction with the rework of reference initialization and state, we propose that a reference shall not be initialized with its own name. This makes such code ill-formed with a diagnostic required; note that all common implementations already diagnose this with a warning. For further discussion on this topic, see also SG12's mailing list [UB504].

Proposed wording

This wording is relative to N4640.

3.7 [basic.stc] ¶4:

When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of the deallocated storage become invalid pointer values ([basic.compound]), references designating any part of the deallocated storage become invalid references (8.3.2 [dcl.ref]) and all glvalues designating part of the deallocated storage become invalid glvalues (3.10 [basic.lval]). Indirection through an invalid pointer value […]

3.8 [basic.life] ¶7:

Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object is an assigned storage glvalue and may be used but only in limited ways. For an object under construction or destruction, see 12.7. Using properties of an assigned storage glvalue that do not depend on its value is well-defined.
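
The following sketch illustrates the situation this paragraph describes. Widget and storage_then_object are hypothetical names chosen for the example. Storage is obtained first, an object is created in it later, and in between only the address of the glvalue is used, which does not depend on any value stored there.

```cpp
#include <cassert>
#include <new>

// Hypothetical type for illustration; the user-provided destructor makes the
// explicit destructor call below end the object's lifetime.
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
    ~Widget() {}
};

// Returns true if every step behaved as the comments describe.
bool storage_then_object() {
    void* raw = ::operator new(sizeof(Widget));  // storage allocated, no Widget yet
    Widget* p = static_cast<Widget*>(raw);

    // No Widget lives here yet: in the paper's terms, *p would be an assigned
    // storage glvalue. Taking its address does not depend on its value.
    Widget* before = &*p;

    Widget* obj = new (raw) Widget(42);          // lifetime of a Widget starts
    bool ok = (before == p) && (obj->value == 42);

    obj->~Widget();                              // lifetime ends, storage remains
    Widget* after = &*p;                         // again only the address is used
    ok = ok && (after == p);

    ::operator delete(raw);                      // storage released
    return ok;
}
```

Reading p->value before the placement new or after the destructor call would be one of the uses that depend on the value and is therefore deliberately absent.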

3.8 [basic.life] ¶8:

If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, an assigned storage reference that referred to the original object, or the name of the original object will automatically refer to the new object (that is, become a pointer to that object, an assigned object reference and an assigned object lvalue, respectively) and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
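
A minimal sketch of this replacement rule follows. Counter and replace_in_place are hypothetical names, and Counter is chosen on the assumption that it satisfies the conditions in the standard's list, which is not reproduced in this excerpt: the new object has the same type, and the type has no const or reference members.

```cpp
#include <cassert>
#include <new>

// Hypothetical type for illustration: no const or reference members.
struct Counter {
    int n;
    explicit Counter(int v) : n(v) {}
    ~Counter() {}
};

int replace_in_place() {
    Counter c(1);
    Counter& ref = c;   // refers to the original object
    Counter* ptr = &c;  // points to the original object

    c.~Counter();         // lifetime of the original ends, storage is kept
    new (&c) Counter(2);  // a new object of the same type at the same location

    // The name, the reference and the pointer all designate the new object.
    return c.n + ref.n + ptr->n;
}
```

In the terminology of this paper, ref would be an assigned storage reference between the destructor call and the placement new, and an assigned object reference again afterwards.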

Append after 3.10 [basic.lval] ¶1 a paragraph ¶2:

[ Note: Historically, lvalues and rvalues were so-called because they could appear on the left- and right-hand side of an assignment (although this is no longer generally true); glvalues are “generalized” lvalues, prvalues are “pure” rvalues, and xvalues are “eXpiring” lvalues. Despite their names, these terms classify expressions, not values; the latter is covered by the expression's assignment category. — end note ] An expression is assigned if it designates […] A prvalue is never assigned. An invalid expression is one that referred to an object or region of storage that has been deallocated (3.7 [basic.stc]). Using a glvalue that is not assigned in a context where the language expects an assigned glvalue has undefined behavior.

Every expression belongs to exactly one of the fundamental classifications in this taxonomy: lvalue, xvalue, or prvalue. This property of an expression is called its value category. Additionally, every expression belongs to exactly one of the classifications listed above: null expression (lvalues only), expression past the end of an object (lvalues only), assigned function expression (lvalues only), assigned object expression, assigned storage expression (glvalues only), or invalid expression (glvalues only). This runtime property of an expression is called its assignment category. [Note: … — end note ]
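
The contrast between the two properties can be sketched as follows. The value category is fixed at compile time and can be observed with decltype, while the assignment category depends on what a pointer holds at run time. The enum AssignmentCategory, the array g_data and the helper category_of_deref are assumptions made for this illustration; the helper covers only three of the categories and only pointers that are null or point into g_data.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

int g_data[4] = {1, 2, 3, 4};  // hypothetical array used by the sketch

// Value category: a static property, visible through decltype.
static_assert(std::is_same<decltype((g_data[0])), int&>::value, "lvalue");
static_assert(std::is_same<decltype(std::move(g_data[0])), int&&>::value, "xvalue");
static_assert(std::is_same<decltype(g_data[0] + 0), int>::value, "prvalue");

// Assignment category: a run-time property (subset modelled for illustration).
enum class AssignmentCategory { Null, PastTheEnd, AssignedObject };

// Hypothetical helper: the category that the lvalue *p would have.
// Assumes p is null, points at an element of g_data, or is g_data + 4.
AssignmentCategory category_of_deref(const int* p) {
    if (p == nullptr) return AssignmentCategory::Null;
    if (p == g_data + 4) return AssignmentCategory::PastTheEnd;
    return AssignmentCategory::AssignedObject;
}
```

The helper inspects the pointer rather than evaluating *p, so it never forms the lvalues whose treatment is under discussion.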

4.1 [conv.lval]:

A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the glvalue is not an assigned object glvalue, the behavior is undefined. […]

4.2 [conv.array]:

An lvalue or rvalue of type “array of N T” or “array of unknown bound of T” can be converted to a prvalue of type “pointer to T”. If the expression refers to an array, the result is a pointer to the first element of that array. If the expression is a null lvalue, the result is the null pointer value. If the expression is an lvalue past the end of an object, the result is a pointer past the end of the object. Otherwise, the behavior is undefined.

4.3 [conv.func]:

An lvalue of function type T can be converted to a prvalue of type “pointer to T”. If the lvalue is an assigned function lvalue, the result is a pointer to the function. If the lvalue is a null lvalue, the result is the null pointer value. Otherwise, the behavior is undefined.
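
For orientation, the ordinary cases of the two conversions above look as follows. This is a sketch with hypothetical names (twice, conversions_demo); the null lvalue and past-the-end cases are not exercised because they cannot be demonstrated portably today.

```cpp
#include <cassert>

int twice(int x) { return 2 * x; }  // hypothetical function for the example

int conversions_demo() {
    int arr[3] = {7, 8, 9};

    int* first = arr;        // array-to-pointer: pointer to the first element
    int (*fn)(int) = twice;  // function-to-pointer: pointer to the function

    // The implicit conversions agree with the explicit address-of forms.
    bool same = (first == &arr[0]) && (fn == &twice);
    return same ? fn(*first) : -1;
}
```

Here arr is an lvalue that refers to an array and twice is an assigned function lvalue, so both conversions fall under the first case of their respective paragraphs.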

5.2.5 [expr.ref] ¶4:

If E2 is declared to have type “reference to T”, then E1.E2 is an lvalue; the type of E1.E2 is T. Otherwise, one of the following rules applies.

5.2.8 [expr.typeid] ¶2:

When typeid is applied to an assigned object glvalue expression whose type is a polymorphic class type (10.3), the result refers to a std::type_info object representing the type of the most derived object (1.8) (that is, the dynamic type) to which the glvalue refers. If the glvalue expression is a null lvalue (5.3.1 [expr.unary.op]), the typeid expression throws an exception (15.1 [except.throw]) of a type that would match a handler of type std::bad_typeid exception (18.7.4 [bad.typeid]).
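
The observable behavior of typeid in these two cases can be sketched as below. Base, Derived, describe and typeid_demo are hypothetical names introduced for the example. Applying typeid to *p with a null p of polymorphic class type already throws std::bad_typeid today.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

struct Base { virtual ~Base() {} };  // polymorphic, hypothetical
struct Derived : Base {};

// Reports the dynamic type seen by typeid(*p), or "bad_typeid" if it throws.
std::string describe(const Base* p) {
    try {
        const std::type_info& ti = typeid(*p);
        if (ti == typeid(Derived)) return "Derived";
        if (ti == typeid(Base)) return "Base";
        return "other";
    } catch (const std::bad_typeid&) {
        return "bad_typeid";  // reached when p is a null pointer
    }
}

bool typeid_demo() {
    Base b;
    Derived d;
    return describe(&b) == "Base" && describe(&d) == "Derived";
}
```

Calling describe(nullptr) yields "bad_typeid", the case in which the operand is what this paper calls a null lvalue.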

5.3.1 [expr.unary.op] ¶1:

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object or function type T. The result is an lvalue of type T referring to the object or function to which the expression points. Otherwise, the behavior is undefined. [Note: … — end note]

5.3.1 [expr.unary.op] ¶3:

The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id. If the operand is a qualified-id naming a non-static or variant member m of some class C with type T, the result has type “pointer to member of class C of type T” and is a prvalue designating C::m. Otherwise, if the type of the expression is T, the result has type “pointer to T” and is a prvalue that is the address of the designated object (1.7) or a pointer to the designated function pointer. The assignment category is determined as follows:
  • if the lvalue is an assigned object lvalue or assigned function lvalue, the result points to the designated object or function, respectively; otherwise,
  • if the lvalue is an assigned storage lvalue, the result represents the address of the designated region of storage; otherwise,
  • if the lvalue is a null lvalue, the result is the null pointer value; otherwise,
  • if the lvalue is past the end of an object, the result is a pointer past the end of that object.
Otherwise, the behavior is undefined.
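
The case analysis in the list above can be restated as a small lookup. This is only a model of the wording, not an implementation of the operator: the enums LvalueKind and AddressOfResult and the function address_of are names invented for this sketch.

```cpp
#include <cassert>

// Hypothetical model of the assignment category of the operand of unary &.
enum class LvalueKind {
    AssignedObject, AssignedFunction, AssignedStorage, Null, PastTheEnd, Invalid
};

// Hypothetical model of what the resulting pointer represents.
enum class AddressOfResult {
    PointerToEntity, AddressOfStorage, NullPointerValue, PointerPastTheEnd, Undefined
};

// Mirrors the bullets in order; anything not listed falls through to Undefined.
AddressOfResult address_of(LvalueKind operand) {
    switch (operand) {
        case LvalueKind::AssignedObject:
        case LvalueKind::AssignedFunction:
            return AddressOfResult::PointerToEntity;
        case LvalueKind::AssignedStorage:
            return AddressOfResult::AddressOfStorage;
        case LvalueKind::Null:
            return AddressOfResult::NullPointerValue;
        case LvalueKind::PastTheEnd:
            return AddressOfResult::PointerPastTheEnd;
        case LvalueKind::Invalid:
            break;
    }
    return AddressOfResult::Undefined;
}
```

Only an invalid lvalue reaches the final return, which corresponds to the closing "Otherwise, the behavior is undefined".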

5.18 [expr.ass] ¶8:

The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand; the behavior is undefined if the lvalue is not an assigned object lvalue, or is an assigned storage lvalue member access expression (5.2.5 [expr.ref]) designating a variant member (see below). The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.

8.3.2 [dcl.ref] ¶5:

[…] A reference shall be initialized to refer to a valid object or function. A reference's name shall not be potentially evaluated (3.2 [odr.def]) in its initializer. If the expression to which a reference is directly bound is not an assigned expression designating either a function of an appropriate type (8.5.3 [dcl.init.ref]), or a region of storage of suitable size and alignment to contain an object of the referenced type (3.8 [basic.life]), or an object residing in such storage, the behavior is undefined. If a reference is initialized to refer to the virtual base of its initializer, and its initializer is not an assigned object expression, the behavior is undefined. [ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to a null lvalue obtained by indirection through a null pointer, which causes undefined behavior. — end note ] [ Note: as described in 9.6 [class.bit], a reference cannot be bound directly to a bit-field. — end note ] [Example:
int& i = true? 0 : i;      // Error: i is potentially evaluated

int& j = *(int*)nullptr; // Undefined behavior: Binding a reference to a null lvalue
— end example] Any use of a reference before it is initialized results in undefined behavior. [Example:
extern int& ir2;
int ir1 = ir2; // Undefined behavior: ir2 not yet initialized
— end example]
 

Create 8.3.2 [dcl.ref] ¶8:

The assignment category (3.10 [basic.lval]) of a reference's name is called the reference's assignment category: assigned function reference, assigned object reference, assigned storage reference and invalid reference. As described above, if a reference is initialized to be an invalid reference, the behavior is undefined. After initialization, the reference has the same type of assignment category as the expression it was directly bound to. [Note: a reference's assignment category changes in the following situations: end note] [ Example:
int const& f() {
  int const& i = 0; // i is an assigned object reference
  return i;
}

auto&& r = f(); // Undefined behavior: the reference can never be used as a non-invalid reference
— end example]

References

[CWG232] “Is indirection through a null pointer undefined behavior?”: wg21.link/cwg232

[CWG453] “References may only bind to “valid” objects”: wg21.link/cwg453

[CWG504] “Should use of a variable in its own initializer require a diagnostic?”: wg21.link/cwg504

[P0137R1] (Richard Smith) “Core Issue 1776: Replacement of class objects containing reference members”: wg21.link/P0137R1

[UB504] “[ub] Proposal: make self-initialized references ill-formed (C++17?)”: open-std.org/pipermail/ub/2014-September/000506.html