ISO/IEC JTC1 SC22 WG21, Core Working Group
DxxxxR0
Robert Haberlach (rh633{at}cam{dot}ac.uk)
2017-04-21
The standard is unclear about its treatment of indirection through null pointers and pointers past the end of an object; typeid is specified to handle lvalues created by dereferencing a null pointer, whereas the common idiom &arr[len] is currently undefined. The standard is also unclear about the distinction between lvalues that refer to objects and lvalues that refer merely to storage. This proposal introduces rigorous classifications of references and expressions and employs them to define these expressions precisely.
As of the rework on value categories, non-void expressions are put into two classes: glvalues and prvalues. Prvalues are said to have a value, and glvalues' “evaluation determines the identity of an object, bit-field, or function”. However, while prvalues indeed always designate values, glvalues do not necessarily evaluate to any member of that list. They can be invalid, or refer to memory locations that don't hold objects of the specified type. Some wording makes silent assumptions about what glvalues refer to, which can make reasoning about well-formedness difficult; other wording is outright inconsistent.
This covers all different possible states of a glvalue. Certain uses of unassigned lvalues are now defined, in particular Lvalue Transformations and the unary & operator.
Finally, in conjunction with the rework on reference initialization and state, we propose that a reference shall not be initialized with its own name. This makes such code ill-formed with a diagnostic required; note that all common implementations already diagnose this with a warning. For further discussion on this topic, see also SG12's mailing list [UB504].
3.7 [basic.stc] ¶4:
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of the deallocated storage become invalid pointer values ([basic.compound]), references designating any part of the deallocated storage become invalid references (8.3.2 [dcl.ref]) and all glvalues designating part of the deallocated storage become invalid glvalues (3.10 [basic.lval]). Indirection through an invalid pointer value […]
3.8 [basic.life] ¶7:
Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object is an assigned storage glvalue and may be used but only in limited ways. For an object under construction or destruction, see 12.7. Using properties of an assigned storage glvalue that do not depend on its value is well-defined.
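The window described in this paragraph can be illustrated with a short sketch. The type `Counter` and the function `reuse_storage` are hypothetical names chosen for this example; the comments use the proposal's terminology, which is not part of any published standard. Between the explicit destructor call and the placement new, the name `c` designates only allocated storage, and the sketch uses it solely to take its address, a property that does not depend on the value.

```cpp
#include <new>

// Hypothetical aggregate used only for illustration.
struct Counter { int value; };

inline int reuse_storage() {
    Counter c{1};
    Counter& r = c;           // in the proposal's terms: an assigned object reference
    c.~Counter();             // lifetime ends; the storage remains allocated
    void* where = &c;         // address of the storage: does not depend on the value
    ::new (where) Counter{42};  // a new object of the same type in the same storage
    return r.value;           // r refers to the new object again
}
```

The sketch relies on the replacement conditions of 3.8 [basic.life] ¶8 being met: same type, same storage, no const or reference members, and a most derived object.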
3.8 [basic.life] ¶8:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, an assigned storage reference that referred to the original object, or the name of the original object will automatically refer to the new object (that is, become a pointer to that object, an assigned object reference and an assigned object lvalue, respectively) and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
Append after 3.10 [basic.lval] ¶1 a paragraph ¶2:
- A glvalue …
- A prvalue …
- An xvalue is a glvalue that generally denotes an object or bit-field whose resources can be reused (usually because it is near the end of its lifetime). [ Example: Certain kinds of expressions involving rvalue references (8.3.2) yield xvalues, such as a call to a function whose return type is an rvalue reference or a cast to an rvalue reference type. — end example ] [ Note: an xvalue cannot be null or past the end of an object, because the only way to obtain such an xvalue would be to bind an unassigned lvalue to a reference via a cast, which has undefined behavior (8.3.2 [dcl.ref]). — end note ]
- An lvalue is a glvalue that is not an xvalue. It is one of the following:
  - an assigned lvalue (see below),
  - a null lvalue (5.3.1 [expr.unary.op]),
  - an lvalue past the end of an object (5.3.1 [expr.unary.op]), or
  - an invalid lvalue (see below).
- An rvalue …
[ Note: Historically, lvalues and rvalues were so-called because they could appear on the left- and right-hand side of an assignment (although this is no longer generally true); glvalues are “generalized” lvalues, prvalues are “pure” rvalues, and xvalues are “eXpiring” lvalues. Despite their names, these terms classify expressions, not values; the latter is covered by the expression's assignment category. — end note ]
An expression is assigned if it designates
- a function, in which case it is an assigned function expression, or
- an object during construction or destruction (12.7 [class.cdtor]) or an object or bit-field within its lifetime (3.8 [basic.life]), in which case it is an assigned object expression, or
- an allocated region of storage (possibly occupied by an object), in which case it is an assigned storage expression. [ Note: there can't be any glvalues referring to the memory occupied by a destroyed bit-field, because binding a reference to a bit-field is ill-formed, and accessing a member of an object that isn't alive is undefined. That is, any glvalue referring to a bit-field is an assigned object glvalue. — end note ]
A prvalue is never assigned. An invalid expression is one that referred to an object or region of storage that has been deallocated (3.7 [basic.stc]). Using a glvalue that is not assigned in a context where the language expects an assigned glvalue has undefined behavior.
Every expression belongs to exactly one of the fundamental classifications in this taxonomy: lvalue, xvalue, or prvalue. This property of an expression is called its value category. Additionally, every expression belongs to exactly one of the classifications listed above: null expression (lvalues only), expression past the end of an object (lvalues only), assigned function expression (lvalues only), assigned object expression, assigned storage expression (glvalues only), or invalid expression (glvalues only). This runtime property of an expression is called its assignment category. [ Note: … — end note ]
4.1 [conv.lval]:
A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the glvalue is not an assigned object glvalue, the behavior is undefined. […]
4.2 [conv.array]:
An lvalue or rvalue of type “array of N T” or “array of unknown bound of T” can be converted to a prvalue of type “pointer to T”. If the expression refers to an array, the result is a pointer to the first element of that array. If the expression is a null lvalue, the result is the null pointer value. If the expression is an lvalue past the end of an object, the result is a pointer past the end of the object. Otherwise, the behavior is undefined.
4.3 [conv.func]:
An lvalue of function type T can be converted to a prvalue of type “pointer to T”. If the lvalue is an assigned function lvalue, the result is a pointer to the function. If the lvalue is a null lvalue, the result is the null pointer value. Otherwise, the behavior is undefined.
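As a brief illustration of the assigned function lvalue case, the following sketch round-trips a function through indirection and the function-to-pointer conversion. The names `forty_two`, `Fn` and `roundtrip_through_indirection` are hypothetical; only the non-null case is exercised, since the null case is what this proposal sets out to define.

```cpp
// Hypothetical function used only for illustration.
inline int forty_two() { return 42; }

using Fn = int();

inline bool roundtrip_through_indirection() {
    Fn* fp = forty_two;   // function-to-pointer conversion (4.3 [conv.func])
    Fn& fr = *fp;         // indirection yields an lvalue designating the function
    // &fr recovers the original pointer; repeated indirection keeps converting
    // the function lvalue back to a pointer before each further '*'.
    return &fr == fp && (*fp)() == 42 && (***fp)() == 42;
}
```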
5.2.5 [expr.ref] ¶4:
If E2 is declared to have type “reference to T”, then E1.E2 is an lvalue; the type of E1.E2 is T. Otherwise, one of the following rules applies.
- If E2 is a static data member […]
- If E2 is a non-static data member, the behavior is undefined if E1 is not an assigned object expression. If the type of E1 is “cq1 vq1 X”, and the type of E2 is “cq2 vq2 T”, […]
- If E2 is a (possibly overloaded) member function, the behavior is undefined if E1 is not an assigned object expression. Function overload resolution (13.3) is used […]
5.2.8 [expr.typeid] ¶2:
When typeid is applied to an assigned object glvalue expression whose type is a polymorphic class type (10.3), the result refers to a std::type_info object representing the type of the most derived object (1.8) (that is, the dynamic type) to which the glvalue refers. If the glvalue expression is a null lvalue (5.3.1 [expr.unary.op]), the typeid expression throws an exception (15.1 [except.throw]) of a type that would match a handler of type std::bad_typeid exception (18.7.4 [bad.typeid]).
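The behavior that this paragraph restates in terms of null lvalues can be observed with a minimal sketch. The class `Widget` and the helper `typeid_throws_on` are hypothetical names; the sketch assumes RTTI is enabled, which is the default for common implementations.

```cpp
#include <typeinfo>

// Hypothetical polymorphic class used only for illustration.
struct Widget { virtual ~Widget() = default; };

// Returns true if applying typeid to *p throws std::bad_typeid.
inline bool typeid_throws_on(const Widget* p) {
    try {
        (void)typeid(*p).name();  // *p is evaluated because Widget is polymorphic
        return false;
    } catch (const std::bad_typeid&) {
        return true;              // taken when p is the null pointer value
    }
}
```

Passing a pointer to a live `Widget` returns false, while passing a null pointer returns true.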
5.3.1 [expr.unary.op] ¶1:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object or function type T. The result is an lvalue of type T:
- if the pointer is the null pointer value, the result is a null lvalue; otherwise,
- if the pointer is past the end of an object, the result is an lvalue past the end of that object; otherwise,
- if the pointer points to an object or function, the result is an assigned object lvalue or assigned function lvalue, respectively, referring to that entity; otherwise,
- if the pointer represents the address of allocated storage, the result is an assigned storage lvalue designating that region of storage.
Otherwise, the behavior is undefined. [ Note: … — end note ]
5.3.1 [expr.unary.op] ¶3:
The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id. If the operand is a qualified-id naming a non-static or variant member m of some class C with type T, the result has type “pointer to member of class C of type T” and is a prvalue designating C::m. Otherwise, if the type of the expression is T, the result has type “pointer to T” and is a prvalue pointer. The assignment category is determined as follows:
- if the lvalue is an assigned object lvalue or assigned function lvalue, the result points to the designated object or function, respectively; otherwise,
- if the lvalue is an assigned storage lvalue, the result represents the address of the designated region of storage; otherwise,
- if the lvalue is a null lvalue, the result is the null pointer value; otherwise,
- if the lvalue is past the end of an object, the result is a pointer past the end of that object.
Otherwise, the behavior is undefined.
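Under the rules above, &arr[len] would yield a pointer past the end of the array, because *(arr + len) would be an lvalue past the end of an object and & would map it back to the corresponding pointer. The sketch below forms that pointer with plain pointer arithmetic, which is well-defined today, and shows the idiom only in a comment. The function name `sum_half_open` is hypothetical.

```cpp
#include <cstddef>
#include <numeric>

// Sums the half-open range [arr, arr + len).
inline int sum_half_open(const int* arr, std::size_t len) {
    const int* last = arr + len;     // past-the-end pointer via pointer arithmetic
    // const int* last = &arr[len];  // the idiom: equivalent under the proposed wording
    return std::accumulate(arr, last, 0);
}
```

The past-the-end pointer is only compared against, never used to access an element, which matches the limited uses the proposal permits for an lvalue past the end of an object.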
5.18 [expr.ass] ¶8:
The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand; the behavior is undefined if the lvalue is not an assigned object lvalue, or is an assigned storage lvalue member access expression (5.2.5 [expr.ref]) designating a variant member (see below). The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
8.3.2 [dcl.ref] ¶5:
[…] A reference's name shall not be potentially evaluated (3.2 [odr.def]) in its initializer. If the expression to which a reference is directly bound is not an assigned expression designating either a function of an appropriate type (8.5.3 [dcl.init.ref]), or a region of storage of suitable size and alignment to contain an object of the referenced type (3.8 [basic.life]), or an object residing in such storage, the behavior is undefined. If a reference is initialized to refer to the virtual base of its initializer, and its initializer is not an assigned object expression, the behavior is undefined. [ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to a null lvalue, which causes undefined behavior. — end note ] [ Note: as described in 9.6 [class.bit], a reference cannot be bound directly to a bit-field. — end note ] [ Example:
int& i = true ? 0 : i; // Error: i is potentially evaluated
— end example ] Any use of a reference before it is initialized results in undefined behavior. [ Example:
int& j = *(int*)nullptr; // Undefined behavior: binding a reference to a null lvalue
extern int& ir2;
int ir1 = ir2; // Undefined behavior: ir2 not yet initialized
— end example ]
Create 8.3.2 [dcl.ref] ¶8:
The assignment category (3.10 [basic.lval]) of a reference's name is called the reference's assignment category: assigned function reference, assigned object reference, assigned storage reference and invalid reference. As described above, if a reference is initialized to be an invalid reference, the behavior is undefined. After initialization, the reference has the same type of assignment category as the expression it was directly bound to. [ Note: a reference's assignment category changes in the following situations:
- The reference is an assigned object reference and the designated object is destroyed. The reference becomes an assigned storage reference, provided the storage occupied by the original object has not simultaneously been deallocated.
- The reference is an assigned object or assigned storage reference, and the storage occupied by the object or the designated storage, respectively, is deallocated. The reference becomes an invalid reference.
- The reference is an assigned storage reference and a new object of the same cv-unqualified type is created in the same storage; under certain circumstances, the reference becomes an assigned object reference.
— end note ] [ Example:
int const& f() {
    int const& i = 0; // i is an assigned object reference
    return i;
}
auto&& r = f(); // Undefined behavior: the reference can never be used as a non-invalid reference
— end example ]
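The transitions listed in the note can be pictured as a small state machine. The following is a toy model only, not part of the proposed wording: the names `RefCategory`, `Event` and `next` are hypothetical, and the model assumes that the "certain circumstances" required for a reference to rebind to a newly created object are always met.

```cpp
// Hypothetical model of a reference's assignment category.
enum class RefCategory { AssignedFunction, AssignedObject, AssignedStorage, Invalid };

// Hypothetical events that can change the category after initialization.
enum class Event { ObjectDestroyed, StorageDeallocated, ObjectRecreated };

inline RefCategory next(RefCategory c, Event e) {
    switch (e) {
    case Event::ObjectDestroyed:
        // Object destroyed while its storage stays allocated.
        return c == RefCategory::AssignedObject ? RefCategory::AssignedStorage : c;
    case Event::StorageDeallocated:
        // Storage released: object and storage references become invalid.
        return (c == RefCategory::AssignedObject || c == RefCategory::AssignedStorage)
                   ? RefCategory::Invalid
                   : c;
    case Event::ObjectRecreated:
        // Assumption: the conditions of 3.8 [basic.life] ¶8 hold.
        return c == RefCategory::AssignedStorage ? RefCategory::AssignedObject : c;
    }
    return c;
}
```

In this model an invalid reference never leaves the invalid state, which mirrors the example above: a reference bound to a returned local can never be used as a non-invalid reference.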
[CWG232] “Is indirection through a null pointer undefined behavior?”: wg21.link/cwg232
[CWG453] “References may only bind to ‘valid’ objects”: wg21.link/cwg453
[CWG504] “Should use of a variable in its own initializer require a diagnostic?”: wg21.link/cwg504
[P0137R1] (Richard Smith) “Core Issue 1776: Replacement of class objects containing reference members”: wg21.link/P0137R1
[UB504] “[ub] Proposal: make self-initialized references ill-formed (C++17?)”: open-std.org/pipermail/ub/2014-September/000506.html