19.6. union: A Space-Saving Class
A union is a special kind of class. A union may have multiple data members, but at any point in time, only one of the members may have a value. When a value is assigned to one member of the union, all other members become undefined. The amount of storage allocated for a union is at least as much as is needed to contain its largest data member. Like any class, a union defines a new type.
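As a rough illustration of the storage rule, the sketch below compares the size of a union with the size of its largest member. The names Sample and largest_member_size are hypothetical and exist only for this illustration; the actual sizes are implementation-defined, so the check uses "at least" rather than an exact value.

```cpp
#include <cstddef>

// Sample is a hypothetical union used only for this illustration
union Sample {
    char   cval;
    int    ival;
    double dval;
};

// size of the largest member of Sample (char always has size 1, so it cannot win)
constexpr std::size_t largest_member_size()
{
    return sizeof(double) > sizeof(int) ? sizeof(double) : sizeof(int);
}

// the union may be larger than its largest member because of alignment padding,
// but it can never be smaller
static_assert(sizeof(Sample) >= largest_member_size(),
              "a union is at least as large as its largest member");
```

Because all members share the same storage, the union is typically much smaller than a struct with the same three members would be.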
Some, but not all, class features apply equally to unions. A union cannot have a member that is a reference, but it can have members of most other types, including, under the new standard, class types that have constructors or destructors. A union can specify protection labels to make members public, private, or protected. By default, like structs, members of a union are public.
A union may define member functions, including constructors and destructors. However, a union may not inherit from another class, nor may a union be used as a base class. As a result, a union may not have virtual functions.
Defining a union
unions offer a convenient way to represent a set of mutually exclusive values of different types. As an example, we might have a process that handles different kinds of numeric or character data. That process might define a union to hold these values:
// objects of type Token have a single member, which could be of any of the listed types
union Token {
    // members are public by default
    char   cval;
    int    ival;
    double dval;
};
A union is defined starting with the keyword union, followed by an (optional) name for the union and a set of member declarations enclosed in curly braces. This code defines a union named Token that can hold a value that is either a char, an int, or a double.
Using a union Type
The name of a union is a type name. Like the built-in types, by default unions are uninitialized. We can explicitly initialize a union in the same way that we can explicitly initialize aggregate classes (§ 7.5.5, p. 298) by enclosing the initializer in a pair of curly braces:
Token first_token = {'a'}; // initializes the cval member
Token last_token; // uninitialized Token object
Token *pt = new Token; // pointer to an uninitialized Token object
If an initializer is present, it is used to initialize the first member. Hence, the initialization of first_token gives a value to its cval member.
The members of an object of union type are accessed using the normal member access operators:
last_token.cval = 'z';
pt->ival = 42;
Assigning a value to a data member of a union object makes the other data members undefined. As a result, when we use a union, we must always know what type of value is currently stored in the union. Depending on the types of the members, retrieving or assigning to the value stored in the union through the wrong data member can lead to a crash or other incorrect program behavior.
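One simple way to "always know" which member holds a value is to store a tag next to the union and consult it before every read. The sketch below is only an illustration of that idea: the names Value, Kind, Tagged, and as_double are hypothetical and do not come from the text.

```cpp
// Value, Kind, Tagged, and as_double are hypothetical names for this sketch
union Value {
    char   cval;
    int    ival;
    double dval;
};

enum class Kind { Char, Int, Double };

// pairs the union with a tag recording which member was written last
struct Tagged {
    Kind  kind;
    Value val;
};

// reads only the member that the tag says is currently stored
double as_double(const Tagged &t)
{
    switch (t.kind) {
    case Kind::Char:   return t.val.cval;
    case Kind::Int:    return t.val.ival;
    case Kind::Double: return t.val.dval;
    }
    return 0.0; // not reached when kind holds one of the enumerators
}
```

The caller is still responsible for updating kind whenever it writes a different member; nothing in this sketch enforces that.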
Anonymous unions
An anonymous union is an unnamed union that does not include any declarations between the close curly that ends its body and the semicolon that ends the union definition (§ 2.6.1, p. 73). When we define an anonymous union, the compiler automatically creates an unnamed object of the newly defined union type:
union { // anonymous union
    char   cval;
    int    ival;
    double dval;
}; // defines an unnamed object, whose members we can access directly
cval = 'c'; // assigns a new value to the unnamed, anonymous union object
ival = 42;  // that object now holds the value 42
The members of an anonymous union are directly accessible in the scope where the anonymous union is defined.
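To show what "directly accessible in the scope" means, here is a small sketch that defines an anonymous union inside a function. The function name store_and_read is hypothetical and chosen only for this example.

```cpp
// store_and_read is a hypothetical helper used only for this illustration
int store_and_read(int v)
{
    union {          // anonymous union: its members become names in this function's scope
        char   cval;
        int    ival;
        double dval;
    };
    cval = 'c';      // the unnamed object now holds a char
    ival = v;        // now it holds an int; cval no longer has a defined value
    return ival;     // read back the member that was written most recently
}
```

Note that cval, ival, and dval are used without any object name or member access operator.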
INFO
An anonymous union cannot have private or protected members, nor can an anonymous union define member functions.
unions with Members of Class Type
Under earlier versions of C++ (before C++11), unions could not have members of a class type that defined its own constructors or copy-control members. Under the new standard, this restriction is lifted. However, unions with members that define their own constructors and/or copy-control members are more complicated to use than unions that have members of built-in type.
When a union has members of built-in type, we can use ordinary assignment to change the value that the union holds. Not so for unions that have members of nontrivial class types. When we switch the union’s value to and from a member of class type, we must construct or destroy that member, respectively: when we switch the union to a member of class type, we must run a constructor for that member’s type; when we switch from that member, we must run its destructor.
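The following sketch shows one such switch in each direction, assuming a C++11 compiler. The union IntOrString and the function round_trip are hypothetical names for this illustration; the union supplies its own constructor and destructor because it has a string member.

```cpp
#include <cstddef>
#include <new>
#include <string>

// IntOrString is a hypothetical union used only for this illustration
union IntOrString {
    int         ival;
    std::string sval;
    IntOrString() : ival(0) { }  // start out holding an int
    ~IntOrString() { }           // cannot know which member is live, so it does nothing
};

// switches u from int to string, reads the string, then switches back to int;
// assumes u holds an int on entry
std::size_t round_trip(IntOrString &u, const std::string &s)
{
    using std::string;               // lets us name the destructor as ~string below
    new (&u.sval) string(s);         // switching to the string: run a constructor
    std::size_t n = u.sval.size();   // sval is now the member that holds a value
    u.sval.~string();                // switching away from the string: run its destructor
    u.ival = 0;                      // ordinary assignment suffices for a built-in type
    return n;
}
```

If the caller forgot either step, the program would either use a string that was never constructed or leak the memory owned by one that was never destroyed.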
When a union has members of built-in type, the compiler will synthesize the memberwise versions of the default constructor or copy-control members. The same is not true for unions that have members of a class type that defines its own default constructor or one or more of the copy-control members. If a union member’s type defines one of these members, the compiler synthesizes the corresponding member of the union as deleted (§ 13.1.6, p. 508).
For example, the string class defines all five copy-control members and the default constructor. If a union contains a string and does not define its own default constructor or one of the copy-control members, then the compiler will synthesize that missing member as deleted. If a class has a union member that has a deleted copy-control member, then the corresponding copy-control operation of the class itself will be deleted as well.
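These rules can be checked at compile time with the standard type traits. The sketch below is an illustration only; Plain, WithString, and Holder are hypothetical types that define none of their own special members.

```cpp
#include <string>
#include <type_traits>

// hypothetical types for illustration; none defines its own special members
union Plain      { char cval; int ival; double dval; };
union WithString { int ival; std::string sval; };
struct Holder    { WithString u; };   // a class with a union member

// built-in members only: the synthesized operations are usable
static_assert(std::is_default_constructible<Plain>::value, "Plain has a default constructor");
static_assert(std::is_copy_constructible<Plain>::value, "Plain can be copied");
static_assert(std::is_copy_assignable<Plain>::value, "Plain can be assigned");

// a string member: the corresponding operations are synthesized as deleted
static_assert(!std::is_default_constructible<WithString>::value, "deleted default constructor");
static_assert(!std::is_copy_constructible<WithString>::value, "deleted copy constructor");
static_assert(!std::is_copy_assignable<WithString>::value, "deleted copy assignment");
static_assert(!std::is_destructible<WithString>::value, "deleted destructor");

// the deletion propagates to a class that has such a union as a member
static_assert(!std::is_copy_constructible<Holder>::value, "Holder cannot be copied");
static_assert(!std::is_copy_assignable<Holder>::value, "Holder cannot be assigned");
```

The types are only inspected here, never instantiated, because an object of type WithString or Holder could not be created or destroyed.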
Using a Class to Manage union Members
Because of the complexities involved in constructing and destroying members of class type, unions with class-type members ordinarily are embedded inside another class. That way the class can manage the state transitions to and from the member of class type. As an example, we’ll add a string member to our union. We’ll define our union as an anonymous union and make it a member of a class named Token. The Token class will manage the union’s members.
To keep track of what type of value the union holds, we usually define a separate object known as a discriminant. A discriminant lets us discriminate among the values that the union can hold. In order to keep the union and its discriminant in sync, we’ll make the discriminant a member of Token as well. Our class will define a member of an enumeration type (§ 19.3, p. 832) to keep track of the state of its union member.
The only functions our class will define are the default constructor, the copy-control members, and a set of assignment operators that can assign a value of one of our union’s types to the union member:
class Token {
public:
    // copy control needed because our class has a union with a string member
    // defining the move constructor and move-assignment operator is left as an exercise
    Token(): tok(INT), ival{0} { }
    Token(const Token &t): tok(t.tok) { copyUnion(t); }
    Token &operator=(const Token&);
    // if the union holds a string, we must destroy it; see § 19.1.2 (p. 824)
    ~Token() { if (tok == STR) sval.~string(); }
    // assignment operators to set the differing members of the union
    Token &operator=(const std::string&);
    Token &operator=(char);
    Token &operator=(int);
    Token &operator=(double);
private:
    enum {INT, CHAR, DBL, STR} tok; // discriminant
    union { // anonymous union
        char        cval;
        int         ival;
        double      dval;
        std::string sval;
    }; // each Token object has an unnamed member of this unnamed union type
    // check the discriminant and copy the union member as appropriate
    void copyUnion(const Token&);
};
Our class defines a nested, unnamed, unscoped enumeration (§ 19.3, p. 832) that we use as the type for the member named tok. We defined tok following the close curly and before the semicolon that ends the definition of the enum, which defines tok to have this unnamed enum type (§ 2.6.1, p. 73).
We’ll use tok as our discriminant. When the union holds an int value, tok will have the value INT; if the union has a string, tok will be STR; and so on.
The default constructor initializes the discriminant and the union member to hold an int value of 0.
Because our union has a member with a destructor, we must define our own destructor to (conditionally) destroy the string member. Unlike ordinary members of a class type, class members that are part of a union are not automatically destroyed. The destructor has no way to know which type the union holds, so it cannot know which member to destroy.
Our destructor checks whether the object being destroyed holds a string. If so, the destructor explicitly calls the string destructor (§ 19.1.2, p. 824) to free the memory used by that string. The destructor has no work to do if the union holds a member of any of the built-in types.
Managing the Discriminant and Destroying the string
The assignment operators will set tok and assign the corresponding member of the union. Like the destructor, these members must conditionally destroy the string before assigning a new value to the union:
Token &Token::operator=(int i)
{
    if (tok == STR) sval.~string(); // if we have a string, free it
    ival = i;                       // assign to the appropriate member
    tok = INT;                      // update the discriminant
    return *this;
}
If the current value in the union is a string, we must destroy that string before assigning a new value to the union. We do so by calling the string destructor. Once we’ve cleaned up the string member, we assign the given value to the member that corresponds to the parameter type of the operator. In this case, our parameter is an int, so we assign to ival. We update the discriminant and return.
The double and char assignment operators behave identically to the int version and are left as an exercise. The string version differs from the others because it must manage the transition to and from the string type:
Token &Token::operator=(const std::string &s)
{
    if (tok == STR) // if we already hold a string, just do an assignment
        sval = s;
    else
        new(&sval) string(s); // otherwise construct a string
    tok = STR;                // update the discriminant
    return *this;
}
In this case, if the union already holds a string, we can use the normal string assignment operator to give a new value to that string. Otherwise, there is no existing string object on which to invoke the string assignment operator. Instead, we must construct a string in the memory that holds the union. We do so using placement new (§ 19.1.2, p. 824) to construct a string at the location in which sval resides. We initialize that string as a copy of our string parameter. We next update the discriminant and return.
Managing Union Members That Require Copy Control
Like the type-specific assignment operators, the copy constructor and copy-assignment operator have to test the discriminant to know how to copy the given value. To do this common work, we’ll define a member named copyUnion.
When we call copyUnion from the copy constructor, the constructor initializer list has initialized only tok; no member of the union has been constructed yet, so we know that the union member doesn’t hold a string. In the assignment operator, it is possible that the union already holds a string. We’ll handle that case directly in the assignment operator. That way copyUnion can assume that if its parameter holds a string, copyUnion must construct its own string:
void Token::copyUnion(const Token &t)
{
    switch (t.tok) {
        case Token::INT:  ival = t.ival; break;
        case Token::CHAR: cval = t.cval; break;
        case Token::DBL:  dval = t.dval; break;
        // to copy a string, construct it using placement new; see § 19.1.2 (p. 824)
        case Token::STR:  new(&sval) string(t.sval); break;
    }
}
This function uses a switch statement (§ 5.3.2, p. 178) to test the discriminant. For the built-in types, we assign the value to the corresponding member; if the member we are copying is a string, we construct it.
The assignment operator must handle three possibilities for its string member: both the left-hand and right-hand operands might be a string; neither operand might be a string; or one but not both operands might be a string:
Token &Token::operator=(const Token &t)
{
    // if this object holds a string and t doesn't, we have to free the old string
    if (tok == STR && t.tok != STR) sval.~string();
    if (tok == STR && t.tok == STR)
        sval = t.sval; // no need to construct a new string
    else
        copyUnion(t);  // will construct a string if t.tok is STR
    tok = t.tok;
    return *this;
}
If the union in the left-hand operand holds a string, but the union in the right-hand operand does not, then we have to first free the old string before assigning a new value to the union member. If both unions hold a string, we can use the normal string assignment operator to do the copy. Otherwise, we call copyUnion to do the assignment. Inside copyUnion, if the right-hand operand is a string, we’ll construct a new string in the union member of the left-hand operand. If neither operand is a string, then ordinary assignment will suffice.
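To see the three cases working together, here is a reduced, self-contained sketch. MiniToken is a hypothetical cut-down variant of Token that holds only an int or a string, and the accessors holds_string, num, and str are assumptions added solely so that the stored value can be observed; they are not part of the Token class described in this section.

```cpp
#include <new>
#include <string>

using std::string;  // the explicit destructor calls below name the type as string

// MiniToken: hypothetical reduced version of Token (int and string only)
class MiniToken {
public:
    MiniToken() : tok(INT), ival(0) { }
    MiniToken(const MiniToken &t) : tok(t.tok) { copyUnion(t); }
    ~MiniToken() { if (tok == STR) sval.~string(); }

    MiniToken &operator=(const MiniToken &t)
    {
        // string on the left only: free the old string first
        if (tok == STR && t.tok != STR) sval.~string();
        if (tok == STR && t.tok == STR)
            sval = t.sval;   // string on both sides: plain string assignment
        else
            copyUnion(t);    // constructs a string if t holds one
        tok = t.tok;
        return *this;
    }
    MiniToken &operator=(int i)
    {
        if (tok == STR) sval.~string();  // leaving the string state
        ival = i;
        tok = INT;
        return *this;
    }
    MiniToken &operator=(const string &s)
    {
        if (tok == STR) sval = s;        // already a string: assign
        else new (&sval) string(s);      // otherwise construct one in place
        tok = STR;
        return *this;
    }

    // accessors assumed for this sketch only
    bool holds_string() const { return tok == STR; }
    int num() const { return ival; }            // valid only if !holds_string()
    const string &str() const { return sval; }  // valid only if holds_string()

private:
    enum { INT, STR } tok;                 // discriminant
    union { int ival; string sval; };      // anonymous union
    void copyUnion(const MiniToken &t)
    {
        if (t.tok == STR) new (&sval) string(t.sval);
        else ival = t.ival;
    }
};
```

A caller might write `MiniToken a; a = string("hi"); MiniToken b(a); b = 7; a = b;`, which exercises the string-to-string copy in the copy constructor and the string-to-int transition in both assignment operators.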
INFO
Exercises Section 19.6
Exercise 19.21: Write your own version of the Token class.
Exercise 19.22: Add a member of type Sales_data to your Token class.
Exercise 19.23: Add a move constructor and move assignment to Token.
Exercise 19.24: Explain what happens if we assign a Token object to itself.
Exercise 19.25: Write assignment operators that take values of each type in the union.