2.6. Defining Our Own Data Structures
At the most basic level, a data structure is a way to group together related data elements and a strategy for using those data. As one example, our Sales_item class groups an ISBN, a count of how many copies of that book had been sold, and the revenue associated with those sales. It also provides a set of operations such as the isbn function and the >>, <<, +, and += operators.
In C++ we define our own data types by defining a class. The library types string, istream, and ostream are all defined as classes, as is the Sales_item type we used in Chapter 1. C++ support for classes is extensive; in fact, Parts III and IV are largely devoted to describing class-related features. Even though the Sales_item class is pretty simple, we won’t be able to fully define that class until we learn how to write our own operators in Chapter 14.
2.6.1. Defining the Sales_data Type
Although we can’t yet write our Sales_item class, we can write a more concrete class that groups the same data elements. Our strategy for using this class is that users will be able to access the data elements directly and must implement needed operations for themselves.
Because our data structure does not support any operations, we’ll name our version Sales_data to distinguish it from Sales_item. We’ll define our class as follows:
struct Sales_data {
    std::string bookNo;
    unsigned units_sold = 0;
    double revenue = 0.0;
};
Our class begins with the keyword struct, followed by the name of the class and a (possibly empty) class body. The class body is surrounded by curly braces and forms a new scope (§ 2.2.4, p. 48). The names defined inside the class must be unique within the class but can reuse names defined outside the class.
The close curly that ends the class body must be followed by a semicolon. The semicolon is needed because we can define variables after the class body:
struct Sales_data { /* ... */ } accum, trans, *salesptr;
// equivalent, but better way to define these objects
struct Sales_data { /* ... */ };
Sales_data accum, trans, *salesptr;
The semicolon marks the end of the (usually empty) list of declarators. Ordinarily, it is a bad idea to define an object as part of a class definition. Doing so obscures the code by combining the definitions of two different entities—the class and a variable—in a single statement.
WARNING
It is a common mistake among new programmers to forget the semicolon at the end of a class definition.
Class Data Members
The class body defines the members of the class. Our class has only data members. The data members of a class define the contents of the objects of that class type. Each object has its own copy of the class data members. Modifying the data members of one object does not change the data in any other Sales_data object.
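As a minimal sketch of this independence, the code below changes the members of one object and then inspects a second object. The function name untouched_units_after_update is hypothetical and exists only for this illustration.

```cpp
#include <string>

struct Sales_data {
    std::string bookNo;
    unsigned units_sold = 0;
    double revenue = 0.0;
};

// Hypothetical helper: modifies one object, then reports units_sold
// from a different object that was never modified.
unsigned untouched_units_after_update()
{
    Sales_data first, second;
    first.bookNo = "0-201-78345-X";
    first.units_sold = 3;       // changes only first's copy of units_sold
    first.revenue = 60.0;       // changes only first's copy of revenue
    return second.units_sold;   // second still holds its initial value
}
```

Calling untouched_units_after_update() yields 0, because the assignments to first never touch the members stored in second.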
We define data members the same way that we define normal variables: We specify a base type followed by a list of one or more declarators. Our class has three data members: a member of type string named bookNo, an unsigned member named units_sold, and a member of type double named revenue. Each Sales_data object will have these three data members.
Under the new standard (C++11), we can supply an in-class initializer for a data member. When we create objects, the in-class initializers will be used to initialize the data members. Members without an initializer are default initialized (§ 2.2.1, p. 43). Thus, when we define Sales_data objects, units_sold and revenue will be initialized to 0, and bookNo will be initialized to the empty string.
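The following sketch checks those initial values on a freshly defined object. The function starts_in_initial_state is a hypothetical name introduced here, and empty() is the string member that reports whether a string holds no characters.

```cpp
#include <string>

struct Sales_data {
    std::string bookNo;          // no in-class initializer: default initialized
    unsigned units_sold = 0;     // in-class initializer
    double revenue = 0.0;        // in-class initializer
};

// Hypothetical check: defines an object with no explicit initializer
// and reports whether every member has the value the text describes.
bool starts_in_initial_state()
{
    Sales_data item;
    return item.bookNo.empty()
        && item.units_sold == 0
        && item.revenue == 0.0;
}
```

Without the in-class initializers, units_sold and revenue in an object defined inside a function would be left with undefined values, which is why supplying them is worthwhile.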
In-class initializers are restricted as to the form (§ 2.2.1, p. 43) we can use: They must either be enclosed inside curly braces or follow an = sign. We may not specify an in-class initializer inside parentheses.
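To make the permitted forms concrete, here is a small sketch. The struct Init_forms and its member names are hypothetical and serve only to show each form side by side; the rejected form is left in a comment so that the code still compiles.

```cpp
#include <string>

// Hypothetical struct used only to show the permitted initializer forms.
struct Init_forms {
    int with_equals = 42;          // follows an = sign: allowed
    int with_braces{7};            // enclosed in curly braces: allowed
    std::string text{"hello"};     // curly braces also work for library types
    // int with_parens(3);         // error: parentheses are not allowed here
};
```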
In § 7.2 (p. 268), we’ll see that C++ has a second keyword, class, that can be used to define our own data structures. We’ll explain in that section why we use struct here. Until we cover additional class-related features in Chapter 7, you should use struct to define your own data structures.
INFO
Exercises Section 2.6.1
Exercise 2.39: Compile the following program to see what happens when you forget the semicolon after a class definition. Remember the message for future reference.
struct Foo { /* empty */ } // Note: no semicolon
int main()
{
    return 0;
}
Exercise 2.40: Write your own version of the Sales_data class.
2.6.2. Using the Sales_data Class
Unlike the Sales_item class, our Sales_data class does not provide any operations. Users of Sales_data have to write whatever operations they need. As an example, we’ll write a version of the program from § 1.5.2 (p. 23) that printed the sum of two transactions. The input to our program will be transactions such as
0-201-78345-X 3 20.00
0-201-78345-X 2 25.00
Each transaction holds an ISBN, the count of how many books were sold, and the price at which each book was sold.
Adding Two Sales_data Objects
Because Sales_data provides no operations, we will have to write our own code to do the input, output, and addition operations. We’ll assume that our Sales_data class is defined inside Sales_data.h. We’ll see how to define this header in § 2.6.3 (p. 76).
Because this program will be longer than any we’ve written so far, we’ll explain it in separate parts. Overall, our program will have the following structure:
#include <iostream>
#include <string>
#include "Sales_data.h"
int main()
{
    Sales_data data1, data2;
    // code to read into data1 and data2
    // code to check whether data1 and data2 have the same ISBN
    //    and if so print the sum of data1 and data2
}
As in our original program, we begin by including the headers we’ll need and define variables to hold the input. Note that unlike the Sales_item version, our new program includes the string header. We need that header because our code will have to manage the bookNo member, which has type string.
Reading Data into a Sales_data Object
Although we won’t describe the library string type in detail until Chapters 3 and 10, we need to know only a little bit about strings in order to define and use our ISBN member. The string type holds a sequence of characters. Its operations include the >>, <<, and == operators to read, write, and compare strings, respectively. With this knowledge we can write the code to read the first transaction:
double price = 0; // price per book, used to calculate total revenue
// read the first transaction: ISBN, number of books sold, price per book
std::cin >> data1.bookNo >> data1.units_sold >> price;
// calculate total revenue from price and units_sold
data1.revenue = data1.units_sold * price;
Our transactions contain the price at which each book was sold but our data structure stores the total revenue. We’ll read the transaction data into a double named price, from which we’ll calculate the revenue member. The input statement
std::cin >> data1.bookNo >> data1.units_sold >> price;
uses the dot operator (§ 1.5.2, p. 23) to read into the bookNo and units_sold members of the object named data1.
The last statement assigns the product of data1.units_sold and price to the revenue member of data1.
Our program will next repeat the same code to read data into data2:
// read the second transaction
std::cin >> data2.bookNo >> data2.units_sold >> price;
data2.revenue = data2.units_sold * price;
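The reads above assume that the input is well formed. As a rough sketch only, the same three steps could be collected into one place that also reports whether the read succeeded. The function read_transaction is hypothetical and is not part of the program in the text; it relies on functions, references, and the istringstream type, none of which have been covered yet.

```cpp
#include <istream>
#include <sstream>
#include <string>

struct Sales_data {
    std::string bookNo;
    unsigned units_sold = 0;
    double revenue = 0.0;
};

// Hypothetical helper: reads one transaction (ISBN, units sold, price per book)
// from in. On success it stores the result in item and returns true.
// If any read fails it returns false and leaves item unchanged.
bool read_transaction(std::istream &in, Sales_data &item)
{
    std::string isbn;
    unsigned units = 0;
    double price = 0;               // price per book
    if (!(in >> isbn >> units >> price))
        return false;               // malformed or missing input
    item.bookNo = isbn;
    item.units_sold = units;
    item.revenue = units * price;   // total revenue for this transaction
    return true;
}
```

Under these assumptions, the two reads in the program could be written as read_transaction(std::cin, data1) and read_transaction(std::cin, data2), with the program reporting an error if either call returns false.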
Printing the Sum of Two Sales_data Objects
Our other task is to check that the transactions are for the same ISBN. If so, we’ll print their sum; otherwise, we’ll print an error message:
if (data1.bookNo == data2.bookNo) {
    unsigned totalCnt = data1.units_sold + data2.units_sold;
    double totalRevenue = data1.revenue + data2.revenue;
    // print: ISBN, total sold, total revenue, average price per book
    std::cout << data1.bookNo << " " << totalCnt
              << " " << totalRevenue << " ";
    if (totalCnt != 0)
        std::cout << totalRevenue/totalCnt << std::endl;
    else
        std::cout << "(no sales)" << std::endl;
    return 0; // indicate success
} else { // transactions weren't for the same ISBN
    std::cerr << "Data must refer to the same ISBN"
              << std::endl;
    return -1; // indicate failure
}
In the first if we compare the bookNo members of data1 and data2. If those members are the same ISBN, we execute the code inside the curly braces. That code adds the components of our two variables. Because we’ll need to print the average price, we start by computing the total of units_sold and revenue and store those in totalCnt and totalRevenue, respectively. We print those values. Next we check that there were books sold and, if so, print the computed average price per book. If there were no sales, we print a message noting that fact.
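For reference, the separate parts can be assembled as in the sketch below. To keep it easy to exercise, the body of main is placed in a hypothetical function, sum_two_transactions, that takes its input and output streams as parameters; that wrapper and its parameter names are assumptions of this sketch, not part of the program in the text. The struct is repeated here in place of the #include of Sales_data.h.

```cpp
#include <iostream>
#include <sstream>
#include <string>

struct Sales_data {     // in the real program this comes from Sales_data.h
    std::string bookNo;
    unsigned units_sold = 0;
    double revenue = 0.0;
};

// Hypothetical wrapper around the program's logic.
// Returns 0 on success and -1 if the ISBNs differ, as main does in the text.
int sum_two_transactions(std::istream &in, std::ostream &out, std::ostream &err)
{
    Sales_data data1, data2;
    double price = 0;   // price per book, used to calculate total revenue

    // read each transaction: ISBN, number of books sold, price per book
    in >> data1.bookNo >> data1.units_sold >> price;
    data1.revenue = data1.units_sold * price;
    in >> data2.bookNo >> data2.units_sold >> price;
    data2.revenue = data2.units_sold * price;

    if (data1.bookNo == data2.bookNo) {
        unsigned totalCnt = data1.units_sold + data2.units_sold;
        double totalRevenue = data1.revenue + data2.revenue;
        // print: ISBN, total sold, total revenue, average price per book
        out << data1.bookNo << " " << totalCnt
            << " " << totalRevenue << " ";
        if (totalCnt != 0)
            out << totalRevenue / totalCnt << std::endl;
        else
            out << "(no sales)" << std::endl;
        return 0;       // indicate success
    } else {            // transactions weren't for the same ISBN
        err << "Data must refer to the same ISBN" << std::endl;
        return -1;      // indicate failure
    }
}
```

A main function that simply returns sum_two_transactions(std::cin, std::cout, std::cerr) behaves like the program described above. Given the two sample transactions shown earlier, it prints 0-201-78345-X 5 110 22.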
INFO
Exercises Section 2.6.2
Exercise 2.41: Use your Sales_data class to rewrite the exercises in § 1.5.1 (p. 22), § 1.5.2 (p. 24), and § 1.6 (p. 25). For now, you should define your Sales_data class in the same file as your main function.
2.6.3. Writing Our Own Header Files
Although, as we’ll see in § 19.7 (p. 852), we can define a class inside a function, such classes have limited functionality. As a result, classes ordinarily are not defined inside functions. When we define a class outside of a function, there may be only one definition of that class in any given source file. In addition, if we use a class in several different files, the class’ definition must be the same in each file.
In order to ensure that the class definition is the same in each file, classes are usually defined in header files. Typically, classes are stored in headers whose name derives from the name of the class. For example, the string library type is defined in the string header. Similarly, as we’ve already seen, we will define our Sales_data class in a header file named Sales_data.h.
Headers (usually) contain entities (such as class definitions and const and constexpr variables (§ 2.4, p. 60)) that can be defined only once in any given file. However, headers often need to use facilities from other headers. For example, because our Sales_data class has a string member, Sales_data.h must #include the string header. As we’ve seen, programs that use Sales_data also need to include the string header in order to use the bookNo member. As a result, programs that use Sales_data will include the string header twice: once directly and once as a side effect of including Sales_data.h. Because a header might be included more than once, we need to write our headers in a way that is safe even if the header is included multiple times.
INFO
Whenever a header is updated, the source files that use that header must be recompiled to get the new or changed declarations.
A Brief Introduction to the Preprocessor
The most common technique for making it safe to include a header multiple times relies on the preprocessor. The preprocessor, which C++ inherits from C, is a program that runs before the compiler and changes the source text of our programs. Our programs already rely on one preprocessor facility, #include. When the preprocessor sees a #include, it replaces the #include with the contents of the specified header.
C++ programs also use the preprocessor to define header guards. Header guards rely on preprocessor variables (§ 2.3.2, p. 53). Preprocessor variables have one of two possible states: defined or not defined. The #define directive takes a name and defines that name as a preprocessor variable. There are two other directives that test whether a given preprocessor variable has or has not been defined: #ifdef is true if the variable has been defined, and #ifndef is true if the variable has not been defined. If the test is true, then everything following the #ifdef or #ifndef is processed up to the matching #endif.
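The sketch below shows the three directives working together in a single file. The preprocessor variable names HAS_LOGGING and HAS_TRACING and the constants are hypothetical, chosen only for this illustration.

```cpp
#define HAS_LOGGING     // HAS_LOGGING is now a defined preprocessor variable

#ifdef HAS_LOGGING      // true, so the next line is processed
const bool logging_defined = true;
#endif

#ifndef HAS_TRACING     // true: HAS_TRACING was never defined
const bool tracing_missing = true;
#endif

#ifdef HAS_TRACING      // false, so everything up to #endif is skipped
const bool logging_defined = false;   // would be a redefinition error if compiled
#endif
```

The last group contains a second definition of logging_defined. The file still compiles because the preprocessor removes that line before the compiler ever sees it.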
We can use these facilities to guard against multiple inclusion as follows:
#ifndef SALES_DATA_H
#define SALES_DATA_H
#include <string>
struct Sales_data {
    std::string bookNo;
    unsigned units_sold = 0;
    double revenue = 0.0;
};
#endif
The first time Sales_data.h is included, the #ifndef test will succeed. The preprocessor will process the lines following #ifndef up to the #endif. As a result, the preprocessor variable SALES_DATA_H will be defined and the contents of Sales_data.h will be copied into our program. If we include Sales_data.h later on in the same file, the #ifndef test will fail. The lines between it and the #endif directive will be ignored.
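To see the effect without creating separate files, the sketch below pastes the guarded header text twice into one source file, which is roughly what the compiler receives when Sales_data.h is included twice. The function default_units is hypothetical and is there only to show that the single surviving definition is usable.

```cpp
// First copy of the header text: SALES_DATA_H is not yet defined,
// so the lines up to #endif are processed.
#ifndef SALES_DATA_H
#define SALES_DATA_H
#include <string>
struct Sales_data {
    std::string bookNo;
    unsigned units_sold = 0;
    double revenue = 0.0;
};
#endif

// Second copy: SALES_DATA_H is already defined, so the lines up to
// #endif are skipped and Sales_data is not defined a second time.
#ifndef SALES_DATA_H
#define SALES_DATA_H
#include <string>
struct Sales_data {
    std::string bookNo;
    unsigned units_sold = 0;
    double revenue = 0.0;
};
#endif

// Hypothetical function that uses the one definition that survived.
unsigned default_units()
{
    Sales_data item;
    return item.units_sold;
}
```

Removing the guard lines from both copies would leave two definitions of Sales_data in the same file, which the compiler rejects.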
WARNING
Preprocessor variable names do not respect C++ scoping rules.
Preprocessor variables, including names of header guards, must be unique throughout the program. Typically we ensure uniqueness by basing the guard’s name on the name of a class in the header. To avoid name clashes with other entities in our programs, preprocessor variables usually are written in all uppercase.
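A short sketch shows why scope offers no protection. The names helper, DEFINED_IN_HELPER, and visible_outside_function are hypothetical and used only for illustration.

```cpp
// Hypothetical function whose body contains a #define.
void helper()
{
#define DEFINED_IN_HELPER   // written inside helper's curly braces
}

// The braces above do not limit the preprocessor variable: it remains
// defined for the rest of the file, so this #ifdef test is true.
#ifdef DEFINED_IN_HELPER
const bool visible_outside_function = true;
#else
const bool visible_outside_function = false;
#endif
```

Here visible_outside_function is true, which is why a guard name must be chosen so that it cannot collide with any other preprocessor variable in the program.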
TIP
Best Practices
Headers should have guards, even if they aren’t (yet) included by another header. Header guards are trivial to write, and by habitually defining them you don’t need to decide whether they are needed.