2.6. Defining Our Own Data Structures

Fundamental

At the most basic level, a data structure is a way to group together related data elements and a strategy for using those data. As one example, our Sales_item class groups an ISBN, a count of how many copies of that book had been sold, and the revenue associated with those sales. It also provides a set of operations such as the isbn function and the >>, <<, +, and += operators.

In C++ we define our own data types by defining a class. The library types string, istream, and ostream are all defined as classes, as is the Sales_item type we used in Chapter 1. C++ support for classes is extensive—in fact, Parts III and IV are largely devoted to describing class-related features. Even though the Sales_item class is pretty simple, we won’t be able to fully define that class until we learn how to write our own operators in Chapter 14.

2.6.1. Defining the `Sales_data` Type

Fundamental

Although we can’t yet write our Sales_item class, we can write a more concrete class that groups the same data elements. Our strategy for using this class is that users will be able to access the data elements directly and must implement needed operations for themselves.

Because our data structure does not support any operations, we’ll name our version Sales_data to distinguish it from Sales_item. We’ll define our class as follows:

c++

struct Sales_data {
    std::string bookNo;
    unsigned units_sold = 0;
    double revenue = 0.0;
};

Our class begins with the keyword struct, followed by the name of the class and a (possibly empty) class body. The class body is surrounded by curly braces and forms a new scope (§ 2.2.4, p. 48). The names defined inside the class must be unique within the class but can reuse names defined outside the class.

The close curly that ends the class body must be followed by a semicolon. The semicolon is needed because we can define variables after the class body:

c++

struct Sales_data { /* ... */ } accum, trans, *salesptr;
// equivalent, but better way to define these objects
struct Sales_data { /* ... */ };
Sales_data accum, trans, *salesptr;

The semicolon marks the end of the (usually empty) list of declarators. Ordinarily, it is a bad idea to define an object as part of a class definition. Doing so obscures the code by combining the definitions of two different entities—the class and a variable—in a single statement.

WARNING

It is a common mistake among new programmers to forget the semicolon at the end of a class definition.

Class Data Members

The class body defines the members of the class. Our class has only data members. The data members of a class define the contents of the objects of that class type. Each object has its own copy of the class data members. Modifying the data members of one object does not change the data in any other Sales_data object.

We define data members the same way that we define normal variables: We specify a base type followed by a list of one or more declarators. Our class has three data members: a member of type string named bookNo, an unsigned member named units_sold, and a member of type double named revenue. Each Sales_data object will have these three data members.

Under the new standard, we can supply an in-class initializer for a data member. When we create objects, the in-class initializers will be used to initialize the data members. Members without an initializer are default initialized (§ 2.2.1, p. 43). Thus, when we define Sales_data objects, units_sold and revenue will be initialized to 0, and bookNo will be initialized to the empty string.

C++11

In-class initializers are restricted as to the form (§ 2.2.1, p. 43) we can use: They must either be enclosed inside curly braces or follow an = sign. We may not specify an in-class initializer inside parentheses.

In § 7.2 (p. 268), we’ll see that C++ has a second keyword, class, that can be used to define our own data structures. We’ll explain in that section why we use struct here. Until we cover additional class-related features in Chapter 7, you should use struct to define your own data structures.

INFO

Exercises Section 2.6.1

Exercise 2.39: Compile the following program to see what happens when you forget the semicolon after a class definition. Remember the message for future reference.

c++

struct Foo { /* empty   */ } // Note: no semicolon
int main()
{
    return 0;
}

Exercise 2.40: Write your own version of the Sales_data class.

2.6.2. Using the `Sales_data` Class

Fundamental

Unlike the Sales_item class, our Sales_data class does not provide any operations. Users of Sales_data have to write whatever operations they need. As an example, we’ll write a version of the program from § 1.5.2 (p. 23) that printed the sum of two transactions. The input to our program will be transactions such as

0-201-78345-X 3 20.00
0-201-78345-X 2 25.00

Each transaction holds an ISBN, the count of how many books were sold, and the price at which each book was sold.

Adding Two `Sales_data` Objects

Because Sales_data provides no operations, we will have to write our own code to do the input, output, and addition operations. We’ll assume that our Sales_data class is defined inside Sales_data.h. We’ll see how to define this header in § 2.6.3 (p. 76).

Because this program will be longer than any we’ve written so far, we’ll explain it in separate parts. Overall, our program will have the following structure:

c++

#include <iostream>
#include <string>
#include "Sales_data.h"
int main()
{
    Sales_data data1, data2;
    // code to read into data1 and data2
    // code to check whether data1 and data2 have the same ISBN
    //    and if so print the sum of data1 and data2
}

As in our original program, we begin by including the headers we’ll need and define variables to hold the input. Note that unlike the Sales_item version, our new program includes the string header. We need that header because our code will have to manage the bookNo member, which has type string.

Reading Data into a `Sales_data` Object

Although we won’t describe the library string type in detail until Chapters 3 and 10, we need to know only a little bit about strings in order to define and use our ISBN member. The string type holds a sequence of characters. Its operations include the >>, <<, and == operators to read, write, and compare strings, respectively. With this knowledge we can write the code to read the first transaction:

c++

double price = 0;   // price per book, used to calculate total revenue
// read the first transactions: ISBN, number of books sold, price per book
std::cin >> data1.bookNo >> data1.units_sold >> price;
// calculate total revenue from price and units_sold
data1.revenue = data1.units_sold * price;

Our transactions contain the price at which each book was sold but our data structure stores the total revenue. We’ll read the transaction data into a double named price, from which we’ll calculate the revenue member. The input statement

c++

std::cin >> data1.bookNo >> data1.units_sold >> price;

uses the dot operator (§ 1.5.2, p. 23) to read into the bookNo and units_sold members of the object named data1.

The last statement assigns the product of data1.units_sold and price into the revenue member of data1.

Our program will next repeat the same code to read data into data2:

c++

// read the second transaction
std::cin >> data2.bookNo >> data2.units_sold >> price;
data2.revenue = data2.units_sold * price;

Printing the Sum of Two `Sales_data` Objects

Our other task is to check that the transactions are for the same ISBN. If so, we’ll print their sum, otherwise, we’ll print an error message:

c++

if (data1.bookNo == data2.bookNo) {
    unsigned totalCnt = data1.units_sold + data2.units_sold;
    double totalRevenue = data1.revenue + data2.revenue;
    // print: ISBN, total sold, total revenue, average price per book
    std::cout << data1.bookNo << " " << totalCnt
              << " " << totalRevenue << " ";
    if (totalCnt != 0)
        std::cout << totalRevenue/totalCnt << std::endl;
    else
        std::cout << "(no sales)" << std::endl;
    return 0; // indicate success
} else {  // transactions weren't for the same ISBN
    std::cerr << "Data must refer to the same ISBN"
              << std::endl;
    return -1; // indicate failure
}

In the first if we compare the bookNo members of data1 and data2. If those members are the same ISBN, we execute the code inside the curly braces. That code adds the components of our two variables. Because we’ll need to print the average price, we start by computing the total of units_sold and revenue and store those in totalCnt and totalRevenue, respectively. We print those values. Next we check that there were books sold and, if so, print the computed average price per book. If there were no sales, we print a message noting that fact.

INFO

Exercises Section 2.6.2

Exercise 2.41: Use your Sales_data class to rewrite the exercises in § 1.5.1 (p. 22), § 1.5.2 (p. 24), and § 1.6 (p. 25). For now, you should define your Sales_data class in the same file as your main function.

2.6.3. Writing Our Own Header Files

Fundamental

Although as we’ll see in § 19.7 (p. 852), we can define a class inside a function, such classes have limited functionality. As a result, classes ordinarily are not defined inside functions. When we define a class outside of a function, there may be only one definition of that class in any given source file. In addition, if we use a class in several different files, the class’ definition must be the same in each file.

In order to ensure that the class definition is the same in each file, classes are usually defined in header files. Typically, classes are stored in headers whose name derives from the name of the class. For example, the string library type is defined in the string header. Similarly, as we’ve already seen, we will define our Sales_data class in a header file named Sales_data.h.

Headers (usually) contain entities (such as class definitions and const and constexpr variables (§ 2.4, p. 60)) that can be defined only once in any given file. However, headers often need to use facilities from other headers. For example, because our Sales_data class has a string member, Sales_data.h must #include the string header. As we’ve seen, programs that use Sales_data also need to include the string header in order to use the bookNo member. As a result, programs that use Sales_data will include the string header twice: once directly and once as a side effect of including Sales_data.h. Because a header might be included more than once, we need to write our headers in a way that is safe even if the header is included multiple times.

INFO

Whenever a header is updated, the source files that use that header must be recompiled to get the new or changed declarations.

A Brief Introduction to the Preprocessor

The most common technique for making it safe to include a header multiple times relies on the preprocessor. The preprocessor—which C++ inherits from C—is a program that runs before the compiler and changes the source text of our programs. Our programs already rely on one preprocessor facility, #include. When the preprocessor sees a #include, it replaces the #include with the contents of the specified header.

C++ programs also use the preprocessor to define header guards. Header guards rely on preprocessor variables (§ 2.3.2, p. 53). Preprocessor variables have one of two possible states: defined or not defined. The #define directive takes a name and defines that name as a preprocessor variable. There are two other directives that test whether a given preprocessor variable has or has not been defined: #ifdef is true if the variable has been defined, and #ifndef is true if the variable has not been defined. If the test is true, then everything following the #ifdef or #ifndef is processed up to the matching #endif.

We can use these facilities to guard against multiple inclusion as follows:

c++

#ifndef SALES_DATA_H
#define SALES_DATA_H
#include <string>
struct Sales_data {
    std::string bookNo;
    unsigned units_sold = 0;
    double revenue = 0.0;
};
#endif

The first time Sales_data.h is included, the #ifndef test will succeed. The preprocessor will process the lines following #ifndef up to the #endif. As a result, the preprocessor variable SALES_DATA_H will be defined and the contents of Sales_data.h will be copied into our program. If we include Sales_data.h later on in the same file, the #ifndef directive will be false. The lines between it and the #endif directive will be ignored.

WARNING

Preprocessor variable names do not respect C++ scoping rules.

Preprocessor variables, including names of header guards, must be unique throughout the program. Typically we ensure uniqueness by basing the guard’s name on the name of a class in the header. To avoid name clashes with other entities in our programs, preprocessor variables usually are written in all uppercase.

TIP

Best Practices

Headers should have guards, even if they aren’t (yet) included by another header. Header guards are trivial to write, and by habitually defining them you don’t need to decide whether they are needed.

INFO

Exercises Section 2.6.3

Exercise 2.42: Write your own version of the Sales_data.h header and use it to rewrite the exercise from § 2.6.2 (p. 76).

Chapter 2. Variables and Basic Types

Chapter 3. Strings, Vectors, and Arrays

Chapter 4. Expressions

Chapter 5. Statements

Chapter 6. Functions

Chapter 7. Classes

Chapter 8. The IO Library

Chapter 9. Sequential Containers

Chapter 10. Generic Algorithms

Chapter 11. Associative Containers

Chapter 12. Dynamic Memory

Chapter 13. Copy Control

Chapter 14. Overloaded Operations and Conversions

Chapter 15. Object-Oriented Programming

Chapter 16. Templates and Generic Programming

Chapter 17. Specialized Library Facilities

Chapter 18. Tools for Large Programs

Chapter 19. Specialized Tools and Techniques

A.2. A Brief Tour of the Algorithms

A.3. Random Numbers