27.4 Functional Description
This section describes all the classes and major functions in our
program. For a more complete and detailed description, take a look at
the listings at the end of this chapter.
27.4.1 char_type Class
The char_type class sets the type of a
character. For the most part, this is
done through a table named type_info. Some types,
such as C_ALPHA_NUMERIC, include two different
types of characters, C_ALPHA and
C_DIGIT. Therefore, in addition to our table, we
need a little code for the special cases.
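The table-plus-special-case idea can be sketched as follows. This is only an illustration under stated assumptions: the member function name is, the categories C_WHITE and C_OTHER, and the choice to treat '_' as a letter are hypothetical and are not taken from the listings.

```cpp
#include <cassert>
#include <cctype>

// Only C_ALPHA, C_DIGIT and C_ALPHA_NUMERIC are named in the text;
// C_WHITE and C_OTHER are assumptions made for this sketch.
enum CHAR_TYPE { C_WHITE, C_ALPHA, C_DIGIT, C_OTHER, C_ALPHA_NUMERIC };

class char_type {
public:
    char_type() {
        // Fill the lookup table once, one entry per possible byte value.
        for (int ch = 0; ch < 256; ++ch) {
            if (std::isalpha(ch) || ch == '_')   // '_' as a letter: assumption
                type_info[ch] = C_ALPHA;
            else if (std::isdigit(ch))
                type_info[ch] = C_DIGIT;
            else if (std::isspace(ch))
                type_info[ch] = C_WHITE;
            else
                type_info[ch] = C_OTHER;
        }
    }

    // Return true if ch belongs to the category kind (hypothetical name).
    bool is(int ch, CHAR_TYPE kind) const {
        if (ch < 0 || ch > 255)
            return false;               // EOF and other out-of-range values
        // Special case: this category covers two table entries.
        if (kind == C_ALPHA_NUMERIC)
            return type_info[ch] == C_ALPHA || type_info[ch] == C_DIGIT;
        return type_info[ch] == kind;   // Ordinary case: a plain table lookup
    }

private:
    CHAR_TYPE type_info[256];           // Category of each character
};
```

With this sketch, both is('a', C_ALPHA_NUMERIC) and is('7', C_ALPHA_NUMERIC) are true, although the table itself never stores C_ALPHA_NUMERIC.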
27.4.2 input_file Class
This class reads data from the input
file one character at a time. It buffers
a line and on command writes the line to the output.
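A minimal sketch of this read-and-buffer behavior is shown here. The member names get and flush_line, and the decision to wrap a std::istream rather than open a file directly, are assumptions made so that the example is easy to try out; the real class may be organized differently.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

class input_file {
public:
    explicit input_file(std::istream &in) : in_(in) {}

    // Read one character and remember it in the line buffer.
    // Returns the character, or the stream's EOF value at end of input.
    int get() {
        int ch = in_.get();
        if (ch != std::istream::traits_type::eof())
            line_ += static_cast<char>(ch);
        return ch;
    }

    // Write everything buffered so far to out, then start a new line.
    void flush_line(std::ostream &out) {
        out << line_;
        line_.clear();
    }

private:
    std::istream &in_;   // Where the characters come from
    std::string line_;   // Characters read since the last flush
};
```

Reading "ab\n" one character at a time from a std::istringstream and then calling flush_line sends exactly those three characters to the output stream.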
27.4.3 token Class
We want an input stream of tokens. We have an input stream consisting
of characters. The main function of this class,
next_token, turns characters into tokens.
Actually, our tokenizer is rather simple, because we
don't have to deal with most of the details that a
full C++ tokenizer must handle.
The coding for this function is fairly straightforward, except for
the fact that it breaks up multiline comments into a series of
T_COMMENT and T_NEWLINE tokens.
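The comment-splitting behavior can be illustrated in isolation. The function split_comment below is hypothetical and is not part of the program; it takes the text of one comment and returns the tokens a line-oriented consumer would see. The newline token is spelled T_NEWLINE, as in the TOKEN_LIST macro.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum TOKEN_TYPE { T_COMMENT, T_NEWLINE };   // Shortened list for this sketch

// Turn the text of a single /* ... */ comment into one T_COMMENT token
// per line that contains comment text, separated by T_NEWLINE tokens.
std::vector<TOKEN_TYPE> split_comment(const std::string &text) {
    std::vector<TOKEN_TYPE> tokens;
    bool pending = false;   // True while the current line holds comment text

    for (char ch : text) {
        if (ch == '\n') {
            if (pending)
                tokens.push_back(T_COMMENT);
            tokens.push_back(T_NEWLINE);
            pending = false;
        } else {
            pending = true;
        }
    }
    if (pending)            // Last line of the comment has no newline
        tokens.push_back(T_COMMENT);
    return tokens;
}
```

A two-line comment therefore yields T_COMMENT, T_NEWLINE, T_COMMENT, which lets later stages count each physical line as a line containing a comment.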
One clever trick is used in this section. The
TOKEN_LIST macro is used to generate an enumerated
list of token types and a string array containing the names of each
of the tokens. Let's examine how this is done in
more detail.
The definition of the TOKEN_LIST macro is:
#define TOKEN_LIST \
T(T_NUMBER), /* Simple number (floating point or integer) */ \
T(T_STRING), /* String or character constant */ \
T(T_COMMENT), /* Comment */ \
T(T_NEWLINE), /* Newline character */ \
T(T_OPERATOR), /* Arithmetic operator */ \
T(T_L_PAREN), /* Character "(" */ \
T(T_R_PAREN), /* Character ")" */ \
T(T_L_CURLY), /* Character "{" */ \
T(T_R_CURLY), /* Character "}" */ \
T(T_ID), /* Identifier */ \
T(T_EOF) /* End of File */
When invoked, this macro will generate the code:
T(T_NUMBER),
T(T_STRING),
// .. and so on
If we define a T macro, it will be expanded when
the TOKEN_LIST macro is expanded. We would like to
use the TOKEN_LIST macro to generate a list of
names, so we define the T macro as:
#define T(x) x // Define T( ) as the name
Now, our TOKEN_LIST macro will generate:
T_NUMBER,
T_STRING,
// .. and so on
Putting all this together with a little more code, we get a way to
generate a TOKEN_TYPE enum
list:
#define T(x) x // Define T( ) as the name
enum TOKEN_TYPE {
TOKEN_LIST
};
#undef T // Remove old temporary macro
Later we redefine T so it generates a string:
#define T(x) #x // Define x as a string
This allows us to use TOKEN_LIST to generate a
list of strings containing the names of the tokens:
#define T(x) #x // Define x as a string
const char *const TOKEN_NAMES[] = {
TOKEN_LIST
};
#undef T // Remove old temporary macro
When expanded, this macro generates:
const char *const TOKEN_NAMES[] = {
"T_NUMBER",
"T_STRING",
//....
Using tricks like this is acceptable in limited cases. However, such
tricks should be extensively commented so the maintenance programmer
who has to fix your code can understand what you did.
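The two expansions can be put together in one small file to see the trick end to end. This sketch uses a shortened three-entry token list, and the helper token_name is a hypothetical addition for illustration.

```cpp
#include <cassert>
#include <cstring>

// Shortened token list for illustration only.
#define TOKEN_LIST \
    T(T_NUMBER),   \
    T(T_STRING),   \
    T(T_EOF)

#define T(x) x          // First pass: T(x) expands to the bare name
enum TOKEN_TYPE {
    TOKEN_LIST
};
#undef T                // Remove old temporary macro

#define T(x) #x         // Second pass: T(x) expands to the name as a string
const char *const TOKEN_NAMES[] = {
    TOKEN_LIST
};
#undef T                // Remove old temporary macro

// Hypothetical helper: return the printable name of a token.
const char *token_name(TOKEN_TYPE token) {
    return TOKEN_NAMES[token];
}
```

Because both the enum and the string array come from the same list, adding a token to TOKEN_LIST updates both at once, and the two cannot drift out of step.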
27.4.4 stat Class
The stat class is an abstract class that is used as a basis for the
four real statistics we are collecting. It starts with a member
function to consume tokens. This function is a pure virtual function,
which means that any derived class must define the function
take_token:
class stat {
public:
virtual void take_token(TOKEN_TYPE token) = 0;
The function take_token generates statistics from
tokens. We need some way of printing them in two places. The first is
at the beginning of each line, and the second is at the end of the
file. Our abstract class contains two virtual functions to handle
these two cases:
virtual void line_start( ) {};
virtual void eof( ) {};
};
Unlike take_token, these functions have default bodies (empty
bodies, but bodies just the same). What does this mean? Our derived
classes must define take_token. They don't have to define line_start
or eof.
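To make the difference concrete, here is a sketch of a derived class that supplies only take_token and inherits the empty line_start and eof. The class id_counter is hypothetical. The virtual destructor and the namespace named sketch are additions for this example; the namespace keeps the class name from clashing with the POSIX stat declarations that some system headers bring in.

```cpp
#include <cassert>

namespace sketch {

enum TOKEN_TYPE { T_NUMBER, T_ID, T_NEWLINE, T_EOF };   // Shortened list

class stat {
public:
    virtual ~stat() {}                               // Added for this sketch
    virtual void take_token(TOKEN_TYPE token) = 0;   // Must be overridden
    virtual void line_start() {}                     // Optional: does nothing
    virtual void eof() {}                            // Optional: does nothing
};

// Hypothetical statistic: counts the identifiers seen so far.
class id_counter : public stat {
public:
    id_counter() : count(0) {}

    // The only function a derived class is required to provide.
    void take_token(TOKEN_TYPE token) {
        if (token == T_ID)
            ++count;
    }

    int count;   // Number of T_ID tokens consumed
};

}  // namespace sketch
```

Calling line_start or eof on an id_counter is legal and simply runs the empty base-class body, whereas leaving out take_token would make id_counter abstract and impossible to instantiate.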
27.4.5 line_counter Class
The simplest statistic we collect is a count of the number of lines
processed so far. This counting is done through the line_counter
class. The only token it cares about is T_NEWLINE. At the beginning
of each line it outputs the line number (the current count of the
T_NEWLINE tokens). At the end of file, this class outputs nothing. As
a matter of fact, the line_counter class doesn't even define an eof
function. Instead, we let the default in the base class (stat) do the
"work."
27.4.6 brace_counter Class
This class keeps track of the nesting level of the curly braces { }.
We feed the class a stream of tokens through the take_token member
function. This function keeps track of the left and right curly
braces and ignores everything else:
// Consume tokens, count the nesting of {}
void brace_counter::take_token(TOKEN_TYPE token) {
switch (token) {
case T_L_CURLY:
++cur_level;
if (cur_level > max_level)
max_level = cur_level;
break;
case T_R_CURLY:
--cur_level;
break;
default:
// Ignore
break;
}
}
The results of this statistic are printed in two places. The first is
at the beginning of each line. The second is at the end-of-file. We
define two member functions to print these statistics:
// Output start of line statistics
// namely the current nesting level
void brace_counter::line_start( ) {
std::cout.setf(std::ios::left);
std::cout.width(2);
std::cout << '{' << cur_level << ' ';
std::cout.unsetf(std::ios::left);
std::cout.width( );
}
// Output eof statistics
// namely the maximum nesting level
void brace_counter::eof( ) {
std::cout << "Maximum nesting of {} : " << max_level << '\n';
}
27.4.7 paren_counter Class
This class is very similar to the brace_counter
class. As a matter of fact, it was created by copying the
brace_counter class and performing a few simple
edits.
We probably should combine the paren_counter class and the
brace_counter class into one class that uses a parameter to tell it
what to count. Oh well, something for the next version.
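One way such a combined class might look is sketched here. The name nest_counter and its constructor parameters are hypothetical. For brevity the sketch does not derive from stat and leaves out the printing functions.

```cpp
#include <cassert>

// Shortened token list for this sketch.
enum TOKEN_TYPE { T_L_PAREN, T_R_PAREN, T_L_CURLY, T_R_CURLY, T_ID };

// Hypothetical replacement for both paren_counter and brace_counter.
// The constructor says which pair of tokens opens and closes a level.
class nest_counter {
public:
    nest_counter(TOKEN_TYPE open, TOKEN_TYPE close)
        : open_(open), close_(close), cur_level_(0), max_level_(0) {}

    // Consume tokens, count the nesting of the chosen pair.
    void take_token(TOKEN_TYPE token) {
        if (token == open_) {
            ++cur_level_;
            if (cur_level_ > max_level_)
                max_level_ = cur_level_;
        } else if (token == close_) {
            --cur_level_;
        }
        // Every other token is ignored.
    }

    int cur_level() const { return cur_level_; }
    int max_level() const { return max_level_; }

private:
    TOKEN_TYPE open_;    // Token that increases the nesting level
    TOKEN_TYPE close_;   // Token that decreases the nesting level
    int cur_level_;      // Current nesting level
    int max_level_;      // Deepest nesting level seen so far
};
```

Two objects, nest_counter(T_L_PAREN, T_R_PAREN) and nest_counter(T_L_CURLY, T_R_CURLY), would then stand in for the two copied classes.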
27.4.8 comment_counter Class
In this class, we keep track of lines with comments in them, lines
with code in them, lines with both comments and code, and lines with
neither. The results are printed at the end of file.
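The four-way classification can be sketched with two flags that are reset at every newline. The member names below are hypothetical, and two points are assumptions of this sketch: every token other than T_COMMENT, T_NEWLINE, and T_EOF counts as code, and a final line without a trailing newline is still counted. The class does not derive from stat here, and printing is left out.

```cpp
#include <cassert>

enum TOKEN_TYPE { T_COMMENT, T_NEWLINE, T_ID, T_EOF };   // Shortened list

class comment_counter {
public:
    comment_counter()
        : code_only(0), comment_only(0), both(0), blank(0),
          saw_code_(false), saw_comment_(false) {}

    void take_token(TOKEN_TYPE token) {
        switch (token) {
        case T_COMMENT:
            saw_comment_ = true;
            break;
        case T_NEWLINE:
            end_line();
            break;
        case T_EOF:
            // Count a final line that lacks a trailing newline.
            if (saw_code_ || saw_comment_)
                end_line();
            break;
        default:
            saw_code_ = true;   // Any other token is treated as code
            break;
        }
    }

    int code_only;      // Lines with code but no comment
    int comment_only;   // Lines with a comment but no code
    int both;           // Lines with both code and a comment
    int blank;          // Lines with neither

private:
    // Classify the line just finished and reset the flags.
    void end_line() {
        if (saw_code_ && saw_comment_) ++both;
        else if (saw_code_)            ++code_only;
        else if (saw_comment_)         ++comment_only;
        else                           ++blank;
        saw_code_ = false;
        saw_comment_ = false;
    }

    bool saw_code_;      // Current line contains code
    bool saw_comment_;   // Current line contains a comment
};
```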
27.4.9 do_file Procedure
The do_file procedure reads each file one token at a time and sends
each token to the take_token routine of every statistic class. But
how does it know which statistics classes to use? There is a list:
static line_counter line_count; // Counter of lines
static paren_counter paren_count; // Counter of ( ) levels
static brace_counter brace_count; // Counter of {} levels
static comment_counter comment_count; // Counter of comment info
// A list of the statistics we are collecting
static stat *stat_list[] = {
&line_count,
&paren_count,
&brace_count,
&comment_count,
NULL
};
A couple of things should be noted about this list. Although
line_count, paren_count, brace_count, and comment_count are all
different types, they are all derived from the type stat. This means
that we can put pointers to them in a single array of stat pointers
called stat_list, terminated by NULL. This design also makes it easy
to add another statistic to the list. All we have to do is define a
new class and put a new entry in the stat_list.
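The way a NULL-terminated list of stat pointers is walked can be sketched as follows. The function do_tokens is a hypothetical stand-in for do_file: it takes its tokens from a vector instead of a tokenizer, and the two statistics are cut down to the bare minimum. The namespace named sketch and the virtual destructor are additions for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

enum TOKEN_TYPE { T_NEWLINE, T_L_CURLY, T_R_CURLY, T_ID, T_EOF };

class stat {
public:
    virtual ~stat() {}
    virtual void take_token(TOKEN_TYPE token) = 0;
    virtual void line_start() {}
    virtual void eof() {}
};

class line_counter : public stat {
public:
    line_counter() : lines(0) {}
    void take_token(TOKEN_TYPE token) {
        if (token == T_NEWLINE)
            ++lines;
    }
    int lines;                 // Number of newlines seen
};

class brace_counter : public stat {
public:
    brace_counter() : cur_level(0), max_level(0) {}
    void take_token(TOKEN_TYPE token) {
        if (token == T_L_CURLY) {
            ++cur_level;
            if (cur_level > max_level)
                max_level = cur_level;
        } else if (token == T_R_CURLY) {
            --cur_level;
        }
    }
    int cur_level, max_level;  // Current and deepest nesting
};

// Feed every token to every statistic in a NULL-terminated list.
void do_tokens(const std::vector<TOKEN_TYPE> &tokens, stat *stat_list[]) {
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        for (stat **cur = stat_list; *cur != NULL; ++cur)
            (*cur)->take_token(tokens[i]);

        if (tokens[i] == T_NEWLINE) {      // A new line begins here
            for (stat **cur = stat_list; *cur != NULL; ++cur)
                (*cur)->line_start();
        }
    }
    for (stat **cur = stat_list; *cur != NULL; ++cur)
        (*cur)->eof();                     // End-of-file reports
}

}  // namespace sketch
```

The loop never needs to know which statistics exist. Adding a new one only means defining the class and placing its address in the array before the terminating NULL.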