Chapter 6: The Generated Parser Class' Members

Bisonc++ generates a C++ class rather than a function like Bison. Bisonc++'s class is a plain C++ class, not the fairly complex macro-based class generated by Bison++. The C++ class generated by bisonc++ does not have (or need) virtual members. Its essential member, parse(), is generated from the grammar specification, so the software engineer will hardly ever feel the need to override it. All but a few of the remaining predefined members also have very clear definitions and meanings, making it unlikely that they will ever require overriding.

With parsers generated by Bison++, members like lex() and/or error() are likely to need dedicated definitions for each parser. Then again, defining these support members is a natural extension of defining the grammar and can be done in parallel with it, so in practice no virtual members are required. Not requiring virtual members simplifies the parser's class organization, and calling non-virtual members is marginally faster than calling them as virtual functions.

In this chapter all available members and features of the generated parser class are discussed. Having read this chapter you should be able to use the generated parser class in your program (using its public members) and to use its facilities in the actions defined for the various production rules and/or use these facilities in additional class members that you might have defined yourself.

In the remainder of this chapter the class's public members are discussed first, followed by its private members. While constructing the grammar, all private members are available in the action parts of the grammatical rules. Furthermore, any member (not just the action blocks) may generate errors (thus initiating error recovery procedures) and may flag the (un)successful parsing of the information given to the parser (terminating the parsing function parse()).

6.1: Public Members and Types

The following public members and types are available to users of the parser classes generated by bisonc++ (parser class-name prefixes, e.g., Parser::, are silently implied):

LTYPE__:
The parser's location type (user-definable). Available only when either %lsp-needed, %ltype or %locationstruct has been declared.
STYPE__:
The parser's stack-type (user-definable), defaults to int.
Tokens__:
The enumeration type of all the symbolic tokens defined in the grammar file (i.e., bisonc++'s input file). The scanner should be prepared to return these symbolic tokens. Note that, since the symbolic tokens are defined in the parser's class and not in the scanner's class, the lexical scanner must prefix the parser's class name to the symbolic token names when they are returned. E.g., return Parser::IDENT should be used rather than return IDENT.
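As an illustration, the sketch below shows a hand-written scanning function that returns qualified token names. The Parser class is a hypothetical stand-in containing only a Tokens__ enumeration; the token values (IDENT = 257, NR) and the function name scanToken are assumptions, since the real values are generated by bisonc++.

```cpp
#include <cctype>
#include <cstdio>
#include <istream>
#include <sstream>

// Hypothetical stand-in for a bisonc++-generated parser class: only the
// Tokens__ enumeration is shown, and its values are assumed.
class Parser
{
    public:
        enum Tokens__
        {
            IDENT = 257,
            NR
        };
};

// Hypothetical scanning function: the symbolic tokens are defined in the
// parser's class, so they must be returned as Parser::IDENT and Parser::NR.
int scanToken(std::istream &in)
{
    int ch = in.get();
    while (ch != EOF && std::isspace(ch))   // skip white space
        ch = in.get();

    if (ch == EOF)
        return 0;                           // end of input

    if (std::isalpha(ch))                   // identifier
    {
        while (std::isalnum(in.peek()))
            in.get();
        return Parser::IDENT;               // not: return IDENT;
    }

    if (std::isdigit(ch))                   // number
    {
        while (std::isdigit(in.peek()))
            in.get();
        return Parser::NR;
    }

    return ch;                              // plain character token
}
```

For the input "abc 42 (" successive calls return Parser::IDENT, Parser::NR, '(' and finally 0.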
int parse():
The parser's parsing member function. It returns 0 when parsing was successfully completed; 1 if errors were encountered while parsing the input.
void setDebug(bool mode):
This member can be used to activate or deactivate the debug-code compiled into the parsing function. It is always defined, but is only operational if the %debug directive or --debug option was specified. When debugging code has been compiled into the parsing function it is not active by default; to activate it, call setDebug(true).

When the %polymorphic directive is used:

Meta__:
Templates and classes that are required for implementing the polymorphic semantic values are all declared in the Meta__ namespace. The Meta__ namespace itself is nested under the namespace that may have been declared by the %namespace directive.
Tag__:
The (strongly typed) enum class Tag__ contains all the tag-identifiers specified by the %polymorphic directive. It is declared outside of the Parser's class, but within the namespace that may have been declared by the %namespace directive.

6.2: Protected Enumerations and Types

The following enumerations and types can be used by members of parser classes generated by bisonc++. They are actually protected members inherited from the parser's base class.

Base::ErrorRecovery__:
This enumeration defines two values:
```
    DEFAULT_RECOVERY_MODE__,
    UNEXPECTED_TOKEN__
        
```
The DEFAULT_RECOVERY_MODE__ value terminates the parsing process. The non-default recovery procedure is available once an error token is used in a production rule. When the parsing process throws UNEXPECTED_TOKEN__ the recovery procedure is started (i.e., whenever a syntactic error is encountered or ERROR() is called).
The recovery procedure consists of (1) looking for the first state on the state-stack having an error-production, followed by (2) handling all state transitions that are possible without retrieving a terminal token. Then, in the state requiring a terminal token and starting with the initial unexpected token (3) all subsequent terminal tokens are ignored until a token is retrieved which is a continuation token in that state.
If the error recovery procedure fails (i.e., if no acceptable token is ever encountered) error recovery falls back to the default recovery mode (i.e., the parsing process is terminated).
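Step (3) of this procedure can be modelled in a few lines. The sketch below is a simplification, not bisonc++'s generated code: tokens are plain ints, the set of continuation tokens of the error state is assumed to be known, and the hypothetical function skipToContinuation returns the index of the first acceptable token, or the number of tokens when none is found (in which case the parser would fall back to the default recovery mode).

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Simplified model of step (3): starting at the unexpected token, discard
// tokens until one is found that is a continuation token in the error state.
// Returns that token's index, or tokens.size() if no acceptable token is
// ever encountered (the default recovery mode then terminates parsing).
std::size_t skipToContinuation(std::vector<int> const &tokens,
                               std::size_t unexpected,
                               std::set<int> const &continuations)
{
    for (std::size_t idx = unexpected; idx != tokens.size(); ++idx)
    {
        if (continuations.count(tokens[idx]))   // acceptable in this state
            return idx;
    }
    return tokens.size();                       // recovery failed
}
```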
Base::Return__:
This enumeration defines two values:
```
    PARSE_ACCEPT = 0,
    PARSE_ABORT = 1
        
```
(which are of course the parse function's return values).

6.3: Non-public Member Functions

The following members can be used by members of parser classes generated by bisonc++. When prefixed by Base:: they are actually protected members inherited from the parser's base class. Members for which the phrase ``Used internally'' is used should not be called by user-defined code.

Base::ParserBase():
Used internally.
void Base::ABORT() const throw(Return__):
This member can be called from any member function (called from any of the parser's action blocks) to indicate a failure while parsing thus terminating the parsing function with an error value 1. Note that this offers a marked extension and improvement of the macro YYABORT defined by bison++ in that YYABORT could not be called from outside of the parsing member function.
void Base::ACCEPT() const throw(Return__):
This member can be called from any member function (called from any of the parser's action blocks) to indicate successful parsing, thus terminating the parsing function. Note that this offers a marked extension and improvement of the macro YYACCEPT defined by bison++ in that YYACCEPT could not be called from outside of the parsing member function.
void Base::clearin():
This member replaces bison(++)'s macro yyclearin and causes the parser to request another token from its lex() member, even if the current token has not yet been processed. It is useful when the parser should be reset to its initial state, e.g., between successive calls of parse. In this situation the scanner probably must be reloaded with new information as well.
bool Base::debug() const:
This member returns the current value of the debug variable.
void Base::ERROR() const throw(ErrorRecovery__):
This member can be called from any member function (called from any of the parser's action blocks) to generate an error, and results in the parser executing its error recovery code. Note that this offers a marked extension and improvement of the macro YYERROR defined by bison++ in that YYERROR could not be called from outside of the parsing member function.
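The mechanism behind ABORT(), ACCEPT() and ERROR() can be sketched as follows, assuming (as their throw specifications suggest) that they throw Return__ and ErrorRecovery__ values which the parsing loop catches. MiniParser, check and the token conventions used below are hypothetical; a real parse() starts its error recovery procedure where this sketch merely counts the error.

```cpp
#include <cstddef>
#include <vector>

// Simplified model, not generated code: helper members (not only action
// blocks) end or divert parsing by throwing enumeration values.
class MiniParser
{
    public:
        enum Return__ { PARSE_ACCEPT = 0, PARSE_ABORT = 1 };
        enum ErrorRecovery__ { DEFAULT_RECOVERY_MODE__, UNEXPECTED_TOKEN__ };

        std::size_t nErrors = 0;

        // Assumed conventions: token 0 accepts, a negative token aborts,
        // token 99 signals an error; other tokens are simply consumed.
        int parse(std::vector<int> const &tokens)
        {
            try
            {
                for (int token: tokens)
                {
                    try
                    {
                        check(token);
                    }
                    catch (ErrorRecovery__)
                    {
                        ++nErrors;      // a real parser starts recovery here
                    }
                }
                return PARSE_ACCEPT;
            }
            catch (Return__ ret)        // thrown by ABORT() or ACCEPT()
            {
                return ret;
            }
        }

    private:
        void ABORT() const  { throw PARSE_ABORT; }
        void ACCEPT() const { throw PARSE_ACCEPT; }
        void ERROR() const  { throw UNEXPECTED_TOKEN__; }

        // An ordinary member function, called outside any action block.
        void check(int token) const
        {
            if (token == 0)
                ACCEPT();
            if (token < 0)
                ABORT();
            if (token == 99)
                ERROR();
        }
};
```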
void error(char const *msg):
By default implemented inline in the parser.ih internal header file, it writes a simple message to the standard error stream. It is called when a syntactic error is encountered, and its default implementation may safely be altered.
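As an example of altering error()'s default implementation, the sketch below prefixes each message with a line number. It uses a minimal hypothetical Parser class; the d_err stream, d_lineNr and setLine are assumptions, since a generated parser would obtain line information from its scanner or from d_loc__.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>

// Hypothetical minimal parser: only error() and the (assumed) members it
// uses are shown. error() is public here only to allow calling it directly.
class Parser
{
    public:
        explicit Parser(std::ostream &err)
        :
            d_err(err)
        {}

        void setLine(std::size_t line)
        {
            d_lineNr = line;        // normally updated from scanner info
        }

        void error(char const *msg);

    private:
        std::ostream &d_err;
        std::size_t d_lineNr = 1;
};

// Same signature as the default error() member, but the message is
// prefixed by the current line number.
inline void Parser::error(char const *msg)
{
    d_err << "line " << d_lineNr << ": " << msg << '\n';
}
```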
void errorRecovery__():
Used internally.
void Base::errorVerbose__():
Used internally.
void exceptionHandler__(std::exception const &exc):
This member's default implementation is provided inline in the parser.ih internal header file. It consists of a mere throw statement, rethrowing a caught exception.
The parse member function's body essentially consists of a while statement, in which the next token is obtained via the parser's lex member. This token is then processed according to the current state of the parsing process. This may result in executing actions over which the parsing process has no control and which may result in exceptions being thrown.
Such exceptions do not necessarily have to terminate the parsing process: they could be thrown by code, linked to the parser, that simply checks for semantic errors (like divisions by zero) throwing exceptions if such errors are observed.
The member exceptionHandler__ receives and may handle such exceptions without necessarily ending the parsing process. It receives any std::exception thrown by the parser's actions, as though the action block itself was surrounded by a try ... catch statement. It is of course still possible to use an explicit try ... catch statement within action blocks. However, exceptionHandler__ can be used to factor out code that is common to various action blocks.
The next example shows an explicit implementation of exceptionHandler__: any std::exception thrown by the parser's action blocks is caught, showing the exception's message, and increasing the parser's error count. After this parsing continues as if no exception had been thrown:
```
    void Parser::exceptionHandler__(std::exception const &exc)
    {
        std::cout << exc.what() << '\n';
        ++d_nErrors__;
    }
            
```
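The following sketch models how action blocks behave as though surrounded by a try ... catch statement whose handler calls exceptionHandler__. Calculator and executeDivision are hypothetical stand-ins for the generated parser and one of its action blocks; the handler's body matches the example above.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <vector>

// Simplified model (not generated code) of an action block whose
// std::exceptions are passed to exceptionHandler__.
class Calculator
{
    public:
        std::size_t d_nErrors__ = 0;
        std::vector<int> results;

        // Stands in for the action of a hypothetical rule dividing two numbers.
        void executeDivision(int lhs, int rhs)
        {
            try
            {
                if (rhs == 0)
                    throw std::domain_error("division by zero");
                results.push_back(lhs / rhs);
            }
            catch (std::exception const &exc)
            {
                exceptionHandler__(exc);    // parsing continues afterwards
            }
        }

    private:
        void exceptionHandler__(std::exception const &exc)
        {
            std::cout << exc.what() << '\n';
            ++d_nErrors__;
        }
};
```

In this model, a handler consisting of a plain throw statement (the default implementation) would instead rethrow the caught exception from within the catch clause, so it would propagate to the caller.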
Note: Parser-class header files (e.g., Parser.h) and parser-class internal header files (e.g., Parser.ih) generated with bisonc++ < 4.02.00 require two hand-modifications when using bisonc++ >= 4.02.00:
In Parser.h, just below the declaration
```
    void print__();
        
```
add:
```
    void exceptionHandler__(std::exception const &exc);
        
```
In Parser.ih, assuming the name of the generated class is `Parser', add the following member definition (if a namespace is used: within the namespace's scope):
```
    inline void Parser::exceptionHandler__(std::exception const &exc)
    {
        throw;  // re-implement to handle exceptions thrown by actions
    }
        
```
void executeAction(int):
Used internally.
int lex():
By default implemented inline in the parser.ih internal header file, it can be pre-implemented by bisonc++ using the scanner option or directive (see above); alternatively it must be implemented by the programmer. It interfaces to the lexical scanner, and should return the next token produced by the lexical scanner, either as a plain character or as one of the symbolic tokens defined in the Parser::Tokens__ enumeration. Zero or negative token values are interpreted as `end of input'.
int lookup(bool):
Used internally.
void nextToken():
Used internally.
void Base::pop__():
Used internally.
void Base::popToken__():
Used internally.
void print__():
Used internally.
void print():
By default implemented inline in the parser.ih internal header file, this member calls print__ to display the last received token and its corresponding matched text. The print__ member is only implemented if the --print-tokens option or %print-tokens directive was used when the parsing function was generated. Calling print__ from print is unconditional, but can easily be controlled by the using program, e.g., through a command-line option.
void Base::push__():
Used internally.
void Base::pushToken__():
Used internally.
void Base::reduce__():
Used internally.
void Base::symbol__():
Used internally.
void Base::top__():
Used internally.

6.3.1: `lex()': Interfacing the Lexical Analyzer

The int lex() private member function is called by the parse() member to obtain the next lexical token. By default it is not implemented, but the %scanner directive (see section 5.5.19) may be used to pre-implement a standard interface to a lexical analyzer.

The lex() member function interfaces to the lexical scanner and is expected to return the next token produced by the lexical scanner. This token may either be a plain character or one of the symbolic tokens defined in the Parser::Tokens__ enumeration. Any zero or negative token value is interpreted as `end of input', causing parse() to return.
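A lex() member that itself implements a small scanner might look like the sketch below. It assumes that STYPE__ is int, that there is a single symbolic token NR with the assumed value 257, and that the parser has an input stream member d_in; lex() is public here only so it can be called directly, whereas in a generated parser it is private and called by parse().

```cpp
#include <cctype>
#include <istream>
#include <sstream>

// Hypothetical minimal parser whose lex() member is the scanner.
class Parser
{
    public:
        enum Tokens__ { NR = 257 };     // assumed token value
        using STYPE__ = int;            // assumed semantic value type

        explicit Parser(std::istream &in)
        :
            d_in(in)
        {}

        int lex();
        STYPE__ value() const { return d_val__; }

    private:
        std::istream &d_in;
        STYPE__ d_val__ = 0;
};

int Parser::lex()
{
    char ch;
    if (!(d_in >> ch))                  // skips white space; fails at EOF
        return 0;                       // zero: end of input

    if (std::isdigit(static_cast<unsigned char>(ch)))
    {
        d_in.unget();                   // re-read the whole number
        d_in >> d_val__;                // semantic value of the token
        return NR;
    }
    return ch;                          // plain character token
}
```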

The lex() member function may be implemented in various ways:

By default, if the --scanner option or %scanner directive is provided, bisonc++ assumes that it should interface to the scanner generated by flexc++(1). In this case the scanner token function is called as
```
    d_scanner.lex()
        
```
and the scanner's matched text function is called as
```
    d_scanner.matched()
        
```
lex() may itself implement a lexical analyzer (a scanner). This may actually be a useful option when the input offered to the program using bisonc++'s parser class is not overly complex. This approach was used when implementing the earlier examples (see sections 4.1.3 and 4.4.4).
lex() may call an external function or a member function of a class implementing a lexical scanner, and return the information offered by this function. When using a class, an object of that class could also be defined as an additional data member of the parser (see the next alternative). This approach can be followed when generating a lexical scanner with a scanner-generating tool like lex(1) or flex++(1). The latter program allows its users to generate a scanner class.
To interface bisonc++ to code generated by flex(1), the --flex option or %flex directive can be used in combination with the --scanner option or %scanner directive. In this case the scanner token function is called as
```
    d_scanner.yylex()
        
```
and the scanner's matched text function is called as
```
    d_scanner.YYText()
        
```
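The delegation performed when a scanner object is used can be sketched as follows. Scanner is a hypothetical stand-in offering lex() and matched(), the two members called when interfacing to a flexc++-generated scanner; it merely replays prepared (token, text) pairs. The parser's text() member is likewise an assumption, standing in for the way print() shows the matched text.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scanner replaying prepared (token, matched text) pairs.
class Scanner
{
    public:
        struct Item
        {
            int token;
            std::string text;
        };

        explicit Scanner(std::vector<Item> items)
        :
            d_items(std::move(items))
        {}

        int lex()
        {
            if (d_idx == d_items.size())
                return 0;                       // end of input
            d_matched = d_items[d_idx].text;
            return d_items[d_idx++].token;
        }

        std::string const &matched() const { return d_matched; }

    private:
        std::vector<Item> d_items;
        std::size_t d_idx = 0;
        std::string d_matched;
};

// The parser's lex() merely forwards to its d_scanner data member.
class Parser
{
    public:
        explicit Parser(Scanner scanner)
        :
            d_scanner(std::move(scanner))
        {}

        int lex() { return d_scanner.lex(); }
        std::string const &text() const { return d_scanner.matched(); }

    private:
        Scanner d_scanner;
};
```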

6.4: Protected Data Members

The following data members can be used by members of parser classes generated by bisonc++. All data members are actually protected members inherited from the parser's base class.

size_t d_acceptedTokens__:
Counts the number of accepted tokens since the start of the parse() function or since the last detected syntactic error. It is initialized to d_requiredTokens__ to allow an early error to be detected as well.
bool d_debug__:
When the debug option has been specified, this variable (false by default, see setDebug) determines whether debug information is actually displayed.
LTYPE__ d_loc__:
The location type value associated with a terminal token. It can be used by, e.g., lexical scanners to pass location information of a matched token to the parser in parallel with a returned token. It is available only when %lsp-needed, %ltype or %locationstruct has been defined.
Lexical scanners may be offered the facility to assign a value to this variable in parallel with a returned token. In order to allow a scanner access to d_loc__, d_loc__'s address should be passed to the scanner. This can be realized, for example, by defining a member void setLoc(LTYPE__ *) in the lexical scanner, which is then called from the parser's constructor as follows:
```
            d_scanner.setLoc(&d_loc__);
```
Subsequently, the lexical scanner may assign a value to the parser's d_loc__ variable through the pointer to d_loc__ stored inside the lexical scanner.
LTYPE__ *d_lsp__:
The location stack pointer. Used internally.
size_t d_nErrors__:
The number of errors counted by parse(). It is initialized by the parser's base class initializer, and is updated while parse() executes. When parse() has returned it contains the total number of errors counted by parse(). Errors are not counted if suppressed (i.e., if d_acceptedTokens__ is less than d_requiredTokens__).
size_t d_nextToken__:
A pending token. Do not modify.
size_t d_requiredTokens__:
Defines the minimum number of accepted tokens that the parse() function must have processed before a syntactic error can be generated.
int d_state__:
The current parsing state. Do not modify.
int d_token__:
The current token. Do not modify.
STYPE__ d_val__:
The semantic value of a returned token or non-terminal symbol. With non-terminal symbols it is assigned a value through the action rule's symbol $$. Lexical scanners may be offered the facility to assign a semantic value to this variable in parallel with a returned token. In order to allow a scanner access to d_val__, d_val__'s address should be passed to the scanner. This can be realized, for example, by defining a member void setSval(STYPE__ *) in the lexical scanner, which is then called from the parser's constructor as follows:
```
            d_scanner.setSval(&d_val__);
       
```
Subsequently, the lexical scanner may assign a value to the parser's d_val__ variable through the pointer to d_val__ stored inside the lexical scanner.
Note that in some cases this approach must be used to make available the correct semantic value to the parser. In particular, when a grammar state defines multiple reductions, depending on the next token, the reduction's action only takes place following the retrieval of the next token, thus losing the initially matched token text. As an example, consider the following little grammar:
```
        expr:
            name
        |
            ident '(' ')'
        |
            NR
        ;

        name:   
            IDENT
        ;

        ident: IDENT ; 
            
```
Having recognized IDENT, two reductions are possible: to name and to ident. The reduction to ident is appropriate when the next token is (; otherwise the reduction to name is performed. So the parser asks for the next token, thereby destroying the text matching IDENT before ident's or name's actions are able to save that text themselves. To ensure the availability of the text matching IDENT in situations like these, the scanner must assign the proper semantic value when it recognizes a token. Consequently the parser's d_val__ data member must be made available to the scanner.
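The pointer-passing approach can be sketched as follows. STYPE__ is assumed to be std::string here, setSval() is the scanner member suggested above, and identifier() and scanIdent() are hypothetical members standing in for the scanner's action when it matches an IDENT. Because the scanner stores the value before returning the token, the value survives the parser's subsequent request for a look-ahead token.

```cpp
#include <string>

// Assumption for this sketch: the semantic value type is std::string.
using STYPE__ = std::string;

// Hypothetical scanner storing a pointer to the parser's d_val__.
class Scanner
{
    public:
        void setSval(STYPE__ *sval)
        {
            d_sval = sval;              // remember where d_val__ lives
        }

        int identifier(std::string const &text, int token)
        {
            *d_sval = text;             // assign the value before returning
            return token;
        }

    private:
        STYPE__ *d_sval = nullptr;
};

// Hypothetical minimal parser owning the scanner and d_val__.
class Parser
{
    public:
        enum Tokens__ { IDENT = 257 };  // assumed token value

        Parser()
        {
            d_scanner.setSval(&d_val__);    // as in the constructor call above
        }
        Parser(Parser const &) = delete;    // a copy would hold a stale pointer

        int scanIdent(std::string const &text)  // simulates matching an IDENT
        {
            return d_scanner.identifier(text, IDENT);
        }

        STYPE__ const &value() const { return d_val__; }

    private:
        Scanner d_scanner;
        STYPE__ d_val__;
};
```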
STYPE__ *d_vsp__:
The semantic value stack pointer. Do not modify.

6.5: Types and Variables in the Anonymous Namespace

In the file defining the parse function the following types and variables are defined in the anonymous namespace. These are mentioned here for the sake of completeness, and are not normally accessible to other parts of the parser.

char const author[]:
Defines the name and e-mail address of Bisonc++'s author.
ReservedTokens:
This enumeration defines some token values used internally by the parsing functions. They are:
```
    PARSE_ACCEPT   =  0,
    _UNDETERMINED_ = -2,
    _EOF_          = -1,
    _error_        = 256,
       
```
These tokens are used by the parser to determine whether another token should be requested from the lexical scanner, and to handle error-conditions.

StateType:
This enumeration defines the state types used internally by the parsing functions. They are:
```
        NORMAL,
        ERR_ITEM,
        REQ_TOKEN,
        ERR_REQ,    // ERR_ITEM | REQ_TOKEN
        DEF_RED,    // state having default reduction
        ERR_DEF,    // ERR_ITEM | DEF_RED
        REQ_DEF,    // REQ_TOKEN | DEF_RED
        ERR_REQ_DEF // ERR_ITEM | REQ_TOKEN | DEF_RED
```
These values are used by the parser to define the types of the various states of the analyzed grammar.
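The comments in this enumeration suggest that the combined values are bitwise ORs of the basic ones. Assuming the enumerators take their implicit values 0 through 7 (no explicit values are shown), this holds exactly, and individual state properties can be tested with a bitwise AND. The helper functions below are hypothetical.

```cpp
// StateType as listed above, relying on implicit values 0 through 7.
enum StateType
{
    NORMAL,
    ERR_ITEM,
    REQ_TOKEN,
    ERR_REQ,    // ERR_ITEM | REQ_TOKEN
    DEF_RED,    // state having default reduction
    ERR_DEF,    // ERR_ITEM | DEF_RED
    REQ_DEF,    // REQ_TOKEN | DEF_RED
    ERR_REQ_DEF // ERR_ITEM | REQ_TOKEN | DEF_RED
};

// Compile-time checks that the combined values match their comments.
static_assert(ERR_REQ == (ERR_ITEM | REQ_TOKEN), "ERR_REQ");
static_assert(ERR_DEF == (ERR_ITEM | DEF_RED), "ERR_DEF");
static_assert(REQ_DEF == (REQ_TOKEN | DEF_RED), "REQ_DEF");

// Hypothetical helpers testing a single property of a state type.
constexpr bool hasErrorItem(StateType type)        { return type & ERR_ITEM; }
constexpr bool requiresToken(StateType type)       { return type & REQ_TOKEN; }
constexpr bool hasDefaultReduction(StateType type) { return type & DEF_RED; }
```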

PI__ (Production Info):
This struct provides information about production rules. It has two fields: d_nonTerm is the identification number of the production's non-terminal, and d_size represents the number of elements of the production rule.
static PI__ s_productionInfo:
Used internally by the parsing function.
SR__ (Shift-Reduce Info):
This struct provides the shift/reduce information for the various grammatical states. SR__ values are collected in arrays, one array per grammatical state. These arrays, named s_<nr>, where <nr> is a state number, are defined in the anonymous namespace as well. The SR__ elements consist of two unions, defining fields that are applicable to, respectively, the first, intermediate and last array elements.
The first element of each array consists of (1st field) a StateType and (2nd field) the index of the last array element. Intermediate elements consist of (1st field) a symbol value and (2nd field) either (if negative) the production rule number reducing to the indicated symbol value or (if positive) the next state when the symbol given in the 1st field is the current token. The last element of each array consists of (1st field) a placeholder for the current token and (2nd field) either the (negative) rule number to reduce to by default or the (positive) number of an error state to go to when an erroneous token has been retrieved. If the 2nd field is zero, no error or default action has been defined for the state, and error recovery is attempted.
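Reading the last element's placeholder as a search sentinel, a lookup in such an array could be sketched as follows. This is a simplified model, not the generated code: SR uses two plain ints instead of the two unions, and lookup and the sample array are hypothetical.

```cpp
#include <cstddef>

// Simplified model of an SR__ element (plain ints instead of unions).
struct SR
{
    int first;      // [0]: StateType; intermediate: symbol; last: placeholder
    int second;     // [0]: index of last element; intermediate: next state
                    // (> 0) or rule (< 0); last: default action or 0
};

// Stores the current token in the last element's placeholder, so the linear
// search always terminates. The result is positive (go to that state),
// negative (reduce by that rule) or zero (error recovery is attempted).
int lookup(SR *state, int token)
{
    std::size_t last = static_cast<std::size_t>(state[0].second);
    state[last].first = token;              // sentinel
    std::size_t idx = 1;
    while (state[idx].first != token)
        ++idx;
    return state[idx].second;               // the default if idx == last
}
```

For example, with the array {{0, 3}, {'+', 5}, {';', -2}, {0, -1}} the token '+' yields state 5, ';' yields rule -2, and any other token yields the default reduction -1.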
STACK_EXPANSION:
An enumeration value specifying the number of additional elements that are added to the state- and semantic value stacks when full.
static SR__ s_<nr>[]:
Here, <nr> is a numerical value representing a state number. Used internally by the parsing function.
static SR__ *s_state[]:
Used internally by the parsing function.

6.6: Summary of Special Constructions for Actions

Here is an overview of special syntactic constructions that may be used inside action blocks:

$$: This acts like a variable that contains the semantic value for the grouping made by the current rule. See section 5.6.4.
$n: This acts like a variable that contains the semantic value for the n-th component of the current rule. See section 5.6.4.
$<typealt>$ : This is like $$, but it specifies alternative typealt in the union specified by the %union directive. See sections 5.6.1 and 5.6.2.
$<typealt>n: This is like $n but it specifies an alternative typealt in the union specified by the %union directive. See sections 5.6.1 and 5.6.2.
@n: This acts like a structure variable containing information on the line numbers and column numbers of the nth component of the current rule. The default structure is defined like this (see section 5.5.10):
```
    struct LTYPE__
    {
        int timestamp;
        int first_line;
        int first_column;
        int last_line;
        int last_column;
        char *text;
    };
           
```
Thus, to get the starting line number of the third component, you would use @3.first_line.
In order for the members of this structure to contain valid information, you must make sure the lexical scanner supplies this information about each token. If you need only certain fields, then the lexical scanner only has to provide those fields.
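A scanner could supply the line and column fields as sketched below. The LTYPE__ struct is the default structure shown above; the Position class and its members are hypothetical. It advances a current position over every matched text, skipped white space included, and records the first and last positions of each token. The text field is left alone, since a scanner only has to provide the fields that are needed.

```cpp
#include <string>

// The default location structure shown above.
struct LTYPE__
{
    int timestamp;
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *text;
};

// Hypothetical scanner-side position tracking.
class Position
{
    public:
        void skip(std::string const &text)      // e.g., white space
        {
            for (char ch: text)
                step(ch);
        }

        void locate(std::string const &matched, LTYPE__ *loc)
        {
            loc->first_line = loc->last_line = d_line;
            loc->first_column = loc->last_column = d_column;
            for (char ch: matched)
            {
                loc->last_line = d_line;        // position of this character
                loc->last_column = d_column;
                step(ch);
            }
        }

    private:
        void step(char ch)
        {
            if (ch == '\n')
            {
                ++d_line;
                d_column = 1;
            }
            else
                ++d_column;
        }

        int d_line = 1;
        int d_column = 1;
};
```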
Be advised that using this or a corresponding custom-defined location structure (see sections 5.5.11 and 5.5.9) may slow down the parsing process noticeably.