bisonc++

bisonc++.4.09.02.tar.gz

2005-2014


bisonc++(1)

bisonc++.4.09.02.tar.gz bisonc++ parser generator

NAME

bisonc++ - Generate a C++ parser class and parsing function

SYNOPSIS

bisonc++ [OPTIONS] grammar-file

SECTIONS

This manual page contains the following sections:

1. DESCRIPTION
overview and short history of bisonc++;

2. GENERATED FILES
files bisonc++ may generate;

3. OPTIONS
Bisonc++'s command-line options;

4. DIRECTIVES
Bisonc++'s grammar-specification directives;

5. POLYMORPHIC SEMANTIC VALUES
How to use polymorphic semantic values in parsers generated by bisonc++;

6. PUBLIC MEMBERS AND -TYPES
Members and types that can be used by calling software;

7. PRIVATE ENUMS AND -TYPES
Enumerations and types only available to the Parser class;

8. PRIVATE MEMBER FUNCTIONS
Member functions that are only available to the Parser class;

9. PRIVATE DATA MEMBERS
Data members that are only available to the Parser class;

10. TYPES AND VARIABLES IN THE ANONYMOUS NAMESPACE
An overview of the types and variables that are used to define and store the grammar-tables generated by bisonc++;

11. RESTRICTIONS ON TOKEN NAMES
Name restrictions for user-defined symbols;

12. OBSOLETE SYMBOLS
Symbols available to bison(1), but not to bisonc++;

13. EXAMPLE
Guess what this is?

14. USING PARSER-CLASS SYMBOLS IN LEXICAL SCANNERS
How to refer to Parser tokens from within a lexical scanner;

15. FILES
(Skeleton) files used by bisonc++;

16. SEE ALSO
References to other programs and documentation;

17. BUGS
Some additional stuff that should not qualify as bugs.

18. ABOUT bisonc++
More history;

AUTHOR
At the end of this man-page.

Looking for a specific section? Search for its number + a dot.

1. DESCRIPTION

Bisonc++ derives from previous work on bison by Alain Coetmeur (coetmeur@icdc.fr), who, in the early '90s, created a C++ class encapsulating the yyparse function as generated by the GNU-bison parser generator.

Initial versions of bisonc++ (up to version 0.92) wrapped Alain's program in a program offering a more modern user-interface, removing all old-style (C) %define directives from bison++'s input specification file (see below for an in-depth discussion of the differences between bison++ and bisonc++). Starting with version 0.98, bisonc++ represents a complete rebuild of the parser generator, closely following descriptions given in Aho, Sethi and Ullman's Dragon Book. Since version 0.98 bisonc++ is a C++ program, rather than a C program generating C++ code.

Bisonc++ expands the concepts initially implemented in bison and bison++, offering a cleaner setup of the generated parser class. The parser class is derived from a base-class, mainly containing the parser's token- and type-definitions as well as several member functions which should not be modified by the programmer.

Most of these base-class members could also have been defined directly in the parser class, but were placed in the parser's base class instead. This design results in a very lean parser class, declaring only members that are actually defined by the programmer or that have to be defined by bisonc++ itself (e.g., the member function parse as well as some support functions requiring access to facilities that are only available in the parser class itself, rather than in the parser's base class).

This design does not require any virtual members: the members which are not involved in the actual parsing process may always be (re)implemented directly by the programmer. Thus there is no need to apply or define virtual member functions.

In fact, there are only two public members in the parser class generated by bisonc++: setDebug (see below) and parse. Remaining members are private, and those that can be redefined by the programmer using bisonc++ usually receive initial, very simple default in-line implementations. The (partial) exception to this rule is the member function lex, producing the next lexical token. For lex either a standardized interface or a mere declaration is offered (requiring the programmer to provide his/her own lex implementation).
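
The split between base class and parser class described above can be pictured with the following minimal sketch. It is not the code bisonc++ generates: the class layout, the token values, the d_input and d_pos members and the toy parse logic are all assumptions made for illustration. It only shows a base class holding token definitions, a lean derived class offering parse, a replaceable lex, and no virtual members.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical stand-in for a generated base class: token definitions
// plus members the programmer is not expected to modify.
class ParserBase
{
    protected:
        enum Tokens { NUMBER = 257, EOLN };     // values are illustrative only
        bool d_debug = false;

    public:
        void setDebug(bool mode) { d_debug = mode; }
};

// Hypothetical stand-in for the lean parser class. No virtual members:
// lex and error are ordinary members that a programmer may reimplement.
class Parser: public ParserBase
{
    std::string d_input;                        // assumed input source
    std::size_t d_pos = 0;

    public:
        explicit Parser(std::string input) : d_input(std::move(input)) {}
        int parse();                            // returns 0 on success

    private:
        int lex();                              // produces the next token
        void error(char const *msg) { std::cerr << msg << '\n'; }
};

int Parser::lex()
{
    if (d_pos == d_input.size())
        return 0;                               // end of input
    char ch = d_input[d_pos++];
    if (ch == '\n')
        return EOLN;
    if (ch >= '0' && ch <= '9')
        return NUMBER;
    return ch;                                  // any other character
}

// Toy "grammar": accept input consisting of digits and newlines only.
int Parser::parse()
{
    for (int token; (token = lex()) != 0; )
    {
        if (d_debug)
            std::cerr << "token " << token << '\n';
        if (token != NUMBER && token != EOLN)
        {
            error("Syntax error");
            return 1;
        }
    }
    return 0;
}
```

With this sketch, Parser("12\n").parse() returns 0, whereas Parser("1+2\n").parse() reports a syntax error and returns 1.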

To enforce a primitive namespace, bison used a well-known naming-convention: all its public symbols started with yy or YY. Bison++ followed bison in this respect, even though a class by itself offers enough protection of its identifiers. Consequently, these yy and YY conventions are now outdated, and bisonc++ does not generate or use symbols defined in either the parser (base) class or in its member functions starting with yy or YY. Instead, following a suggestion by Lakos (2001), all data members start with d_, and all static data members start with s_. This convention was not introduced to enforce identifier protection, but to clarify the storage type of variables. Other (local) symbols lack specific prefixes. Furthermore, bisonc++ allows its users to define the parser class in a particular namespace of their own choice.

Bisonc++ should be used as follows:

2. GENERATED FILES

Bisonc++ may create the following files:

3. OPTIONS

Where available, single letter options are listed between parentheses after their associated long-option variants. Single letter options require arguments if their associated long options require arguments. Options affecting the class header or implementation header file are ignored if these files already exist. Options accepting a `filename' do not accept path names, i.e., they cannot contain directory separators (/); options accepting a `pathname' may contain directory separators.
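
The `filename' versus `pathname' distinction boils down to the presence of a directory separator. The helper below is a hypothetical illustration of that rule; its name and the decision to reject empty names are assumptions, and it is not taken from bisonc++'s sources.

```cpp
#include <string>

// Hypothetical check: true if `name' may be used where a `filename'
// (no directory separators) is required. Rejecting the empty string
// is an assumption of this sketch.
bool isFilename(std::string const &name)
{
    return !name.empty() && name.find('/') == std::string::npos;
}
```

Under this rule parser.h qualifies as a `filename', whereas ../scanner/scanner.h can only be used where a `pathname' is accepted.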

Some options may generate errors. This happens when an option conflicts with the contents of a file which bisonc++ cannot modify (e.g., a parser class header file exists and doesn't define a name space, but a --namespace option was provided).

To solve the error the offending option could be omitted, the existing file could be removed, or the existing file could be hand-edited according to the option's specification. Note that bisonc++ currently does not handle the opposite error condition: if a previously used option is omitted, then bisonc++ does not detect the inconsistency. In those cases compilation errors may be generated.

4. DIRECTIVES

The following directives can be specified in the initial section of the grammar specification file. When command-line options for directives exist, they overrule the corresponding directives given in the grammar specification file. Directives affecting the class header or implementation header file are ignored if these files already exist.

Directives accepting a `filename' do not accept path names, i.e., they cannot contain directory separators (/); directives accepting a `pathname' may contain directory separators. A `pathname' using blank characters should be surrounded by double quotes.

Some directives may generate errors. This happens when a directive conflicts with the contents of a file which bisonc++ cannot modify (e.g., a parser class header file exists and doesn't define a name space, but a %namespace directive was provided).

To solve such errors the offending directive could be omitted, the existing file could be removed, or the existing file could be hand-edited according to the directive's specification.

5. POLYMORPHIC SEMANTIC VALUES

The %polymorphic directive results in bisonc++ generating a parser using polymorphic semantic values. The various semantic values are specified as pairs, consisting of tags (which are C++ identifiers), and C++ type names. Tags and type names are separated from each other by colons. Multiple tag and type name combinations are separated from each other by semicolons, and an optional semicolon ends the final tag/type specification.

Here is an example, defining three semantic values: an int, a std::string and a std::vector<double>:


    %polymorphic INT: int; STRING: std::string; 
                 VECT: std::vector<double>
        
The identifier to the left of the colon is called the tag-identifier (or simply tag), and the type name to the right of the colon is called the type-name. The type-names must be built-in types or must offer default constructors.

If type-names refer to types declared in header files that were not already included by the parser's base class header, then these header files must be inserted using the %baseclass-preinclude directive.

The %type directive is used to associate (non-)terminals with semantic value types.

Semantic values may also be associated with terminal tokens. In that case it is the lexical scanner's responsibility to assign a properly typed value to the parser's STYPE__ d_val__ data member.
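
To make the tag/type pairing concrete, here is a minimal sketch of a tagged semantic value built on std::variant (C++17). This is a swapped-in technique, not the class bisonc++ generates; the names Tag, SType, assign and storeNumber are assumptions made for the example. The sketch mirrors the three-type %polymorphic example above and shows a scanner-side helper handing a typed value to the parser.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Tags in declaration order, mirroring
//   %polymorphic INT: int; STRING: std::string; VECT: std::vector<double>
enum class Tag { INT, STRING, VECT };

// Illustrative stand-in for STYPE__ (not the generated class).
class SType
{
    // Every alternative is default constructible, as the text requires.
    std::variant<int, std::string, std::vector<double>> d_value;

    public:
        Tag tag() const { return static_cast<Tag>(d_value.index()); }

        // Store a value under the given tag.
        template <Tag tg, typename Type>
        void assign(Type &&value)
        {
            d_value.emplace<static_cast<std::size_t>(tg)>(
                                            std::forward<Type>(value));
        }

        // Access the value; throws std::bad_variant_access if the
        // stored value does not match the requested tag.
        template <Tag tg>
        auto &get()
        {
            return std::get<static_cast<std::size_t>(tg)>(d_value);
        }
};

// Hypothetical scanner-side helper: assign a properly typed value for a
// NUMBER token to the parser's semantic value (d_val__ in the text).
void storeNumber(SType &d_val, std::string const &matched)
{
    d_val.assign<Tag::INT>(std::stoi(matched));
}
```

After storeNumber(value, "42") the expression value.get<Tag::INT>() refers to the int 42; requesting another tag at that point results in an exception rather than a silently reinterpreted value.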

Non-terminals may automatically be associated with polymorphic semantic values using %type directives. E.g., after:


    %polymorphic INT: int; TEXT: std::string
    %type <INT> expr
        
the expr non-terminal returns int semantic values. In this case, a rule like:

    expr:
        expr '+' expr
        {
            $$ = $1 + $3;
        }
        
automatically associates $$, $1 and $3 with int values. $$ is an lvalue (representing the semantic value associated with the expr: rule), while $1 and $3 are rvalues, representing the int semantic values associated with the two expr non-terminals in the production rule expr '+' expr.

When negative dollar indices (like $-1) are used, pre-defined associations between non-terminals and semantic types are ignored. With positive indices or in combination with the production rule's return value $$, however, semantic value types can explicitly be specified using the common `$<type>$' or `$<type>1' syntax. (In this and following examples index number 1 represents any valid positive index; -1 represents any valid negative index).

The type-overruling syntax does not allow blanks to be used (so $<INT>$ is OK, $< INT >$ isn't).

Various combinations of type-associations and type specifications may be encountered:


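                    $$ or $1 specifications
    -------------------------------------------------------
    %type<TAG>          $<tag>          action:
    -------------------------------------------------------
    absent              no <tag>        STYPE__ is used
                        $<id>           tag-override
                        $<>             STYPE__ is used
                        $<STYPE__>      STYPE__ is used
    -------------------------------------------------------
    STYPE__             no <tag>        STYPE__ is used
                        $<id>           tag-override
                        $<>             STYPE__ is used
                        $<STYPE__>      STYPE__ is used
    -------------------------------------------------------
    (existing) tag      no <tag>        auto-tag
                        $<id>           tag-override
                        $<>             STYPE__ is used
                        $<STYPE__>      STYPE__ is used
    -------------------------------------------------------
    (undefined) tag     no <tag>        tag-error
                        $<id>           tag-override
                        $<>             STYPE__ is used
                        $<STYPE__>      STYPE__ is used
    -------------------------------------------------------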

auto-tag: $$ and $1 represent, respectively, $$.get<tag>() and $1.get<tag>();

tag-error: error: tag undefined;

tag-override: if id is a defined tag, then $<tag>$ and $<tag>1 represent the tag's type. Otherwise: error (using undefined tag id).


When using `$$.' or `$1.' default tags are ignored. A warning is issued that the default tag is ignored. This syntax allows members of the semantic value type (STYPE__) to be called explicitly. The default tag is only ignored if there are no additional characters (e.g., blanks, closing parentheses) between the dollar-expressions and the member selector operator (e.g., no tags are used with $1.member(), but tags are used with ($1).member()). The opposite, overriding default tag associations, is accomplished using constructions like $<STYPE__>$ and $<STYPE__>1.
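
The auto-tag substitution can be sketched in plain C++ as follows. The SType stand-in is again based on std::variant (C++17) and is an assumption of this example, as is the function name addAction; the actual code bisonc++ emits for an action block will differ. The function body shows what an action `$$ = $1 + $3;' amounts to once every dollar-expression is replaced by a get<tag>() call for the INT tag.

```cpp
#include <cstddef>
#include <string>
#include <variant>

// Mirrors: %polymorphic INT: int; TEXT: std::string
enum class Tag { INT, TEXT };

// Illustrative stand-in for STYPE__ (not the generated class).
struct SType
{
    std::variant<int, std::string> value;   // default: an int holding 0

    Tag tag() const { return static_cast<Tag>(value.index()); }

    // Throws std::bad_variant_access on a tag mismatch.
    template <Tag tg>
    auto &get()
    {
        return std::get<static_cast<std::size_t>(tg)>(value);
    }
};

// Given %type <INT> expr, the action `$$ = $1 + $3;' is treated as if
// each dollar-expression had get<INT>() appended (auto-tag).
// Here lhs plays the role of $$, first of $1 and third of $3.
void addAction(SType &lhs, SType &first, SType &third)
{
    lhs.get<Tag::INT>() = first.get<Tag::INT>() + third.get<Tag::INT>();
}
```

By contrast, an expression written as `$1.' followed by a member name would correspond to calling a member of SType itself (e.g., first.tag() in this sketch), without any tag being applied.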

When negative dollar indices are used, the appropriate tag must explicitly be specified. The next example shows how this is realized in the grammar specification file itself:


    %polymorphic INT: int
    %type <INT> ident
    %%
    
    type:
        ident arg
    ;
    
    arg:
        {
            call($-1->get<Tag__::INT>());
        }
    ;
        
In this example call may define an int or int & parameter.

It is also possible to delegate specification of the semantic value to the function call itself, as shown next:


    %polymorphic INT: int
    %type <INT> ident
    %%
    
    type:
        ident arg
    ;
    
    arg:
        {
            call($-1);
        }
    ;
        
Here, the function call could be implemented like this:

    void call(STYPE__ &st)
    {
        st->get<Tag__::INT>() = 5;
    }
        

The %polymorphic directive adds the following definitions and declarations to the generated base class header and parser source file (if the %namespace directive was used then all declared/defined elements are placed inside the name space that is specified by the %namespace directive):

The name space Meta__ contains the following elements:

Since bisonc++ declares typedef Meta__::SType STYPE__, polymorphic semantic values can be used without referring to the name space Meta__.

6. PUBLIC MEMBERS AND -TYPES

The following public members and types are available to users of the parser classes generated by bisonc++ (parser class-name prefixes (e.g., Parser::) are silently implied):

When the %polymorphic directive is used:

7. PRIVATE ENUMS AND -TYPES

The following enumerations and types can be used by members of parser classes generated by bisonc++. They are actually protected members inherited from the parser's base class.

8. PRIVATE MEMBER FUNCTIONS

The following members can be used by members of parser classes generated by bisonc++. When prefixed by Base:: they are actually protected members inherited from the parser's base class. Members for which the phrase ``Used internally'' is used should not be called by user-defined code.

  • void print():
    By default implemented inline in the parser.ih internal header file, this member calls print__ to display the last received token and corresponding matched text. The print__ member is only implemented if the --print-tokens option or %print-tokens directive was used when the parsing function was generated. Calling print__ from print is unconditional, but can easily be controlled by the using program, by defining, e.g., a command-line option.
  • void Base::push__():
    Used internally.
  • void Base::pushToken__():
    Used internally.
  • void Base::reduce__():
    Used internally.
  • void Base::symbol__():
    Used internally.
  • void Base::top__():
    Used internally.

9. PRIVATE DATA MEMBERS

The following data members can be used by members of parser classes generated by bisonc++. All data members are actually protected members inherited from the parser's base class.

10. TYPES AND VARIABLES IN THE ANONYMOUS NAMESPACE

In the file defining the parse function the following types and variables are defined in the anonymous namespace. These are mentioned here for the sake of completeness, and are not normally accessible to other parts of the parser.

11. RESTRICTIONS ON TOKEN NAMES

To avoid collisions with names defined by the parser's (base) class, the following identifiers should not be used as token names:

12. OBSOLETE SYMBOLS

All DECLARATIONS and DEFINE symbols not listed above but defined in bison++ are obsolete with bisonc++. In particular, there is no %header{ ... %} section anymore. Also, all DEFINE symbols related to member functions are now obsolete. There is no need for these symbols anymore as they can simply be declared in the class header file and defined elsewhere.

13. EXAMPLE

Using a fairly worn-out example, we'll construct a simple calculator below. The basic operators as well as parentheses can be used to specify expressions, and each expression should be terminated by a newline. The program terminates when a q is entered. Empty lines result in a mere prompt.

First an associated grammar is constructed. When a syntactic error is encountered all tokens are skipped until the next newline and a simple message is printed using the default error function. It is assumed that no semantic errors occur (in particular, no divisions by zero). The grammar is decorated with actions performed when the corresponding grammatical production rule is recognized.

The grammar itself is rather standard and straightforward, but note the first part of the specification file, containing various other directives, among which the %scanner directive, resulting in a composed d_scanner object as well as an implementation of the member function int lex. In this example, a common Scanner class construction strategy was used: the class Scanner was derived from the class yyFlexLexer generated by flex++(1). The actual process of constructing a class using flex++(1) is beyond the scope of this man-page, but flex++(1)'s specification file is mentioned below, to further complete the example. Here is bisonc++'s input file:

    %filenames parser
    %scanner ../scanner/scanner.h
    
                                    // lowest precedence
    %token  NUMBER                  // integral numbers
            EOLN                    // newline
    
    %left   '+' '-' 
    %left   '*' '/' 
    %right  UNARY
                                    // highest precedence 
    
    %%
    
    expressions:
        expressions  evaluate
    |
        prompt
    ;
    
    evaluate:
        alternative prompt
    ;
    
    prompt:
        {
            prompt();
        }
    ;
    
    alternative:
        expression EOLN
        {
            cout << $1 << endl;
        }
    |
        'q' done
    |
        EOLN
    |
        error EOLN
    ;
    
    done:
        {
            cout << "Done.\n";
            ACCEPT();
        }
    ;
    
    expression:
        expression '+' expression
        {
            $$ = $1 + $3;
        }
    |
        expression '-' expression
        {
            $$ = $1 - $3;
        }
    |
        expression '*' expression
        {
            $$ = $1 * $3;
        }
    |
        expression '/' expression
        {
            $$ = $1 / $3;
        }
    |
        '-' expression      %prec UNARY
        {
            $$ = -$2;
        }
    |
        '+' expression      %prec UNARY
        {
            $$ = $2;
        }
    |
        '(' expression ')'
        {
            $$ = $2;
        }
    |
        NUMBER
        {
            $$ = stoul(d_scanner.matched());
        }
    ;
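
The effect of the precedence and associativity directives in this grammar can be checked against a small hand-written evaluator. The sketch below is a recursive-descent evaluator, not the table-driven parser that bisonc++ generates, and the class name Calc and its members are assumptions of this example. Each level corresponds to one precedence line of the specification: '+' and '-' bind weakest, '*' and '/' bind tighter, and the unary sign binds tightest. Like the grammar, it assumes well-formed input and no division by zero.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

class Calc
{
    std::string d_text;
    std::size_t d_pos = 0;

    public:
        explicit Calc(std::string text) : d_text(std::move(text)) {}
        long evaluate() { return sum(); }

    private:
        char peek()                     // skip blanks, show next character
        {
            while (d_pos < d_text.size() && d_text[d_pos] == ' ')
                ++d_pos;
            return d_pos < d_text.size() ? d_text[d_pos] : '\0';
        }
        long sum()                      // %left '+' '-'
        {
            long value = product();
            for (char op; (op = peek()) == '+' || op == '-'; )
            {
                ++d_pos;
                long rhs = product();
                value = op == '+' ? value + rhs : value - rhs;
            }
            return value;
        }
        long product()                  // %left '*' '/'
        {
            long value = unary();
            for (char op; (op = peek()) == '*' || op == '/'; )
            {
                ++d_pos;
                long rhs = unary();
                value = op == '*' ? value * rhs : value / rhs;
            }
            return value;
        }
        long unary()                    // %right UNARY
        {
            char ch = peek();
            if (ch == '-') { ++d_pos; return -unary(); }
            if (ch == '+') { ++d_pos; return unary(); }
            return primary();
        }
        long primary()                  // '(' expression ')' or NUMBER
        {
            if (peek() == '(')
            {
                ++d_pos;
                long value = sum();
                if (peek() == ')')
                    ++d_pos;
                return value;
            }
            long value = 0;
            while (d_pos < d_text.size() &&
                   std::isdigit(static_cast<unsigned char>(d_text[d_pos])))
                value = value * 10 + (d_text[d_pos++] - '0');
            return value;
        }
};
```

For instance, Calc("2 - 3 - 4").evaluate() yields -5 (left associativity), Calc("2 + 3 * 4").evaluate() yields 14, and Calc("-(1 + 2) * 3").evaluate() yields -9.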
    

Next, bisonc++ processes this file. In the process, bisonc++ generates the following files from its skeletons:

14. USING PARSER-CLASS SYMBOLS IN LEXICAL SCANNERS

Note here that although the file parserbase.h, defining the parser class's base class, is included rather than the header file parser.h defining the parser class, the lexical scanner may simply return tokens of the class Parser (e.g., Parser::NUMBER rather than ParserBase::NUMBER). In fact, using a simple #define - #undef pair, generated by bisonc++ at the end of the base class header file and just before the definition of the parser class itself, respectively, it is possible to assume in the lexical scanner that all symbols defined in the parser's base class are actually defined in the parser class itself. It should be noted that this feature can only be used to access the base class's enums and types. The actual parser class is not available by the time the lexical scanner is defined, thus avoiding circular class dependencies.
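
The #define - #undef construction can be sketched in a single translation unit as follows. The placement of the two preprocessor lines follows the description above, but the exact text bisonc++ writes into its headers is not reproduced here: the token values, the helper tokenFor and the member isNumber are assumptions made for illustration.

```cpp
// --- end of a (hypothetical) base class header ---
class ParserBase
{
    public:
        enum Tokens__ { NUMBER = 257, EOLN };   // values are illustrative
};
#define Parser ParserBase       // from here on, Parser:: means ParserBase::

// --- lexical scanner code, seen while the macro is active ---
// Hypothetical helper returning parser tokens using the Parser:: prefix.
inline int tokenFor(char ch)
{
    if (ch == '\n')
        return Parser::EOLN;
    if (ch >= '0' && ch <= '9')
        return Parser::NUMBER;
    return ch;
}

// --- just before the parser class definition ---
#undef Parser
class Parser: public ParserBase
{
    public:
        static bool isNumber(int token) { return token == NUMBER; }
};
```

Since the macro only renames the class prefix, it gives access to the base class's enums and types; members that exist only in the parser class itself (isNumber in this sketch) cannot be reached from the scanner code this way.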

15. FILES

16. SEE ALSO

bison(1), bison++(1), bison.info (using texinfo), flex++(1)

Lakos, J. (2001) Large Scale C++ Software Design, Addison Wesley.
Aho, A.V., Sethi, R., Ullman, J.D. (1986) Compilers, Addison Wesley.

17. BUGS

Parser-class header files (e.g., Parser.h) and parser-class internal header files (e.g., Parser.ih) generated with bisonc++ < 4.02.00 require two hand-modifications when used in combination with bisonc++ >= 4.02.00. See the description of exceptionHandler__ for details.

Discontinued options:

To avoid collisions with names defined by the parser's (base) class, the following identifiers should not be used as token names:

When re-using files generated by bisonc++ before version 2.0.0, minor hand-modification might be necessary. The identifiers in the following list (defined in the parser's base class) now have two underscores affixed to them: LTYPE, STYPE and Tokens. When using classes derived from the generated parser class, the following identifiers are available in such derived classes: DEFAULT_RECOVERY_MODE, ErrorRecovery, Return, UNEXPECTED_TOKEN, d_debug, d_loc, d_lsp, d_nErrors, d_nextToken, d_state, d_token, d_val, and d_vsp. When used in derived classes, they too need two underscores affixed to them.

The member function void lookup (< 1.00) was replaced by int lookup. When regenerating parsers created by early versions of bisonc++ (versions before version 1.00), lookup's prototype should be corrected by hand, since bisonc++ will not by itself rewrite the parser class's header file.

The Semantic parser, mentioned in bison++(1), is not implemented in bisonc++(1). According to bison++(1) the semantic parser was not available in bison++ either. It is possible that the Pure parser is now available through the --thread-safe option.

18. ABOUT bisonc++

Bisonc++ was based on bison++, originally developed by Alain Coetmeur (coetmeur@icdc.fr), R&D department (RDT), Informatique-CDC, France, who based his work on bison, GNU version 1.21.

Bisonc++ version 0.98 and beyond is a complete rewrite of an LALR(1) parser generator, closely following the construction process as described in Aho, Sethi and Ullman's (1986) book Compilers (i.e., the Dragon book). It uses the same grammar specification as bison and bison++, and it uses practically the same options and directives as bisonc++ versions earlier than 0.98. Variables, declarations and macros that are obsolete were removed.

AUTHOR

Frank B. Brokken (f.b.brokken@rug.nl).