flexc++ - Generate a C++ scanner class and parsing function
SYNOPSIS
flexc++ [options] rules-file
DESCRIPTION
Flexc++(1) was designed after flex(1) and flex++(1). Like these
two programs, flexc++ generates code performing pattern-matching on text,
possibly executing actions when certain regular expressions are
recognized.
Flexc++, contrary to flex and flex++, generates code that is
explicitly intended for use by C++ programs. The well-known flex(1)
program generates C source-code and flex++(1) merely offers a
C++-like shell around the yylex function generated by flex(1) and
hardly supports present-day ideas about C++ software development.
Contrary to this, flexc++ creates a C++ class offering a predefined
member function lex matching input against regular expressions and
possibly executing C++ code once regular expressions have been matched. The
code generated by flexc++ is pure C++, allowing its users to apply all
of the features offered by that language.
Not every aspect of flexc++ is covered by the man-pages. In addition to
what's summarized by the man-pages, the flexc++ manual offers a chapter covering
pre-loading of input lines (allowing you to, e.g., display lines in which
errors are observed even though not all of the line's tokens have already been
scanned), as well as a chapter covering technical documentation about the
inner workings of flexc++.
From version 0.92.00 until version 1.07.00 flexc++ offered one big manual
page. The advantage was that you never had to look for which manual
page contained which information. At the same time, flexc++'s man-page grew into
a huge man-page in which it was hard to find your way. Starting with release
1.08.00 we reverted to using multiple man-pages. The following index
relates manual pages to their specific contents:
This man-page
This man-page offers the following sections:
1. QUICK START: a quick start overview about how to use flexc++.
2. QUICK START: FLEXC++ and BISONC++: a quick start overview
about how to use flexc++ in combination with bisonc++(1)
3. GENERATED FILES: files generated by flexc++ and their purposes
4. OPTIONS: options available for flexc++
The flexc++api(3) man-page:
This man-page describes the classes generated by flexc++, describing flexc++'s
actions from the programmer's point of view.
1. INTERACTIVE SCANNERS: how to create an interactive scanner
2. THE CLASS INTERFACE: SCANNER.H: Constructors and members
of the scanner class generated by flexc++
3. NAMING CONVENTION: symbols defined by flexc++ in the scanner
class.
4. CONSTRUCTORS: constructors defined in the scanner class.
5. PUBLIC MEMBER FUNCTION: the public member declared in the scanner
class.
6. PRIVATE MEMBER FUNCTIONS: private members declared in the
scanner class.
7. SCANNER CLASS HEADER EXAMPLE: an example of a generated
scanner class header
8. THE SCANNER BASE CLASS: the scanner class is derived from a
base class. The base class is described in this section
9. PUBLIC ENUMS AND -TYPES: enums and types declared by the
base class
10. PROTECTED ENUMS AND -TYPES: enumerations and types used by
the scanner and scanner base classes
11. NO PUBLIC CONSTRUCTORS: the scanner base class does not
offer public constructors.
12. PUBLIC MEMBER FUNCTIONS: several members defined by the
scanner base class have public access rights.
13. PROTECTED CONSTRUCTORS: the base class can be constructed by
a derived class. Usually this is the scanner class generated by flexc++.
14. PROTECTED MEMBER FUNCTIONS: this section covers the base
class member functions that can only be used by scanner class or
scanner base class members
15. PROTECTED DATA MEMBERS: this section covers the base class
data members that can only be used by scanner class or scanner base
class members
16. FLEX++ TO FLEXC++ MEMBERS: a short overview of frequently
used flex(1) members that received different names in flexc++.
17. THE CLASS INPUT: the scanner's job is completely decoupled
from the actual input stream. The class Input, nested within the
scanner base class, handles the communication with the input
streams. The class Input is described in this section.
18. INPUT CONSTRUCTORS: the class Input can easily be
replaced by another class. The constructor-requirements are described
in this section.
19. REQUIRED PUBLIC MEMBER FUNCTIONS: this section covers the
required public members of a self-made Input class
The flexc++input(7) man-page:
This man-page describes how flexc++'s input files should be organized. It
contains the following sections:
1. SPECIFICATION FILE(S): the format and contents of flexc++ input
files, specifying the Scanner's characteristics
2. FILE SWITCHING: how to switch to another input specification
file
3. DIRECTIVES: directives that can be used in input
specification files
4. MINI SCANNERS: how to declare mini-scanners
5. DEFINITIONS: how to define symbolic names for regular
expressions
6. %% SEPARATOR: the separator between the input specification
sections
7. REGULAR EXPRESSIONS: regular expressions supported by flexc++
8. SPECIFICATION EXAMPLE: an example of a specification file
1. QUICK START
A bare-bones, no-frills scanner is generated as follows:
Create a file lexer defining the regular expressions to
recognize, and the tokens to return. Use token values exceeding 0xff if plain
ASCII character values can also be used as token values. Example (assume
capitalized words are token-symbols defined in an enum defined by the scanner
class):
%%
[ \t\n]+ // skip white space chars.
[0-9]+ return NUMBER;
[[:alpha:]_][[:alpha:][:digit:]_]* return IDENTIFIER;
. return matched()[0];
Execute:
flexc++ lexer
This generates four files: Scanner.h, Scanner.ih, Scannerbase.h, and
lex.cc.
Edit Scanner.h, add the enum defining the token-symbols in
(usually) the public section of the class Scanner. E.g.,
class Scanner: public ScannerBase
{
    public:
        enum Tokens
        {
            IDENTIFIER = 0x100,
            NUMBER
        };
    // ... (etc, as generated by flexc++)
Create a file defining int main, e.g.:
#include <iostream>
#include <string>
#include "Scanner.h"

using namespace std;

int main()
{
    Scanner scanner;                    // define a Scanner object

    while (int token = scanner.lex())   // get all tokens
    {
        string const &text = scanner.matched();
        switch (token)
        {
            case Scanner::IDENTIFIER:
                cout << "identifier: " << text << '\n';
            break;

            case Scanner::NUMBER:
                cout << "number: " << text << '\n';
            break;

            default:                    // plain character token
                cout << "char. token: `" << text << "'\n";
            break;
        }
    }
}
Compile all .cc files:
g++ --std=c++11 *.cc
To `tokenize' main.cc, execute:
a.out < main.cc
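To see which tokens the four rules of the lexer file produce without running flexc++, the following minimal sketch may help. It is a hand-written stand-in, not flexc++ output: the names MiniScanner, tokenize and demoMiniScanner are hypothetical, and only lex() and matched() mimic the generated interface. It also shows why the enum starts at 0x100: plain character tokens always stay below that value.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, hand-written stand-in for the generated Scanner class. It is
// not flexc++ output; it only mimics lex() and matched() for the four rules.
class MiniScanner
{
    std::string d_input;
    std::size_t d_pos = 0;
    std::string d_matched;

    static bool isSpace(char ch)            // [ \t\n]
    {
        return ch == ' ' || ch == '\t' || ch == '\n';
    }
    static bool isDigit(char ch)            // [0-9]
    {
        return std::isdigit(static_cast<unsigned char>(ch)) != 0;
    }
    static bool isIdentStart(char ch)       // [[:alpha:]_]
    {
        return std::isalpha(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    }
    static bool isIdentChar(char ch)        // [[:alpha:][:digit:]_]
    {
        return isIdentStart(ch) || isDigit(ch);
    }

    // Advances d_pos for as long as pred accepts the current character.
    void skipWhile(bool (*pred)(char))
    {
        while (d_pos != d_input.size() && pred(d_input[d_pos]))
            ++d_pos;
    }

    public:
        enum Tokens
        {
            IDENTIFIER = 0x100,             // beyond all plain char values
            NUMBER
        };

        explicit MiniScanner(std::string input)
        :
            d_input(std::move(input))
        {}

        std::string const &matched() const
        {
            return d_matched;
        }

        int lex()                           // returns 0 at end of input
        {
            skipWhile(isSpace);             // rule 1: skip white space
            if (d_pos == d_input.size())
                return 0;

            std::size_t const begin = d_pos;
            char const first = d_input[d_pos++];
            int token = static_cast<unsigned char>(first);  // rule 4: `.'

            if (isDigit(first))             // rule 2: [0-9]+
            {
                skipWhile(isDigit);
                token = NUMBER;
            }
            else if (isIdentStart(first))   // rule 3: identifiers
            {
                skipWhile(isIdentChar);
                token = IDENTIFIER;
            }

            d_matched = d_input.substr(begin, d_pos - begin);
            return token;
        }
};

// Collects all (token, matched text) pairs found in text.
std::vector<std::pair<int, std::string>> tokenize(std::string const &text)
{
    MiniScanner scanner(text);
    std::vector<std::pair<int, std::string>> tokens;
    while (int token = scanner.lex())
        tokens.emplace_back(token, scanner.matched());
    return tokens;
}

// Usage example: call from main() to check the tokens of a small input.
void demoMiniScanner()
{
    auto const tokens = tokenize("x1 = 42;");
    assert(tokens.size() == 4);
    assert(tokens[0].first == MiniScanner::IDENTIFIER && tokens[0].second == "x1");
    assert(tokens[1].first == '=');
    assert(tokens[2].first == MiniScanner::NUMBER && tokens[2].second == "42");
    assert(tokens[3].first == ';');
}
```

The scanner generated by flexc++ uses DFA tables rather than hand-coded tests like these, so the sketch only illustrates the token values a caller of lex() can expect.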
2. QUICK START: FLEXC++ and BISONC++
To interface flexc++ to the bisonc++(1) parser generator proceed as follows:
Specify a grammar that can be processed by bisonc++(1). Assuming
that the scanner and parser are developed in, respectively, the
sub-directories scanner and parser, a simple grammar
specification that can be used with the scanner developed in the previous
section can be written to the file parser/grammar:
Execute the program, providing it some source file to be processed:
a.out < main.cc
3. GENERATED FILES
Flexc++ generates four files from a well-formed input file:
A file containing the implementation of the lex member function
and its support functions. By default this file is named lex.cc.
A file containing the scanner's class interface. By default this file
is named Scanner.h. The scanner class itself is generated once and is
thereafter `owned' by the programmer, who may change it ad-lib. Newly
added members (data members, function members) will survive future flexc++ runs
as flexc++ will never rewrite an existing scanner class interface file, unless
explicitly ordered to do so.
A file containing the interface of the scanner class's base
class. The scanner class is publicly derived from this base class. It is used
to minimize the size of the scanner interface itself. The scanner base class
is `owned' by flexc++ and should never be hand-modified. By
default the scanner's base class is provided in the file
Scannerbase.h. At each new flexc++ run this file is rewritten unless flexc++
is explicitly ordered not to do so.
A file containing the implementation header. This file should
contain includes and declarations that are only required when compiling the
members of the scanner class. By default this file is named
Scanner.ih. This file, like the file containing the scanner class's
interface, is never rewritten by flexc++ unless flexc++ is explicitly ordered to do
so.
4. OPTIONS
Where available, single letter options are listed between parentheses
following their associated long-option variants. Single letter options require
arguments if their associated long options require arguments as well. Options
affecting the class header or implementation header file are ignored if these
files already exist. Options accepting a `filename' do not accept path names,
i.e., they cannot contain directory separators (/); options accepting a
`pathname' may contain directory separators.
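The distinction between a `filename' and a `pathname' can be sketched as a one-line check. The helper isPlainFilename is hypothetical (it is not part of flexc++), and rejecting empty names is an assumption made here for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of flexc++): true if name qualifies as a
// `filename' in the sense used above, i.e., it is non-empty (an assumption)
// and contains no directory separator.
bool isPlainFilename(std::string const &name)
{
    return !name.empty() && name.find('/') == std::string::npos;
}

// Usage example: call from main() to check some typical option arguments.
void demoPlainFilename()
{
    assert(isPlainFilename("Scanner.h"));           // fine as a filename
    assert(!isPlainFilename("scanner/Scanner.h"));  // only valid as a pathname
}
```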
Some options may generate errors. This happens when an option conflicts with
the contents of an existing file which flexc++ cannot modify (e.g., an existing
scanner class header file doesn't define a namespace, although a
--namespace option was provided). To solve the error the offending option
could be omitted, the existing file could be removed, or the existing file
could be hand-edited according to the option's specification. Note that flexc++
currently does not handle the opposite error condition: if a previously used
option is omitted, then flexc++ does not detect the inconsistency. In those
cases you may encounter compilation errors.
--baseclass-header=filename (-b)
Use filename as the name of the file to contain the scanner
class's base class. Defaults to the name of the scanner class plus
base.h.
It is an error if this option is used and an already
existing scanner-class header file does not include
`filename'.
--baseclass-skeleton=pathname (-B)
Use pathname as the path to the file containing the skeleton of
the scanner class's base class. Its filename defaults to
flexc++base.h.
--case-insensitive
Use this option to generate a scanner case insensitively
matching regular expressions. All regular expressions specified in
flexc++'s input file are interpreted case insensitively and the
resulting scanner object will case insensitively interpret its
input.
When this option is specified the resulting scanner does not
distinguish between the following rules:
First // initial F is transformed to f
first
FIRST // all capitals are transformed to lower case chars
With a case-insensitive scanner only the first rule can be matched,
and flexc++ will issue warnings for the second and third rule about
rules that cannot be matched.
Input processed by a case-insensitive scanner is also handled case
insensitively. The above mentioned First rule is matched for
all of the following input words: first First FIRST firST.
Although the matching process proceeds case insensitively, the
matched text (as returned by the scanner's matched() member)
always contains the original, unmodified text. So, with the above
input matched() returns, respectively first, First, FIRST
and firST, while matching the rule First.
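The behavior described above (matching ignores case, while the matched text keeps its original spelling) can be illustrated by the following minimal sketch. The helpers matchesIgnoringCase and matchedText are hypothetical and compare whole strings character by character; the generated scanner instead handles case when building its matching tables.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not generated by flexc++): true if input equals rule
// once both are folded to lower case.
bool matchesIgnoringCase(std::string const &rule, std::string const &input)
{
    if (rule.size() != input.size())
        return false;

    for (std::size_t idx = 0; idx != rule.size(); ++idx)
    {
        // cast via unsigned char: std::tolower requires a non-negative value
        if (std::tolower(static_cast<unsigned char>(rule[idx])) !=
            std::tolower(static_cast<unsigned char>(input[idx])))
            return false;
    }
    return true;
}

// Mimics matched(): on a match the original input is returned unmodified,
// otherwise an empty string is returned.
std::string matchedText(std::string const &rule, std::string const &input)
{
    return matchesIgnoringCase(rule, input) ? input : std::string{};
}

// Usage example: the rule First matches all four spellings listed above.
void demoCaseInsensitive()
{
    assert(matchedText("First", "first") == "first");
    assert(matchedText("First", "First") == "First");
    assert(matchedText("First", "FIRST") == "FIRST");
    assert(matchedText("First", "firST") == "firST");
}
```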
--class-header=filename (-c)
Use filename as the name of the file to contain the scanner
class. Defaults to the name of the scanner class plus the suffix
.h.
--class-name=className
Use className (rather than Scanner) as the name of the
scanner class. Unless overridden by other options generated files
will be given the (transformed to lower case) className* name
instead of scanner*.
It is an error if this option is used and an already
existing scanner-class header file does not define class
`className'.
--class-skeleton=pathname (-C)
Use pathname as the path to the file containing the skeleton of
the scanner class. Its filename defaults to flexc++.h.
--construction (-K)
Write details about the lexical scanner to the file
`rules-file'.output. Details cover the used character ranges,
information about the regexes, the raw NFA states, and the final
DFAs.
--debug (-d)
Provide lex and its support functions with debugging code,
showing the actual parsing process on the standard output
stream. When included, the debugging output is active by default,
but its activity may be controlled using the setDebug(bool
on-off) member. Note that #ifdef DEBUG macros are not used
anymore. By rerunning flexc++ without the --debug option an
equivalent scanner is generated not containing the debugging
code.
--filenames=genericName (-f)
Generic name of generated files (header files, not the
lex-function source file, see the --lex-source option for
that). By default the header file names will be equal to the name
of the generated class.
--help (-h)
Write basic usage information to the standard output stream and
terminate.
--implementation-header=filename (-i)
Use filename as the name of the file to contain the
implementation header. Defaults to the name of the generated
scanner class plus the suffix .ih. The implementation header
should contain all directives and declarations only used by
the implementations of the scanner's member functions. It is the
only header file that is included by the source file containing
lex()'s implementation. User defined implementation of other
class members may use the same convention, thus concentrating all
directives and declarations that are required for the compilation
of other source files belonging to the scanner class in one header
file.
It is an error if this option is used and an already
existing `filename' file does not include the scanner class header
file.
--implementation-skeleton=pathname (-I)
Use pathname as the path to the file containing the skeleton of
the implementation header. Its filename defaults to
flexc++.ih.
--lex-skeleton=pathname (-L)
Use pathname as the path to the file containing the
lex() member function's skeleton. Its filename defaults to
flexc++.cc.
--lex-function-name=funname
Use funname rather than lex as the name of the member
function performing the lexical scanning.
--lex-source=filename (-l)
Define filename as the name of the source file to contain the
scanner member function lex. Defaults to lex.cc.
--matched-rules (-R)
The generated scanner will write the numbers of matched rules to
the standard output. It is implied by the --debug option.
Displaying the matched rules can be suppressed by calling the
generated scanner's member setDebug(false) (or, of course, by
re-generating the scanner without specifying
--matched-rules).
--max-depth=depth (-m)
Set the maximum inclusion depth of the lexical scanner's
specification files to depth. By default the maximum depth is
set to 10. When more than depth specification files are used
the scanner throws a std::length_error exception (`Max stream stack size exceeded').
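A minimal sketch of such a depth limit is given below. The class StreamStack and the function overflows are hypothetical; the scanner's actual bookkeeping may differ, e.g., in whether the initial specification file counts towards the depth. Only the exception type and its message are taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the scanner's stack of specification files.
class StreamStack
{
    std::size_t d_maxDepth;
    std::vector<std::string> d_names;

    public:
        explicit StreamStack(std::size_t maxDepth = 10)     // default depth
        :
            d_maxDepth(maxDepth)
        {}

        // Stacks another file; throws once the stack is already full.
        void push(std::string const &name)
        {
            if (d_names.size() == d_maxDepth)
                throw std::length_error("Max stream stack size exceeded");
            d_names.push_back(name);
        }

        std::size_t depth() const
        {
            return d_names.size();
        }
};

// Returns true if stacking nFiles files overflows a stack of maxDepth.
bool overflows(std::size_t maxDepth, std::size_t nFiles)
{
    StreamStack stack(maxDepth);
    try
    {
        for (std::size_t idx = 0; idx != nFiles; ++idx)
            stack.push("file" + std::to_string(idx));
    }
    catch (std::length_error const &)
    {
        return true;
    }
    return false;
}

// Usage example: with the default depth the eleventh file is refused.
void demoStreamStack()
{
    assert(!overflows(10, 10));
    assert(overflows(10, 11));
}
```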
--namespace=identifier
Define the scanner class in the namespace identifier. By default
no namespace is used. If this option is used the
implementation header is provided with a commented out using
namespace declaration for the requested namespace. In addition,
the scanner and scanner base class header files also use the
specified namespace to define their include guard directives.
It is an error if this option is used and an already
existing scanner-class header file does not define namespace
identifier.
--no-baseclass-header
Do not write the file containing the scanner's base class interface
even if it doesn't yet exist. By default the file containing the
scanner's base class interface is (re)written each time flexc++ is
called.
--no-lines
Do not put #line preprocessor directives in the file containing
the scanner's lex function. By default #line directives
are entered at the beginning of the action statements in the
generated lex.cc file, allowing the compiler and debuggers
to associate errors with lines in your grammar specification
file, rather than with the source file containing the lex
function itself.
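The effect of a #line directive can be observed with the standard __LINE__ and __FILE__ macros. The following sketch uses a hand-written directive: the line number 42 and the file name lexer are assumptions chosen for illustration and are not copied from a generated lex.cc.

```cpp
#include <cassert>
#include <string>

// From the next line on, the compiler behaves as if it were reading line 42
// of a file named "lexer" (both values are assumptions for this example).
#line 42 "lexer"
int actionLine() { return __LINE__; }           // reported as line 42
std::string actionFile() { return __FILE__; }   // reported as file "lexer"

// Usage example: call from main() to check what the compiler recorded.
void demoLineDirective()
{
    assert(actionLine() == 42);
    assert(actionFile() == "lexer");
}
```

Compiler diagnostics and debugger line information for code following the directive refer to the same presumed file name and line numbers, which is why errors in action statements can be reported against the specification file.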
--no-lex-source
Do not write the file containing the scanner's predefined scanner
member functions, even if that file doesn't yet exist. By default
the file containing the scanner's lex member function is
(re)written each time flexc++ is called. This option
should normally be avoided, as this file contains parsing
tables which are altered whenever the grammar definition is
modified.
--own-tokens (-T)
The tokens returned as well as the text matched when flexc++ reads
its input file(s) are shown when this option is used.
This option does not result in the generated program displaying
returned tokens and matched text. If that is what you want, use
the --print-tokens option.
--print-tokens (-t)
The tokens returned as well as the text matched by the generated
lex function are displayed on the standard output stream, just
before returning the token to lex's caller. Displaying tokens
and matched text is suppressed again when the lex.cc file is
generated without using this option. The function showing the
tokens (ScannerBase::print__) is called from
Scanner::printTokens, which is defined in-line in
Scanner.h. Calling ScannerBase::print__, therefore, can
also easily be controlled by an option controlled by the program
using the scanner object.
This option does not show the tokens returned and text matched
by flexc++ itself when reading its input file(s). If that is what
you want, use the --own-tokens option.
--regex-calls
Show the function call order when parsing regular expressions. This
option is normally not required: its main purpose is to help
developers understand what happens when regular expressions are
parsed.
--show-filenames (-F)
Write the names of the files that are generated to the
standard error stream.
--skeleton-directory=pathname (-S)
Defines the directory containing the skeleton files. This option
can be overridden by the specific skeleton-specifying options
(-B, -C, -I, and -L).
--target-directory=pathname
Specifies the directory where generated files should be written.
By default this is the directory where flexc++ is called.
--usage (-h)
Write basic usage information to the standard output stream and
terminate.
--verbose (-V)
The verbose option generates on the standard output stream various
pieces of additional information, not covered by the
--construction and --show-filenames options.
--version (-v)
Display flexc++'s version number and terminate.
FILES
Flexc++'s default skeleton files are in /usr/share/flexc++.
By default, flexc++ generates the following files:
Scanner.h: the header file containing the scanner class's
interface.
Scannerbase.h: the header file containing the interface of the
scanner class's base class.
Scanner.ih: the internal header file that is meant to be included
by the scanner class's source files (e.g., it is included by
lex.cc, see the next item's file), and that should contain all
declarations required for compiling the scanner class's sources.
lex.cc: the source file implementing the scanner class member
function lex (and support functions), performing the lexical
scan.
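The default names listed above follow from the scanner class name and the suffixes mentioned with the --class-header, --baseclass-header, --implementation-header and --lex-source options. As a minimal sketch (the helper defaultFileNames is hypothetical, and overriding options such as --filenames are deliberately ignored):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of flexc++): lists the default names of the
// generated files for a given scanner class name, ignoring all options that
// may override them.
std::vector<std::string> defaultFileNames(std::string const &className)
{
    return
    {
        className + ".h",       // scanner class interface
        className + "base.h",   // scanner base class interface
        className + ".ih",      // implementation header
        "lex.cc"                // lex() and its support functions
    };
}

// Usage example: the default class name Scanner yields the files listed above.
void demoDefaultFileNames()
{
    std::vector<std::string> const names = defaultFileNames("Scanner");
    assert(names[0] == "Scanner.h");
    assert(names[1] == "Scannerbase.h");
    assert(names[2] == "Scanner.ih");
    assert(names[3] == "lex.cc");
}
```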
SEE ALSO
bisonc++(1), flexc++api(3), flexc++input(7)
BUGS
None reported
ABOUT flexc++
Flexc++ was originally started as a programming project by Jean-Paul van
Oosten and Richard Berendsen in the 2007-2008 academic year. After graduating,
Richard left the project and moved to Amsterdam. Jean-Paul remained in
Groningen, and after on-and-off activities on the project, in close
cooperation with Frank B. Brokken, Frank undertook a rewrite of the project's
code around 2010. During the development of flexc++, the lookahead-operator
handling continuously threatened the completion of the project. By now, the
project has evolved to a level that we feel it's defensible to publish the
program, although we still tend to consider the program in its experimental
stage; it will remain that way until we decide to move its version from the
0.9x.xx series to the 1.xx.xx series.
COPYRIGHT
This is free software, distributed under the terms of the
GNU General Public License (GPL).
AUTHOR
Frank B. Brokken (f.b.brokken@rug.nl),
Jean-Paul van Oosten (j.p.van.oosten@rug.nl),
Richard Berendsen (richardberendsen@xs4all.nl) (until 2010).