flexc++api - Application programmer's interface of flexc++ generated classes
DESCRIPTION
Flexc++(1) was designed after flex(1) and flex++(1). Like these two
programs, flexc++ generates code performing pattern-matching on text,
possibly executing actions when certain regular expressions are
recognized.
Refer to flexc++(1) for a general overview. This manual page covers
the Application Programmer's Interface of classes generated by flexc++, offering
the following sections:
1. INTERACTIVE SCANNERS: how to create an interactive scanner
2. THE CLASS INTERFACE: SCANNER.H: Constructors and members
of the scanner class generated by flexc++
3. NAMING CONVENTION: symbols defined by flexc++ in the scanner
class.
4. CONSTRUCTORS: constructors defined in the scanner class.
5. PUBLIC MEMBER FUNCTIONS: public members declared in the scanner
class.
6. PRIVATE MEMBER FUNCTIONS: private members declared in the
scanner class.
7. SCANNER CLASS HEADER EXAMPLE: an example of a generated
scanner class header
8. THE SCANNER BASE CLASS: the scanner class is derived from a
base class. The base class is described in this section
9. PUBLIC ENUMS AND -TYPES: enums and types declared by the
base class
10. PROTECTED ENUMS AND -TYPES: enumerations and types used by
the scanner and scanner base classes
11. NO PUBLIC CONSTRUCTORS: the scanner base class does not
offer public constructors.
12. PUBLIC MEMBER FUNCTIONS: several members defined by the
scanner base class have public access rights.
13. PROTECTED CONSTRUCTORS: the base class can be constructed by
a derived class. Usually this is the scanner class generated by flexc++.
14. PROTECTED MEMBER FUNCTIONS: this section covers the base
class member functions that can only be used by scanner class or
scanner base class members
15. PROTECTED DATA MEMBERS: this section covers the base class
data members that can only be used by scanner class or scanner base
class members
16. FLEX++ TO FLEXC++ MEMBERS: a short overview of frequently
used flex(1) members that received different names in flexc++.
17. THE CLASS INPUT: the scanner's job is completely decoupled
from the actual input stream. The class Input, nested within the
scanner base class, handles the communication with the input
streams. The class Input is described in this section.
18. INPUT CONSTRUCTORS: the class Input can easily be
replaced by another class. The constructor-requirements are described
in this section.
19. REQUIRED PUBLIC MEMBER FUNCTIONS: this section covers the
required public members of a self-made Input class
1. INTERACTIVE SCANNERS
An interactive scanner postpones scanning until an end-of-line character has
been received, and then scans all information read so far on that line.
Flexc++ generates an interactive scanner when the %interactive directive is
specified. Here it is assumed that Scanner is the name of the
scanner class generated by flexc++.
Caveat: generating interactive and non-interactive scanners should not be
mixed as their class organizations fundamentally differ, and several of the
Scanner class's members are only available in the non-interactive scanner. As
the Scanner.h file contains the Scanner class's interface, which is
normally left untouched by flexc++, flexc++ cannot adapt the Scanner class when
requested to change the interactivity of an existing Scanner class. Because of
this, support for the --interactive option was discontinued as of flexc++
release 1.01.00.
The interactive scanner generated by flexc++ has the following characteristics:
The Scanner class is derived privately from
std::istringstream and (as usual) publicly from ScannerBase.
The istringstream base class is constructed by its default
constructor.
The function lex's default implementation is removed from
Scanner.h and is implemented in the generated lex.cc source
file. It performs the following tasks:
- If the token returned by the scanner is not equal to 0, it is
returned as the next token;
- Otherwise the next line is retrieved from the input stream
passed to the Scanner's constructor (by default std::cin).
If this fails, 0 is returned.
- A '\n' character is appended to the just read line, and the
scanner's std::istringstream base class object is
re-initialized with that line;
- The member lex__ returns the next token.
This implementation allows code calling Scanner::lex() to conclude, as
usual, that the input is exhausted when lex returns 0.
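The four steps above can be modelled in a few lines. The following is a minimal sketch, not flexc++'s generated lex.cc: the class InteractiveSketch and its member lexFromLine (standing in for lex__, and recognizing only blank-separated words) are hypothetical, and an std::istringstream data member takes the place of the Scanner's private istringstream base class.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for an interactive scanner (NOT generated code).
// lexFromLine plays the role of lex__: it returns 1 for each
// blank-separated word, '\n' at the end of the line, and 0 once the
// current line buffer is exhausted.
class InteractiveSketch
{
    std::istream &d_in;             // the stream passed to the constructor
    std::istringstream d_line;      // plays the role of the istringstream base
    std::string d_matched;

    public:
        explicit InteractiveSketch(std::istream &in)
        :
            d_in(in)
        {}

        std::string const &matched() const
        {
            return d_matched;
        }

        int lex()
        {
            if (int token = lexFromLine())  // a nonzero token is returned
                return token;

            std::string line;               // otherwise: read the next line
            if (!std::getline(d_in, line))
                return 0;                   // no more input

            d_line.clear();                 // reset the eof/fail flags
            d_line.str(line + '\n');        // re-initialize with the line

            return lexFromLine();           // the new line's first token
        }

    private:
        int lexFromLine()
        {
            auto const eof = std::istringstream::traits_type::eof();

            int ch = d_line.get();
            while (ch == ' ' || ch == '\t')     // skip blanks
                ch = d_line.get();

            if (ch == eof)
                return 0;

            if (ch == '\n')
            {
                d_matched = "\n";
                return '\n';
            }

            d_matched = static_cast<char>(ch);  // collect the word
            while ((ch = d_line.peek()) != eof
                    && ch != ' ' && ch != '\t' && ch != '\n')
                d_matched += static_cast<char>(d_line.get());

            return 1;
        }
};
```

Because every line is re-initialized with a trailing '\n', the caller sees '\n' at the end of each line and 0 only when std::getline fails.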
Here is an example of how such a scanner could be used:
// scanner generated using 'flexc++ lexer' with lexer containing
// the %interactive directive
#include <iostream>
#include "Scanner.h"

using namespace std;

int main()
{
    Scanner scanner;                // by default: read from std::cin

    while (true)
    {
        cout << "? ";               // prompt at each line

        while (true)                // process all the line's tokens
        {
            int token = scanner.lex();

            if (token == '\n')      // end of line: new prompt
                break;

            if (token == 0)         // end of input: done
                return 0;
                                    // process other tokens
            cout << scanner.matched() << '\n';

            if (scanner.matched()[0] == 'q')
                return 0;
        }
    }
}
2. THE CLASS INTERFACE: SCANNER.H
By default, flexc++ generates a file Scanner.h containing the initial
interface of the scanner class performing the lexical scan according to the
specifications given in flexc++'s input file. The name of the file that is
generated can easily be changed using flexc++'s --class-header
option. In this man-page we'll stick to using the default name.
The file Scanner.h is generated only once, unless an explicit request
is made to rewrite it (using flexc++'s --force-class-header option).
The provided interface is very light-weight, primarily offering a link to
the scanner's base class (see this manpage's sections 8 through 16).
Many of the facilities offered by the scanner class are inherited from
the ScannerBase base class. Additional facilities offered by the
Scanner class are covered below.
3. NAMING CONVENTION
All symbols that are required by the generated scanner class end in two
consecutive underscore characters (e.g., executeAction__). These names
should not be redefined. As they are part of the Scanner and
ScannerBase classes, their scope is immediately clear and confusion with
identically named identifiers elsewhere is unlikely.
Some member functions do not use the underscore convention. These are the
scanner class's constructors, and members whose names are similar or equal to
names that have historically been used (e.g., length). Also, some functions
offer hooks into the implementation (like preCode). Functions in this latter
category also have names that do not end in underscores.
4. CONSTRUCTORS
explicit Scanner(std::istream &in = std::cin,
std::ostream &out = std::cout)
This constructor by default reads information from the standard input
stream and writes to the standard output stream. When the
Scanner object goes out of scope the input and output files are closed.
With interactive scanners input stream switching or stacking is not
available; switching output streams, however, is.
Scanner(std::string const &infile, std::string const &outfile)
This constructor opens the input and output streams whose file names
were specified. When the Scanner object goes out of scope the input and
output files are closed. If outfile == "-" then the standard output stream
is used as the scanner's output medium; if outfile == "" then the
standard error stream is used as the scanner's output medium.
This constructor is not available with interactive scanners.
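The outfile conventions of this constructor can be illustrated by the following sketch. The class OutputSelector is hypothetical and only mimics the documented selection rule; the real ScannerBase manages its streams in its own way.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <memory>
#include <string>

// Hypothetical helper, not part of flexc++'s generated code. It mirrors the
// documented rule for the outfile argument:
//   "-" -> std::cout,  "" -> std::cerr,  any other name -> that file.
class OutputSelector
{
    std::unique_ptr<std::ofstream> d_file;  // owns the file stream, if any
    std::ostream *d_out;                    // the selected output medium

    public:
        explicit OutputSelector(std::string const &outfile)
        {
            if (outfile == "-")
                d_out = &std::cout;
            else if (outfile.empty())
                d_out = &std::cerr;
            else
            {                               // the file is (re)written
                d_file = std::make_unique<std::ofstream>(outfile);
                d_out = d_file.get();
            }
        }

        std::ostream &out()
        {
            return *d_out;
        }
};  // a file, if any, is closed when the selector goes out of scope
```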
5. PUBLIC MEMBER FUNCTIONS
int lex()
The lex function performs the lexical scanning of the input file
specified at construction time (but also see the stream-switching members
described in sections 12 and 14). It returns an int
representing the token associated with the matched regular expression. The
returned value 0 indicates end-of-file. Given its simple default
implementation, lex may be redefined by the user. Lex's default
implementation merely calls lex__:
inline int Scanner::lex()
{
    return lex__();
}
Caveat: with interactive scanners the lex function is defined in
the generated lex.cc file. Once flexc++ has generated the scanner class
header file this scanner class header file isn't automatically rewritten by
flexc++. If, at some later stage, an interactive scanner must be generated, then
the inline lex implementation must be removed `by hand' from the scanner
class header file. Likewise, a lex member implementation (like the above)
must be provided `by hand' if a non-interactive scanner is required after
first having generated files implementing an interactive scanner.
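As an illustration of redefining lex, the following sketch skips all COMMENT tokens returned by lex__. The Scanner shown here is a self-contained mock whose lex__ merely replays a fixed token sequence; the Token enumeration and its values are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical token values; a real scanner's tokens are defined elsewhere
// (e.g., by a parser's token enumeration).
enum Token
{
    NUMBER = 257,
    COMMENT
};

class Scanner                   // mock: lex__ replays a fixed token sequence
{
    std::vector<int> d_tokens;
    size_t d_idx = 0;

    public:
        explicit Scanner(std::vector<int> tokens)
        :
            d_tokens(std::move(tokens))
        {}

        int lex();

    private:
        int lex__()             // stand-in for the generated lex__
        {
            return d_idx < d_tokens.size() ? d_tokens[d_idx++] : 0;
        }
};

// A user-defined lex replacing the default 'return lex__();':
// COMMENT tokens are silently skipped.
inline int Scanner::lex()
{
    int token;
    while ((token = lex__()) == COMMENT)
        ;
    return token;
}
```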
6. PRIVATE MEMBER FUNCTIONS
int lex__()
This function is used internally by lex and should not otherwise
be used.
int executeAction__(size_t ruleNr)
This function is used internally by lex and should not otherwise
be used.
void preCode()
By default this function has an empty, inline implementation in
Scanner.h. It can safely be replaced by a user-defined
implementation. This function is called by lex__, just before it starts to
match input characters against its rules: preCode is called by lex__
when lex__ is called and also after having executed the actions of a rule
which did not execute a return statement. The outline of lex__'s
implementation looks like this:
int Scanner::lex__()
{
    // ...
    preCode();

    while (true)
    {
        size_t ch = get__();            // fetch next char
        // ...
        switch (actionType__(range))    // determine the action
        {
            // ... maybe return
        }
        // ... no return, continue scanning
        preCode();
    } // while
}
void postCode(PostEnum__ type)
By default this function has an empty, inline implementation in
Scanner.h. It can safely be replaced by a user-defined
implementation. This function is called by lex__, just after a rule has
been matched. Values of the enum class PostEnum__ indicate the
characteristic of the matched rule. PostEnum__ has four values:
PostEnum__::END, PostEnum__::POP, PostEnum__::RETURN, and
PostEnum__::WIP. Refer to section 10 for their meanings.
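The order in which lex__ calls preCode and postCode, as described above and in section 10, can be traced with the following mock. MockScanner and Rule are hypothetical: each Rule stands for a matched rule that either returns a token or not, and PostEnum__ is redeclared so that the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Mirrors section 10's enumeration, redeclared so the sketch is standalone.
enum class PostEnum__ { END, POP, RETURN, WIP };

// Hypothetical: a matched rule either returns a token or it doesn't.
struct Rule
{
    int token;
    bool returns;
};

class MockScanner
{
    std::vector<Rule> d_rules;  // the "input": rules matched in this order
    size_t d_next = 0;

    public:
        size_t preCalls = 0;    // counts preCode calls
        size_t returned = 0;    // counts PostEnum__::RETURN
        size_t wip = 0;         // counts PostEnum__::WIP

        explicit MockScanner(std::vector<Rule> rules)
        :
            d_rules(std::move(rules))
        {}

        int lex__()                     // follows lex__'s outline
        {
            preCode();                  // lex__ has just been called

            while (true)
            {
                if (d_next == d_rules.size())
                {
                    postCode(PostEnum__::END);  // end of input: return 0
                    return 0;
                }

                Rule const &rule = d_rules[d_next++];
                if (rule.returns)
                {
                    postCode(PostEnum__::RETURN);
                    return rule.token;
                }

                postCode(PostEnum__::WIP);      // non-returning rule
                preCode();                      // scanning continues
            }
        }

    private:
        void preCode()
        {
            ++preCalls;
        }

        void postCode(PostEnum__ type)
        {
            if (type == PostEnum__::RETURN)
                ++returned;
            else if (type == PostEnum__::WIP)
                ++wip;
        }
};
```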
void print()
When the --print-tokens option or the %print-tokens directive is used,
this function is called to display, on the standard output stream,
the tokens returned and the text matched by the scanner generated by
flexc++.
Displaying is suppressed when the lex.cc file is (re)generated
without using this option or directive. The function actually showing the
tokens (ScannerBase::print__) is called from print, which
is defined inline in Scanner.h. Whether ScannerBase::print__ is
called can therefore easily be made to depend on an option controlled
by the program using the scanner object.
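The following sketch shows one way to make printing depend on a program option. ScannerSketch, its setShowTokens member and its simplified print__ are hypothetical stand-ins, not the generated code.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in, not the generated Scanner: print__ here simply
// writes the matched text, whereas ScannerBase::print__ formats the
// tokens in its own way.
class ScannerSketch
{
    std::ostream &d_out;
    std::string d_matched;
    bool d_showTokens = false;          // hypothetical program option

    public:
        explicit ScannerSketch(std::ostream &out)
        :
            d_out(out)
        {}

        void setShowTokens(bool on)
        {
            d_showTokens = on;
        }

        void match(std::string const &text) // simulates a matched token
        {
            d_matched = text;
            print();
        }

    private:
        void print__() const
        {
            d_out << d_matched << '\n';
        }

        void print()                    // cf. the inline Scanner::print
        {
            if (d_showTokens)
                print__();
        }
};
```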
7. SCANNER CLASS HEADER EXAMPLE
#ifndef Scanner_H_INCLUDED_
#define Scanner_H_INCLUDED_

// $insert baseclass_h
#include "Scannerbase.h"

// $insert classHead
class Scanner: public ScannerBase
{
    public:
        explicit Scanner(std::istream &in = std::cin,
                                std::ostream &out = std::cout);

        Scanner(std::string const &infile, std::string const &outfile);

        // $insert lexFunctionDecl
        int lex();

    private:
        int lex__();
        int executeAction__(size_t ruleNr);

        void print();
        void preCode();     // re-implement this function for code that must
                            // be exec'ed before the pattern matching starts

        void postCode(PostEnum__ type);
                            // re-implement this function for code that must
                            // be exec'ed after the rules' actions.
};

// $insert scannerConstructors
inline Scanner::Scanner(std::istream &in, std::ostream &out)
:
    ScannerBase(in, out)
{}

inline Scanner::Scanner(std::string const &infile, std::string const &outfile)
:
    ScannerBase(infile, outfile)
{}

// $insert inlineLexFunction
inline int Scanner::lex()
{
    return lex__();
}

inline void Scanner::preCode()
{
    // optionally replace by your own code
}

inline void Scanner::postCode(PostEnum__ type)
{
    // optionally replace by your own code
}

inline void Scanner::print()
{
    print__();
}

#endif // Scanner_H_INCLUDED_
8. THE SCANNER BASE CLASS
By default, flexc++ generates a file Scannerbase.h containing the
interface of the base class of the scanner class also generated by flexc++. The
name of the file that is generated can easily be changed using flexc++'s
--baseclass-header option. In this man-page we use the default name.
The file Scannerbase.h is generated at each new flexc++ run. It contains
no user-serviceable or extensible parts. Rewriting can be prevented by
specifying flexc++'s --no-baseclass-header option.
9. PUBLIC ENUMS AND -TYPES
enum class StartCondition__
This strongly typed enumeration defines the names of the start
conditions (i.e., mini scanners). It at least contains INITIAL, but when
the %s or %x directives were used it also contains the identifiers of
the mini scanners declared by these directives. Since StartCondition__ is
a strongly typed enum its values must be preceded by its enum name. E.g.,
begin(StartCondition__::INITIAL);
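A short sketch of the scoping requirement follows. The enumerator comment assumes a mini scanner declared by %x comment; the class StartConditionSketch is hypothetical and only mimics the begin and startCondition members described in section 14.

```cpp
#include <cassert>

// Assumed: a specification declaring '%x comment' yields the enumerator
// StartCondition__::comment next to INITIAL.
enum class StartCondition__
{
    INITIAL,
    comment
};

// Hypothetical class mimicking begin and startCondition (section 14).
class StartConditionSketch
{
    StartCondition__ d_startCondition = StartCondition__::INITIAL;

    public:
        void begin(StartCondition__ startCondition)
        {
            d_startCondition = startCondition;
        }

        StartCondition__ startCondition() const
        {
            return d_startCondition;
        }
};

// begin(INITIAL) would not compile: the enumerator must be qualified,
// as in begin(StartCondition__::INITIAL).
```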
10. PROTECTED ENUMS AND -TYPES
enum class ActionType__
This strongly typed enumeration is for internal use only.
enum Leave__
This enumeration is for internal use only.
enum class PostEnum__
Values of this strongly typed enumeration are passed to the scanner's
private member postCode, indicating the scanner's action after
matching a rule. The values of this enumeration are:
PostEnum__::END: the function lex__ immediately returns 0
once postCode returns, indicating the end of the input was
reached;
PostEnum__::POP: the end of an input stream was reached, and
processing continues with the previously pushed input stream. In
this case the function lex__ doesn't return; it simply
continues processing the previously pushed stream;
PostEnum__::RETURN: the function lex__ immediately returns
once postCode returns, returning the next token;
PostEnum__::WIP: the function lex__ has matched a
non-returning rule, and continues its rule-matching process.
11. NO PUBLIC CONSTRUCTORS
There are no public constructors. ScannerBase is a base class for the
Scanner class generated by flexc++. ScannerBase only offers
protected constructors.
12. PUBLIC MEMBER FUNCTIONS
bool debug() const
returns true if --debug or %debug was specified, otherwise
false.
bool interactiveLine()
this member is only available with interactive scanners. All remaining
contents of the current interactive line buffer are discarded, and the
interactive line buffer is filled with the contents of the next input
line. This member can be used when a condition is encountered which
invalidates the remaining contents of a line. Following a call to
interactiveLine the next token that is returned by the lexical
scanner will be the first token on the next line. This member returns
true if the next line is available and false otherwise.
std::string const &filename() const
returns the name of the file currently processed by the scanner object.
size_t length() const
returns the length of the text that was matched by lex. With
flex++ this function was called leng.
size_t lineNr() const
returns the line number of the currently scanned line. This function is
always available (note: flex++ only offered a similar function
(called lineno) after using the %lineno option).
std::string const &matched() const
returns the text matched by lex (note: flex++ offers a similar
member called YYText).
void setDebug(bool onOff)
Switches on/off debugging output by providing the argument true
or false. Switching on debugging output only has visible effects
if the debug option was specified.
void switchIstream(std::string const &infilename)
The currently processed input stream is closed, and processing
continues at the stream whose name is specified as the function's
argument. This is not a stack-operation: after processing
infilename processing does not return to the original stream.
This member is not available with interactive scanners.
void switchOstream(std::ostream &out)
The currently processed output stream is closed, and
new output is written to out.
void switchOstream(std::string const &outfilename)
The current output stream is closed, and output is written to
outfilename. If this file already exists, it is rewritten.
void switchStreams(std::istream &in,
std::ostream &out = std::cout)
The currently processed input and output streams are closed, and
processing continues at in, writing output to out. This is
not a stack-operation: after processing in processing
does not return to the original stream.
This member is not available with interactive scanners.
void switchStreams(std::string const &infilename,
std::string const &outfilename)
The currently processed input and output streams are closed, and
processing continues at the stream whose name is specified as the
function's first argument, writing output to the file whose name is
specified as the function's second argument. This latter file is
rewritten. This is not a stack-operation: after processing
infilename processing does not return to the original stream.
If outfilename == "-" then the standard output stream
is used as the scanner's output medium; if outfilename == "" then
the standard error stream is used as the scanner's output medium.
This member is not available with interactive scanners.
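The difference between switching streams (this section) and stacking them (pushStream and popStream, section 14) can be modelled as follows. StreamModel is a hypothetical class that only tracks which stream is current; opening and closing streams is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical model, not ScannerBase: it only records which stream is
// processed and which streams were suspended by a push.
class StreamModel
{
    std::istream *d_current;
    std::vector<std::istream *> d_stack;    // suspended streams, oldest first

    public:
        explicit StreamModel(std::istream &in)
        :
            d_current(&in)
        {}

        void switchTo(std::istream &in)     // like switchIstream: nothing
        {                                   // is remembered
            d_current = &in;
        }

        void push(std::istream &in)         // like pushStream: suspend the
        {                                   // current stream
            d_stack.push_back(d_current);
            d_current = &in;
        }

        bool pop()                          // like popStream
        {
            if (d_stack.empty())
                return false;
            d_current = d_stack.back();     // resume the most recent one
            d_stack.pop_back();
            return true;
        }

        std::istream &current()
        {
            return *d_current;
        }

        size_t depth() const
        {
            return d_stack.size();
        }
};
```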
13. PROTECTED CONSTRUCTORS
ScannerBase(std::string const &infilename,
std::string const &outfilename)
The scanner object opens and reads infilename and opens (rewrites)
and writes outfilename. It is called from the corresponding
Scanner constructor.
This member is not available for interactive scanners.
ScannerBase(std::istream &in, std::ostream &out)
The in and out parameters are, respectively, the derived class
constructor's input and output streams.
14. PROTECTED MEMBER FUNCTIONS
All member functions ending in two underscore characters are for internal
use only and should not be called by user-defined members of the
Scanner class.
The following members, however, can safely be called by members of the
generated Scanner class:
void accept(size_t nChars = 0)
accept(nChars) returns all but the first nChars characters of the
current token back to the input stream, where they will be rescanned
when the scanner looks for the next match. So, it matches nChars of
the characters in the input buffer, rescanning the rest. This function
effectively sets length's return value to nChars (note: with
flex++ this function was called less);
void begin(StartCondition__ startCondition)
activate the regular expression rules associated with
StartCondition__ startCondition. As this enumeration is a strongly
typed enum the StartCondition__ scope must be specified as
well. E.g.,
begin(StartCondition__::INITIAL);
void echo() const
The currently matched text (i.e., the text returned by the member
matched) is inserted into the scanner object's output stream;
void leave(int retValue)
actions defined in the lexical scanner specification file may or may
not return. This frequently results in complicated or overlong
compound statements, blurring the readability of the specification
file. By encapsulating the actions in a member function readability is
enhanced. However, frequently a compound statement is still required,
as in:
regex-to-match  {
                    if (int ret = memberFunction())
                        return ret;
                }
The member leave removes the need for constructions like the
above. The member leave can be called from within member
functions encapsulating actions performed when a regular expression
has been matched. It ends lex, returning retValue to its
caller. The above rule can now be written like this:
regex-to-match memberFunction();
and memberFunction could be implemented as follows:
void Scanner::memberFunction()
{
    if (someCondition())
    {                                   // any action, e.g.,
                                        // switch mini-scanner
        begin(StartCondition__::INITIAL);

        leave(Parser::TOKENVALUE);      // lex returns TOKENVALUE
                                        // this point is never reached
    }

    pushStream(matched());              // switch to the next stream
                                        // lex continues
}
The member leave should only be called (usually indirectly, from
nested member functions) from actions defined in the scanner's
specification file; calling leave outside of this context results in
undefined behavior.
void more()
the matched text is kept and will be prefixed to the text that is
matched at the next lexical scan;
std::ostream &out()
returns a reference to the scanner's output stream;
bool popStream()
closes the currently processed input stream and continues to process
the most recently stacked input stream (removing it from the stack of
streams). If this switch was successfully performed true is
returned, otherwise (e.g., when the stream stack is empty) false
is returned;
void push(size_t ch)
character ch is pushed back onto the input stream. I.e., it will be
the character that is retrieved at the next attempt to obtain a
character from the input stream;
void push(std::string const &txt)
the characters in the string txt are pushed back onto the input
stream. I.e., they will be the characters that are retrieved at the
next attempt to obtain characters from the input stream. The
characters in txt are retrieved from the first character to the
last. So if txt == "hello" then the 'h' will be the character
that's retrieved next, followed by 'e', etc., until 'o';
void pushStream(std::istream &curStream)
this function pushes curStream on the stream stack;
This member is not available with interactive scanners.
void pushStream(std::string const &curName)
same, but the stream curName is opened first, and the resulting
istream is pushed on the stream stack;
This member is not available with interactive scanners.
void redo(size_t nChars = 0)
this member acts like accept but its argument counts backward from
the end of the matched text. All but these nChars characters are
kept and the last nChars characters are rescanned. This function
effectively reduces length's return value by nChars;
void setFilename(std::string const &name)
this function sets the name of the stream returned by filename to
name;
void setMatched(std::string const &text)
this function stores text in the matched text buffer. Following a
call to this function matched returns text.
StartCondition__ startCondition() const
returns the currently active start condition (mini scanner);
std::vector<StreamStruct> const &streamStack() const
returns the vector of currently stacked input streams. The vector's
size equals 0 unless pushStream has been used, so the input stream the
scanner started with is not counted here. The StreamStruct is a struct only
having one accessible member: std::string const &pushedName, which
holds the name of the pushed stream. The vector is used internally
as a stack: the stream that was first pushed is found at index
position 0, the most recently pushed stream is found at
streamStack().back().
This member is not available with interactive scanners.
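The effects of accept, redo, more and push on the matched text and on the characters still to be scanned can be modelled as shown below. MatchModel is hypothetical: its match member stands in for the scanner's next match, and the model does not actually rescan its pending characters.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical model (not flexc++'s implementation) of how accept, redo,
// more and push affect the matched text and the pushed-back characters.
class MatchModel
{
    std::string d_matched;
    std::deque<char> d_pending;     // characters to be rescanned, front first
    bool d_more = false;

    public:
        void match(std::string const &text) // stands in for the next match
        {
            d_matched = d_more ? d_matched + text : text;
            d_more = false;
        }

        void accept(size_t nChars = 0)      // keep the first nChars chars,
        {                                   // push back the rest
            assert(nChars <= d_matched.size());
            push(d_matched.substr(nChars));
            d_matched.resize(nChars);
        }

        void redo(size_t nChars = 0)        // push back the last nChars
        {
            assert(nChars <= d_matched.size());
            accept(d_matched.size() - nChars);
        }

        void more()                         // prefix the next match
        {
            d_more = true;
        }

        void push(std::string const &txt)   // txt[0] is read next
        {
            d_pending.insert(d_pending.begin(), txt.begin(), txt.end());
        }

        std::string const &matched() const
        {
            return d_matched;
        }

        size_t length() const
        {
            return d_matched.size();
        }

        std::string pending() const
        {
            return std::string(d_pending.begin(), d_pending.end());
        }
};
```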
15. PROTECTED DATA MEMBERS
All protected data members are for internal use only, allowing lex__
to access them. All of them end in two underscore characters.
16. FLEX++ TO FLEXC++ MEMBERS
Flex++ (old)    Flexc++ (new)
lineno()        lineNr()
YYText()        matched()
less()          accept()
17. THE CLASS INPUT
Flexc++ generates a file Scannerbase.h defining the scanner class's base
class, by default named ScannerBase (which is the name used in this
man-page). The base class ScannerBase contains a nested class Input,
whose required constructors and members are described in sections 18 and 19.
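Collecting the constructors and members described in sections 18 and 19, the required interface can be sketched as follows. In the generated code this class is nested in ScannerBase; data members are left to the implementation.

```cpp
#include <cstddef>
#include <istream>
#include <string>

// Sketch of the class Input's required interface (sections 18 and 19).
class Input
{
    // data members, as required by the implementation

    public:
        Input();                            // empty, valid, destructible
        Input(std::istream *iStream, size_t lineNr = 1);

        size_t get();                       // next char, 0x100 at EOF
        size_t lineNr() const;              // 1-based line number
        size_t nPending() const;            // # chars passed to reRead
        void setPending(size_t nPending);
        void reRead(size_t ch);             // push back one character
        void reRead(std::string const &str, size_t fmIdx);
        void close();                       // deletes the istream
};
```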
The members of this class are all required and offer a level in between
the operations of ScannerBase and flexc++'s actual input file that's being
processed.
By default, flexc++ provides an implementation for all of Input's
required members. Therefore, in most situations this section of this man-page
can safely be ignored.
However, users may define and extend their own Input class and provide
flexc++'s base class with that Input class. To do so, flexc++'s rules file must
contain two directives (see flexc++input(7)), specifying the files interface
and sourcefile. Here, interface is the name of a file containing the class Input's
interface. This interface is then inserted into ScannerBase's interface
instead of the default class Input's interface. This interface must at
least offer the aforementioned members and constructors (their functions are
described below). The class may contain additional members if required by the
user-defined implementation. The implementation itself is expected in
sourcefile. The contents of this file are inserted in the generated
lex.cc file instead of Input's default implementation. The file
sourcefile should probably not have a .cc extension to prevent its
compilation by a program maintenance utility.
When the lexical scanner generated by flexc++ switches streams using the
//include directive (see also section 2. FILE SWITCHING in the
flexc++input(7) man page), then the input stream that's currently
processed is pushed on an Input stack maintained by ScannerBase, and
processing continues at the file named at the //include directive. Once
the latter file has been processed, the previously pushed stream is popped off
the stack, and processing of the popped stream continues. This implies that
Input objects must be `stack-able'. The required interface is designed to
satisfy this requirement.
18. INPUT CONSTRUCTORS
Input()
The default constructor is used by ScannerBase to prepare the
stack for Input objects. It must make sure that a default (empty)
Input object is in a valid state and can be destroyed. It serves no
further purpose. Input objects, however, must support the default (or
overloaded) assignment operator.
Input(std::istream *iStream, size_t lineNr = 1)
This constructor receives a pointer to a dynamically allocated
istream object. The Input constructor should preserve this pointer
when the Input object is pushed on and popped off the stack. A
shared_ptr probably comes in handy here. The Input object becomes the
owner of the istream object, albeit that its destructor is not
supposed to destroy the istream object. Destruction remains the
responsibility of the ScannerBase object, which calls the Input::close
member (see below) when it's time to destroy (close) the stream.
The new input stream's line counter is set to lineNr, by default
1.
19. REQUIRED PUBLIC MEMBER FUNCTIONS
size_t get()
returns the next character to be processed by the lexical
scanner. Usually it will be the next character from the istream passed to
the Input class at construction time. It is never called by the
ScannerBase object for Input objects defined using Input's default
constructor. It should return 0x100 once istream's end-of-file has been
reached.
size_t lineNr() const
should return the (1-based) current line number of the istream object passed to
the Input object. At construction time the istream has just been
opened and so at that point lineNr should return 1.
size_t nPending() const
should return the number of pending characters (i.e., the number of
characters which were passed back to the Input object using its reRead
members which were not yet retrieved again by its get member).
void setPending(size_t nPending)
should remove nPending characters from the head of the Input
object's pending input queue. The lexical scanner always passes the value
received from nPending to setPending, without calling get in
between.
void reRead(size_t ch)
if provided with a value smaller than 0x100 ch should be pushed
back onto the istream, where it becomes the character next to be
returned. Physically the character doesn't have to be pushed back. The default
implementation uses a deque onto which the character is pushed-front. Only
when this deque is exhausted are characters retrieved from the Input
object's istream.
void reRead(std::string const &str, size_t fmIdx)
the characters in str from fmIdx until the string's final
character are pushed back onto the istream object so that the string's
first character is retrieved first and the string's last character is
retrieved last.
void close()
the istream object initially passed to the Input object is
deleted by close, thereby not only freeing the stream's memory, but also
closing the stream if the stream in fact was an ifstream. Note that the
Input's destructor should not destroy the Input's istream
object.
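A minimal sketch of a self-made Input class meeting the above requirements is shown next. It follows the descriptions in sections 18 and 19 but is not flexc++'s own implementation; decrementing the line number when a newline is passed to reRead is a design choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <istream>
#include <sstream>
#include <string>

// A sketch of a user-defined Input class (not flexc++'s implementation).
class Input
{
    std::deque<unsigned char> d_deque;  // characters passed back by reRead
    std::istream *d_in = nullptr;       // deleted by close, not by ~Input
    size_t d_lineNr = 1;

    public:
        Input() = default;              // valid, destructible, assignable

        Input(std::istream *iStream, size_t lineNr = 1)
        :
            d_in(iStream),
            d_lineNr(lineNr)
        {}

        size_t get()
        {
            size_t ch;
            if (!d_deque.empty())       // pending characters come first
            {
                ch = d_deque.front();
                d_deque.pop_front();
            }
            else
            {
                if (d_in == nullptr)
                    return 0x100;
                int next = d_in->get();
                if (next == std::istream::traits_type::eof())
                    return 0x100;       // end-of-file
                ch = static_cast<unsigned char>(next);
            }

            if (ch == '\n')
                ++d_lineNr;
            return ch;
        }

        size_t lineNr() const
        {
            return d_lineNr;
        }

        size_t nPending() const
        {
            return d_deque.size();
        }

        void setPending(size_t nPending)    // remove from the queue's head
        {
            if (nPending > d_deque.size())
                nPending = d_deque.size();
            d_deque.erase(d_deque.begin(),
                          d_deque.begin() + static_cast<std::ptrdiff_t>(nPending));
        }

        void reRead(size_t ch)
        {
            if (ch >= 0x100)                // e.g., the end-of-file value
                return;
            if (ch == '\n')                 // this line is read again
                --d_lineNr;
            d_deque.push_front(static_cast<unsigned char>(ch));
        }

        void reRead(std::string const &str, size_t fmIdx)
        {   // push back from the last char, so str[fmIdx] is read first
            for (size_t idx = str.size(); idx-- > fmIdx; )
                reRead(static_cast<unsigned char>(str[idx]));
        }

        void close()                        // also closes an ifstream
        {
            delete d_in;
            d_in = nullptr;
        }
};
```

To try the class, construct it with a dynamically allocated std::istringstream; calling close then deletes that stream.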
FILES
Flexc++'s default skeleton files are in /usr/share/flexc++.
By default, flexc++ generates the following files:
Scanner.h: the header file containing the scanner class's
interface.
Scannerbase.h: the header file containing the interface of the
scanner class's base class.
Scanner.ih: the internal header file that is meant to be included
by the scanner class's source files (e.g., it is included by
lex.cc, see the next item's file), and that should contain all
declarations required for compiling the scanner class's sources.
lex.cc: the source file implementing the scanner class member
function lex (and support functions), performing the lexical
scan.
SEE ALSO
flexc++(1), flexc++input(7)
BUGS
Generating interactive and non-interactive scanners (see section
1. INTERACTIVE SCANNERS) cannot be mixed.
COPYRIGHT
This is free software, distributed under the terms of the
GNU General Public License (GPL).
AUTHOR
Frank B. Brokken (f.b.brokken@rug.nl),
Jean-Paul van Oosten (j.p.van.oosten@rug.nl),
Richard Berendsen (richardberendsen@xs4all.nl) (until 2010).