This section answers the question: "How do I include an Alex lexer in my program?"
Alex provides a great deal of flexibility in how the lexer is exposed to the rest of the program. For instance, there's no need to parse a String directly if you have some special character-buffer operations that avoid the overheads of ordinary Haskell Strings. You might want Alex to keep track of the line and column number in the input text, or you might wish to do it yourself (perhaps you use a different tab width from the standard 8 columns, for example).
The general story is this: Alex provides a basic interface to the generated lexer (described in the next section), which you can use to parse tokens given an abstract input type with operations over it. You also have the option of including a wrapper, which provides a higher-level abstraction over the basic interface; Alex comes with several wrappers.
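To make the idea of "an abstract input type with operations over it" concrete, here is a minimal sketch of an input type that tracks line and column itself, using a non-standard tab width. All names here (Input, startInput, nextChar, tabWidth) are hypothetical and chosen only for illustration; they are not the names Alex expects from the basic interface, and the tab width of 4 is an assumption.

```haskell
-- Hypothetical input type: the unconsumed characters plus a position
-- that we maintain ourselves instead of asking a wrapper to do it.
data Input = Input
  { inputLine :: !Int    -- 1-based line number
  , inputCol  :: !Int    -- 1-based column number
  , inputRest :: String  -- characters not yet consumed
  } deriving (Eq, Show)

-- Assumed tab width for this sketch (not the standard 8 columns).
tabWidth :: Int
tabWidth = 4

-- Start at line 1, column 1.
startInput :: String -> Input
startInput = Input 1 1

-- Consume one character and return it with the updated input,
-- or Nothing at end of input.
nextChar :: Input -> Maybe (Char, Input)
nextChar (Input _ _ [])     = Nothing
nextChar (Input l c (x:xs)) = Just (x, advance x)
  where
    advance '\n' = Input (l + 1) 1 xs
    -- A tab moves to the next tab stop after the current column.
    advance '\t' = Input l (c + tabWidth - ((c - 1) `mod` tabWidth)) xs
    advance _    = Input l (c + 1) xs
```

With these definitions, consuming a tab at column 1 moves the position to column 5, and consuming a newline moves it to column 1 of the following line.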
Lexer specifications are written in terms of Unicode characters, but Alex works internally on a UTF-8 encoded byte sequence.
Depending on how you use Alex, the fact that Alex uses UTF-8 encoding internally may or may not affect you. If you use one of the wrappers (below) that takes input from a Haskell String, then the UTF-8 encoding is handled automatically. However, if you take input from a ByteString, then it is your responsibility to ensure that the input is properly UTF-8 encoded.
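As an illustration of what "UTF-8 encoded" means for a single character, the following sketch turns a Haskell Char into its UTF-8 byte sequence. The function name utf8EncodeChar is hypothetical, and this is not presented as the code Alex or its wrappers actually use; it only shows the standard UTF-8 bit layout.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one character as 1 to 4 UTF-8 bytes.
-- Caveat: surrogate code points (U+D800 to U+DFFF) are valid Chars
-- but not valid in UTF-8; this sketch does not reject them.
utf8EncodeChar :: Char -> [Word8]
utf8EncodeChar ch
  | cp < 0x80    = [fromIntegral cp]
  | cp < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | cp < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise    = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    cp :: Int
    cp = ord ch

    -- Payload bits of the leading byte.
    hi :: Int -> Word8
    hi n = fromIntegral (cp `shiftR` n)

    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont :: Int -> Word8
    cont n = 0x80 .|. fromIntegral ((cp `shiftR` n) .&. 0x3F)
```

For example, `concatMap utf8EncodeChar "\xE9"` yields the two bytes 0xC3 0xA9, which is how an "é" would need to appear in ByteString input.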
None of this applies if you use the --latin1 option to Alex. In that case, the input is just a sequence of 8-bit bytes, interpreted as characters in the Latin-1 character set.
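The Latin-1 interpretation is simple because each byte value is also the code point of the character it denotes. A minimal sketch of that mapping, using the hypothetical helper names latin1ToChar and charToLatin1 (neither is part of Alex):

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Every byte 0x00..0xFF maps to the character with the same code point.
latin1ToChar :: Word8 -> Char
latin1ToChar = chr . fromIntegral

-- The reverse direction only succeeds for characters up to U+00FF.
charToLatin1 :: Char -> Maybe Word8
charToLatin1 c
  | ord c <= 0xFF = Just (fromIntegral (ord c))
  | otherwise     = Nothing
```

Under this mapping the byte 0xE9 is the single character "é", whereas in UTF-8 input the same byte could only appear as part of a multi-byte sequence.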