Alex has changed a lot between versions 1.x and 2.0. The following is supposed to be an exhaustive list of the changes:
Code blocks are now surrounded by {...} rather than %{...%}.
Character-set macros now begin with ‘$’ instead of ‘^’ and have multi-character names.
Regular expression macros now begin with ‘@’ instead of ‘%’ and have multi-character names.
Macro definitions are no longer surrounded by { ... }.
Rules are now of the form <c1,c2,...> regex { code } where c1, c2 are startcodes, and code is an arbitrary Haskell expression.
Regular expression syntax changes:
() is the empty regular expression (used to be ‘$’).
Set complement can now be expressed as [^sets] (for similarity with lex regular expressions).
The 'abc' form is no longer available; use [abc] instead.
‘^’ and ‘$’ have the usual meanings: ‘^’ matches just after a ‘\n’, and ‘$’ matches just before a ‘\n’.
‘\’ is now the escape character, not ‘^’.
The form "..." means the same as the sequence of characters inside the quotes, the difference being that special characters do not need to be escaped inside "...".
Rules can have arbitrary predicates attached to them. This subsumes the previous left-context and right-context facilities (although these are still allowed as syntactic sugar).
Each file can now only define a single grammar. This change was made to simplify code generation. Multiple grammars can be simulated using startcodes, or split into separate modules.
The programmer experience has been simplified, and at the same time made more flexible. See Chapter 5, The Interface to an Alex-generated lexer, for details.
You no longer need to import the Alex module.
The command-line syntax is quite different. See Chapter 6, Invoking Alex.
A more efficient table representation, coupled with standard table-compression techniques, is used to keep the size of the generated code down.
When compiling a grammar with GHC, the -g switch causes an even faster and smaller lexer to be generated.
Startcodes are implemented in a different way: each startcode corresponds to a different initial state in the DFA, so the scanner doesn't have to check the startcode when it gets to an accept state. This results in a larger, but quicker, scanner.
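
To make the syntax changes above concrete, here is a minimal sketch of a specification written in the 2.0 style. It is an illustration only, not taken from the Alex distribution: the Token type and its constructors TInt and TIdent are hypothetical names, and the use of %wrapper "basic" with alexScanTokens assumes the basic wrapper described in the interface chapter. Every rule is given the default startcode 0 simply to show the <c1,c2,...> prefix; the prefix could be omitted here, and non-zero startcodes would need a wrapper that tracks the current startcode, which is not shown.

```alex
{
module Main (main) where
}

%wrapper "basic"

-- Character-set macros: '$' prefix, multi-character names.
$digit = 0-9
$alpha = [a-zA-Z]

-- Regular-expression macro: '@' prefix, no braces around the definition.
@ident = $alpha [$alpha $digit \_]*

tokens :-

  -- Skip whitespace.
  <0> $white+        ;

  -- Skip line comments: a "..." string followed by a [^...] set complement.
  <0> "--" [^\n]*    ;

  -- With the basic wrapper, each action is a Haskell expression
  -- of type String -> Token.
  <0> $digit+        { \s -> TInt (read s) }
  <0> @ident         { \s -> TIdent s }

{
-- Hypothetical token type, for illustration only.
data Token
  = TInt Int
  | TIdent String
  deriving (Eq, Show)

main :: IO ()
main = do
  s <- getContents
  print (alexScanTokens s)
}
```

Assuming the file is saved as Main.x, running alex on it should produce a Main.hs that can be compiled like any other Haskell module. Note that the two code blocks use plain braces, and that the generated module needs no import of an Alex library module.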