Table of Contents
Alex can always be obtained from its home page. The latest
source code lives in the git
repository on GitHub
.
Unicode support (contributed mostly by Jean-Philippe Bernardy, with help from Alan Zimmerman).
An Alex lexer now takes a UTF-8 encoded byte sequence
as input (see Section 5.1, “Unicode and UTF-8”. If you are
using the "basic" wrapper or one of the other wrappers
that takes a Haskell String
as
input, the string is automatically encoded into UTF-8
by Alex. If your input is
a ByteString
, you are responsible
for ensuring that the input is UTF-8 encoded. The old
8-bit behaviour is still available via
the --latin1
option.
Alex source files are asumed to be in UTF-8, like Haskell source files. The lexer specification can use Unicode characters and ranges.
alexGetChar
is renamed to
alexGetByte
in the generated code.
There is a new option, --latin1
, that
restores the old behaviour.
Alex now does DFA minimization, which helps to reduce the size of the generated tables, especially for lexers that use Unicode.