1.4. Release Notes for version 2.0

Alex changed substantially between versions 1.x and 2.0. The following list is intended to be exhaustive:

1.4.1. Syntax changes

  • Code blocks are now surrounded by {...} rather than %{...%}.

  • Character-set macros now begin with ‘$’ instead of ‘^’ and have multi-character names.

  • Regular expression macros now begin with ‘@’ instead of ‘%’ and have multi-character names.

  • Macro definitions are no longer surrounded by { ... }.

  • Rules are now of the form

    <c1,c2,...>  regex   { code }

    where c1, c2, ... are startcodes, and code is an arbitrary Haskell expression.

  • Regular expression syntax changes:

    • ‘()’ is the empty regular expression (it used to be ‘$’).

    • Set complement can now be expressed as [^sets], for similarity with lex regular expressions.

    • The 'abc' form is no longer available; use [abc] instead.

    • ‘^’ and ‘$’ have the usual meanings: ‘^’ matches just after a ‘\n’, and ‘$’ matches just before a ‘\n’.

    • ‘\’ is now the escape character, not ‘^’.

    • The form "..." matches the sequence of characters inside the quotes; the difference is that special characters need not be escaped inside "...".

  • Rules can have arbitrary predicates attached to them. This subsumes the previous left-context and right-context facilities (although these are still allowed as syntactic sugar).
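The rule form <c1,c2,...> regex { code } and the predicate mechanism can be pictured with a small Haskell model. This is a hypothetical sketch, not Alex's generated code: the names Rule, Ctx, tryRule, atLineStart and atLineEnd are invented for illustration, the regular expression is reduced to a literal string, and an empty startcode list is taken (by this model's convention) to mean "every startcode". It shows how ‘^’ and ‘$’ can be expressed as ordinary predicates on the character before the match and the input after it.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical model of an Alex 2.0 rule:  <c1,c2,...> regex { code }
-- The regex is simplified to a literal string to keep the sketch small.
data Rule tok = Rule
  { startcodes :: [Int]          -- empty list: applies in every startcode (model convention)
  , regex      :: String         -- stand-in for a real regular expression
  , predicate  :: Ctx -> Bool    -- arbitrary condition attached to the rule
  , action     :: String -> tok  -- the Haskell "code" part, applied to the matched text
  }

-- What a predicate may inspect (hypothetical, for illustration).
data Ctx = Ctx
  { prevChar  :: Char    -- character just before the match
  , restInput :: String  -- input remaining after the match
  }

-- '^' as a left-context predicate: match only just after a '\n'.
atLineStart :: Ctx -> Bool
atLineStart ctx = prevChar ctx == '\n'

-- '$' as a right-context predicate: match only just before a '\n'.
atLineEnd :: Ctx -> Bool
atLineEnd ctx = take 1 (restInput ctx) == "\n"

-- Try one rule at the current position, given the current startcode
-- and the previous character. Returns the token and the remaining input.
tryRule :: Int -> Char -> String -> Rule tok -> Maybe (tok, String)
tryRule sc prev input r
  | not (null (startcodes r)) && sc `notElem` startcodes r = Nothing
  | regex r `isPrefixOf` input
  , let rest = drop (length (regex r)) input
  , predicate r (Ctx prev rest)
  = Just (action r (regex r), rest)
  | otherwise = Nothing
```

A left-context rule such as ‘^#’ then becomes Rule [] "#" atLineStart ..., and a right-context rule restricted to startcode 1 becomes Rule [1] ";" atLineEnd ...; both are just predicates, which is why the older context facilities can survive as syntactic sugar.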

1.4.2. Changes in the form of an Alex file

  • Each file can now only define a single grammar. This change was made to simplify code generation. Multiple grammars can be simulated using startcodes, or split into separate modules.

  • The programmer's interface has been simplified and, at the same time, made more flexible. See Chapter 5, The Interface to an Alex-generated lexer, for details.

  • You no longer need to import the Alex module.

1.4.3. Usage changes

The command-line syntax is quite different. See Chapter 6, Invoking Alex.

1.4.4. Implementation changes

  • A more efficient table representation, coupled with standard table-compression techniques, is used to keep the size of the generated code down.
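As a rough illustration of what standard table compression can mean here, the following sketch uses the textbook base/next/check layout: the rows of the transition table are overlaid in one packed array, and a check entry records which state owns each slot. The names (Table, mkTable, step) and the simplification that deflt holds a target state directly, rather than a fallback state to retry from, are assumptions of this sketch; it is not a description of Alex's actual tables.

```haskell
import Data.Array (Array, listArray, bounds, inRange, (!))

-- Hypothetical compressed transition table (base/next/check/default).
data Table = Table
  { base  :: Array Int Int  -- per state: offset of its row in next/check
  , next  :: Array Int Int  -- packed target states of all rows
  , check :: Array Int Int  -- which state owns each packed slot
  , deflt :: Array Int Int  -- per state: target when the slot is not ours (-1 = error)
  }

-- Build a table from plain lists, indexing every array from 0.
mkTable :: [Int] -> [Int] -> [Int] -> [Int] -> Table
mkTable b n c d = Table (arr b) (arr n) (arr c) (arr d)
  where arr xs = listArray (0, length xs - 1) xs

-- Transition from state s on input symbol c.
step :: Table -> Int -> Int -> Int
step t s c
  | inRange (bounds (check t)) i && check t ! i == s = next t ! i
  | otherwise = deflt t ! s
  where i = base t ! s + c
```

For example, mkTable [0,1] [1,0,1] [0,1,1] [-1,-1] packs a two-state, two-symbol table into three slots instead of four: state 0's missing transition on symbol 1 shares its slot with state 1's transition on symbol 0, and the check array tells them apart.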

  • When the generated lexer is to be compiled with GHC, the -g switch causes Alex to generate an even faster and smaller lexer.

  • Startcodes are implemented differently: each startcode corresponds to a different initial state in the DFA, so the scanner does not have to check the startcode when it reaches an accept state. This results in a larger, but quicker, scanner.
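The startcode scheme can be sketched as a DFA with one initial state per startcode. In this hypothetical model (the DFA record and longestMatch are illustrative names, and Data.Map stands in for real transition tables), the startcode is consulted exactly once, to choose the starting state; the longest-match loop then only follows transitions and remembers the last accepting state it passed through.

```haskell
import qualified Data.Map as Map

-- Hypothetical DFA with a separate initial state for each startcode.
data DFA = DFA
  { initial :: Map.Map Int Int          -- startcode -> initial DFA state
  , delta   :: Map.Map (Int, Char) Int  -- (state, char) -> next state
  , accepts :: Map.Map Int String       -- accepting state -> token name
  }

-- Longest match from the initial state of the given startcode.
-- Returns the token name and the length of the match. The startcode is
-- only used to pick s0; accept states are never re-checked against it.
longestMatch :: DFA -> Int -> String -> Maybe (String, Int)
longestMatch dfa sc input = do
  s0 <- Map.lookup sc (initial dfa)
  go s0 0 input Nothing
  where
    -- s: current state, n: characters consumed, best: last accept seen
    go s n rest best =
      let best' = case Map.lookup s (accepts dfa) of
                    Just tok -> Just (tok, n)
                    Nothing  -> best
      in case rest of
           (c:cs) | Just s' <- Map.lookup (s, c) (delta dfa) -> go s' (n + 1) cs best'
           _ -> best'
```

Running longestMatch on the same input with two different startcodes can yield different tokens without any startcode test inside the loop; the price is that the DFA carries a separate set of start states per startcode, which is the larger-but-quicker trade-off described above.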