Chapter 4. Regular Expressions

Table of Contents

4.1. Syntax of regular expressions
4.2. Syntax of character sets

Regular expressions are the patterns that Alex uses to match tokens in the input stream.

4.1. Syntax of regular expressions

regexp  := rexp2 { '|' rexp2 }

rexp2   := rexp1 { rexp1 }

rexp1   := rexp0 [ '*' | '+' | '?' | repeat ]

rexp0   := set
         | @rmac
         | @string
         | '(' [ regexp ] ')'

repeat  := '{' $digit '}'
         | '{' $digit ',' '}'
         | '{' $digit ',' $digit '}'
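Read top-down, the grammar implies the usual precedence: postfix operators (rexp1) bind tightest, then sequencing (rexp2), then alternation (regexp). The fragment below is a small sketch in Alex macro syntax; the macro names @r1 and @r2 and the set $digit are made up for illustration.

```alex
$digit = [0-9]

-- Postfix operators bind tighter than sequencing, and sequencing
-- binds tighter than '|'.
@r1 = "ab" | "cd" $digit*       -- grouped as  "ab" | ("cd" ($digit*))
@r2 = ("ab" | "cd") $digit*     -- parentheses apply $digit* to both branches
```

Under this reading, @r1 matches "ab" but not "ab7", while @r2 matches both.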

The syntax of regular expressions is fairly standard. The only difference from normal lex-style regular expressions is that the sequence () is allowed, and denotes the regular expression that matches the empty string.
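As an illustration of the empty regular expression, the sketch below uses () as one branch of an alternation, which makes the other branch optional. The names @sign and @integer are hypothetical and not part of Alex itself.

```alex
$digit = [0-9]

-- @sign matches either a minus sign or the empty string.
@sign    = \- | ()
@integer = @sign $digit+
```

In this particular case \-? would express the same thing; () is mostly useful when an explicit "nothing" alternative reads more clearly.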

Whitespace is ignored in a regular expression, so you can space out a regular expression as much as you like, split it over multiple lines, and include comments. To match literal whitespace, surround it with quotes, as in "   ", or escape each whitespace character with \.
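A minimal sketch of how this might look in practice, assuming hypothetical macros @float and @else_if: the first definition is spread over several lines with a comment on each part, and the second uses a quoted string to match a literal space.

```alex
$digit = [0-9]

-- Layout between the parts is ignored, so each part can sit on its own line.
@float = $digit+                      -- integer part
         \. $digit+                   -- fractional part
         ( [eE] [\-\+]? $digit+ )?    -- optional exponent

-- The space inside the quotes is literal; the spaces around the string are not.
@else_if = "else if"
```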

set

Matches any of the characters in set. See Section 4.2, “Syntax of character sets” for the syntax of sets.

@foo

Expands to the definition of the regular expression macro named foo.

"..."

Matches the sequence of characters in the string, in that order.

r*

Matches zero or more occurrences of r.

r+

Matches one or more occurrences of r.

r?

Matches zero or one occurrences of r.

r{n}

Matches exactly n occurrences of r.

r{n,}

Matches n or more occurrences of r.

r{n,m}

Matches between n and m (inclusive) occurrences of r.
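The forms above can be sketched together as a set of macro definitions. All macro and set names here are invented for the example, and the repeat counts are kept to single digits to stay within the grammar given earlier.

```alex
$digit  = [0-9]
$hexdig = [0-9a-fA-F]
$alpha  = [a-zA-Z]

@ident   = $alpha [$alpha $digit \_]*   -- r*      zero or more trailing characters
@natural = $digit+                      -- r+      at least one digit
@signed  = \-? @natural                 -- r?      optional minus sign, then a macro
@colour  = \# $hexdig{6}                -- r{n}    exactly six hex digits
@longnum = $digit{4,}                   -- r{n,}   four or more digits
@short   = $digit{1,3}                  -- r{n,m}  between one and three digits
@keyword = "let" | "in"                 -- "..."   literal strings, joined by '|'
```

Each postfix operator applies only to the item immediately before it, so in @colour the {6} repeats $hexdig and not the leading \#.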