If you compile your Alex file without a %wrapper declaration,
then you get access to the lowest-level API to the lexer. You
must provide definitions for the following, either in the same
module or imported from another module:
    type AlexInput
    alexGetByte       :: AlexInput -> Maybe (Word8,AlexInput)
    alexInputPrevChar :: AlexInput -> Char
The generated lexer is independent of the input type,
which is why you have to provide a definition for the input type
yourself. Note that the input type needs to keep track of the
previous character in the input stream; this is used for
implementing patterns with a left-context (those that begin with
^ or set^). If you never use patterns with a left-context in your
lexical specification, then you can safely forget about the
previous character in the input stream, and have
alexInputPrevChar return undefined.
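As an illustration, the three definitions could be supplied as in the minimal sketch below. The record type, its field names, and the helper mkInput are hypothetical, and the sketch assumes the input contains only characters below code point 256, so that each character is exactly one byte; real input would need a proper encoding step such as UTF-8.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical input type: the previously consumed character plus
-- the text that is still to be scanned.
data AlexInput = AlexInput
  { prevChar  :: Char
  , remaining :: String
  } deriving (Eq, Show)

-- Hand the lexer one byte at a time.
-- Assumption: every character is below code point 256, so a
-- character and a byte coincide; larger characters would be
-- silently truncated by fromIntegral.
alexGetByte :: AlexInput -> Maybe (Word8, AlexInput)
alexGetByte (AlexInput _ [])       = Nothing
alexGetByte (AlexInput _ (c : cs)) = Just (fromIntegral (ord c), AlexInput c cs)

-- Needed only for rules with a left-context.
alexInputPrevChar :: AlexInput -> Char
alexInputPrevChar = prevChar

-- Hypothetical helper: starting with '\n' as the previous character
-- lets a rule anchored with ^ match at the very start of the input.
mkInput :: String -> AlexInput
mkInput = AlexInput '\n'
```

Keeping the previous character in the input value itself means the lexer needs no extra state to evaluate a left-context.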
Alex will provide the following function:
    alexScan :: AlexInput             -- The current input
             -> Int                   -- The "start code"
             -> AlexReturn action     -- The return value

    data AlexReturn action
      = AlexEOF

      | AlexError
          !AlexInput     -- Remaining input

      | AlexSkip
          !AlexInput     -- Remaining input
          !Int           -- Token length

      | AlexToken
          !AlexInput     -- Remaining input
          !Int           -- Token length
          action         -- action value
Calling alexScan will scan a single token from the input stream,
and return a value of type AlexReturn. The value returned is
either:
AlexEOF
The end-of-file was reached.
AlexError
A valid token could not be recognised.
AlexSkip
The matched token did not have an action associated with it.
AlexToken
A token was matched, and the action associated with it is returned.
The action is simply the value of the expression inside {...} on
the right-hand side of the appropriate rule in the Alex file.
Alex doesn't specify what type these expressions should have; it
simply requires that they all have the same type, or else you'll
get a type error when you try to compile the generated lexer.
Once you have the action, it is up to you what to do with it.
The type of action could be a function which takes the String
representation of the token and returns a value in some token
type, or it could be a continuation that takes the new input and
calls alexScan again, building a list of tokens as it goes.
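The first of these options can be sketched as a small driver loop. Everything here other than alexScan and AlexReturn is hypothetical (the Token type, the Action synonym, the scanTokens name), and the AlexReturn declaration and the hand-written alexScan are only stand-ins so that the sketch compiles on its own; in a real Alex file you would delete both and use the ones Alex generates.

```haskell
import Data.Char (isAlpha, isSpace)

-- Hypothetical input type: a plain String is enough here because
-- the stand-in scanner never looks at the previous character.
type AlexInput = String

-- Copy of the type Alex generates, repeated so the sketch is
-- self-contained.
data AlexReturn action
  = AlexEOF
  | AlexError !AlexInput
  | AlexSkip  !AlexInput !Int
  | AlexToken !AlexInput !Int action

-- Hypothetical token type; every action maps the matched text to a token.
newtype Token = Ident String deriving (Eq, Show)
type Action = String -> Token

-- Stand-in for the generated alexScan (NOT what Alex emits):
-- skips white space and matches runs of letters.
alexScan :: AlexInput -> Int -> AlexReturn Action
alexScan [] _ = AlexEOF
alexScan inp@(c : _) _
  | isSpace c = let (ws, rest) = span isSpace inp in AlexSkip rest (length ws)
  | isAlpha c = let (w, rest) = span isAlpha inp in AlexToken rest (length w) Ident
  | otherwise = AlexError inp

-- Driver: call alexScan repeatedly with start code 0 and collect tokens.
scanTokens :: String -> Either String [Token]
scanTokens = go
  where
    go inp = case alexScan inp 0 of
      AlexEOF                -> Right []
      AlexError rest         -> Left ("lexical error at: " ++ take 10 rest)
      AlexSkip inp' _        -> go inp'
      AlexToken inp' len act -> (act (take len inp) :) <$> go inp'
```

With this stand-in scanner, scanTokens "foo bar" yields Right [Ident "foo",Ident "bar"], and input containing a digit yields a Left value. The loop uses the token length to recover the matched text from the old input, which works here only because the input type is a String.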
This is pretty low-level stuff; you have complete flexibility about how you use the lexer, but there might be a fair amount of support code to write before you can actually use it. For this reason, we also provide a selection of wrappers that add some common functionality to this basic scheme. Wrappers are described in the next section.
There is another entry point, which is useful if your grammar contains any predicates (see Section 3.2.2.1, “Contexts”):
    alexScanUser :: user             -- predicate state
                 -> AlexInput        -- The current input
                 -> Int              -- The "start code"
                 -> AlexReturn action
The extra argument, of some type user, is passed to each predicate.
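A predicate that consults this state might look like the sketch below. It assumes the predicate signature described in the Contexts section (user state, input before the token, token length, input after the token); the LexerOpts record, its field, and the nestedOk name are hypothetical, and AlexInput is taken to be a plain String purely for illustration.

```haskell
-- Hypothetical user state handed to alexScanUser.
newtype LexerOpts = LexerOpts { allowNested :: Bool }

-- Hypothetical input type for this sketch.
type AlexInput = String

-- A predicate receives the user state first, followed by the input
-- before the token, the token length, and the input after the token.
-- This one ignores the input and only checks a flag in the state.
nestedOk :: LexerOpts -> AlexInput -> Int -> AlexInput -> Bool
nestedOk opts _before _len _after = allowNested opts
```

A rule guarded by nestedOk would then match only when the lexer is driven with a state whose allowNested field is True, for example alexScanUser (LexerOpts True) input 0.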