The grammar section comes after the directives, separated
from them by a double-percent (%%
) symbol.
This section contains a number of
productions, each of which defines a single
non-terminal. Each production has the following syntax:
<non-terminal> [ :: { <type> } ] : <id> ... {[%] <expression> } [ | <id> ... {[%] <expression> } ... ]
The first line gives the non-terminal to be defined by the production and optionally its type (type signatures for productions are discussed in Section 2.4, “Type Signatures”).
Each production has at least one, and possibly many right-hand sides. Each right-hand side consists of zero or more symbols (terminals or non-terminals) and a Haskell expression enclosed in braces.
The expression represents the semantic value of the
non-terminal, and may refer to the semantic values of the
symbols in the right-hand side using the meta-variables
$1 ... $n
. It is an error to
refer to $i
when i
is larger than the number of symbols on the right hand side of
the current rule. The symbol $
may be
inserted literally in the Haskell expression using the sequence
\$
(this isn't necessary inside a
string or character literal).
Additionally, the sequence $>
can be used to represent the value of the rightmost symbol.
A semantic value of the form {% ... }
is a
monadic action, and is only valid when the grammar
file contains a %monad
directive (Section 6.3.5, “Monad Directive”). Monadic actions are discussed in
Section 2.5, “Monadic Parsers”.
Remember that all the expressions for a production must have the same type.
This functionality is best illustrated with an example:
opt(p) : p { Just $1 } | { Nothing } rev_list1(p) : p { [$1] } | rev_list1(p) p { $2 : $1 }
The first production, opt
, is used for optional
components of a grammar. It is just like p?
in
regular expressions or EBNF. The second production,
rev_list1
, is for parsing a list of 1 or more
occurrences of p
. Parameterized productions are
just like ordinary productions, except that they have parameter in
parenthesis after the production name. Multiple parameters should
be separated by commas:
fst(p,q) : p q { $1 } snd(p,q) : p q { $2 } both(p,q) : p q { ($1,$2) }
To use a parameterized production, we have to pass values for the parameters, as if we are calling a function. The parameters can be either terminals, non-terminals, or other instantiations of parameterized productions. Here are some examples:
list1(p) : rev_list1(p) { reverse $1 } list(p) : list1(p) { $1 } | { [] }
The first production uses rev_list
to define
a production that behaves like p+
, returning
a list of elements in the same order as they occurred in the input.
The second one, list
is like p*
.
Parameterized productions are implemented as a prepossessing pass in Happy: each instantiation of a production turns into a separate non-terminal, but are careful to avoid generating the same rule multiple times, as this would lead to an ambiguous grammar. Consider, for example, the following parameterized rule:
sep1(p,q) : p list(snd(q,p)) { $1 : $2 }
The rules that would be generated for sep1(EXPR,SEP)
sep1(EXPR,SEP) : EXPR list(snd(SEP,EXPR)) { $1 : $2 } list(snd(SEP,EXPR)) : list1(snd(SEP,EXPR)) { $1 } | { [] } list1(snd(SEP,EXPR)) : rev_list1(snd(SEP,EXPR)) { reverse $1 } rev_list1(snd(SEP,EXPR)) : snd(SEP,EXPR)) { [$1] } | rev_list1(snd(SEP,EXPR)) snd(SEP,EXPR) { $2 : $1 } snd(SEP,EXPR) : SEP EXPR { $2 }
Note that this is just a normal grammar, with slightly strange names for the non-terminals.
A drawback of the current implementation is that it does not support type signatures for the parameterized productions, that depend on the types of the parameters. We plan to implement that in the future---the current workaround is to omit the type signatures for such rules.