4.2. Attribute Grammars in Happy

4.2.1. Declaring Attributes

The presence of one or more %attribute directives indicates that a grammar is an attribute grammar. Attributes are calculated properties that are associated with the non-terminals in a parse tree. Each %attribute directive generates a field in the attributes record with the given name and type.

The first %attribute directive in a grammar defines the default attribute. The default attribute is distinguished in two ways: 1) if no attribute specifier is given on an attribute reference, the default attribute is assumed (see Section 4.2.2, “Semantic Rules”) and 2) the value for the default attribute of the starting non-terminal becomes the return value of the parse.

Optionally, one may specify a type declaration for the attribute record using the %attributetype declaration. This allows you to define the type given to the attribute record and, more importantly, allows you to introduce type variables that can be subsequently used in %attribute declarations. If the %attributetype directive is given without any %attribute declarations, then the %attributetype declaration has no effect.

For example, the following declarations:

%attributetype { MyAttributes a }
%attribute value { a }
%attribute num   { Int }
%attribute label { String }

would generate this attribute record declaration in the parser:

data MyAttributes a =
   HappyAttributes {
     value :: a,
     num :: Int,
     label :: String
   }

and value would be the default attribute.
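
Because the generated type is an ordinary Haskell record, each attribute name doubles as a field selector, and the type variable introduced by %attributetype is fixed when the record is used. The following is a minimal sketch in plain Haskell that repeats the record by hand. The names sample and defaultOf are hypothetical and are not produced by Happy.

```haskell
-- Hand-written copy of the record shown above; in a real parser
-- this declaration comes from the %attributetype/%attribute directives.
data MyAttributes a =
   HappyAttributes {
     value :: a,
     num   :: Int,
     label :: String
   }

-- Hypothetical record, built by hand purely for illustration.
-- Here the type variable a is instantiated to [Int].
sample :: MyAttributes [Int]
sample = HappyAttributes { value = [1, 2, 3], num = 3, label = "xs" }

-- Hypothetical helper: reads the default attribute, which is just
-- the field named by the first %attribute directive.
defaultOf :: MyAttributes a -> a
defaultOf = value
```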

4.2.2. Semantic Rules

In an ordinary Happy grammar, a production consists of a list of terminals and/or non-terminals followed by an uninterpreted code fragment enclosed in braces. With an attribute grammar, the format is very similar, but the braces enclose a set of semantic rules rather than uninterpreted Haskell code. Each semantic rule is either an attribute calculation or a conditional, and rules are separated by semicolons[3].

Both attribute calculations and conditionals may contain attribute references and/or terminal references. As in regular Happy grammars, the tokens $1 through $<n>, where n is the number of symbols in the production, refer to subtrees of the parse. If the referenced symbol is a terminal, then the value of the reference is just the value of the terminal, exactly as in a regular Happy grammar. If the referenced symbol is a non-terminal, then the reference may be followed by an attribute specifier, which is a dot followed by an attribute name. If the attribute specifier is omitted, then the default attribute is assumed (the default attribute is the first attribute appearing in an %attribute declaration). The special reference $$ refers to the attributes of the current node in the parse tree; it behaves exactly like the numbered references. Additionally, the reference $> always refers to the rightmost symbol in the production.

An attribute calculation rule is of the form:

<attribute reference> = <Haskell expression>

A rule of this form defines the value of an attribute, possibly as a function of the attributes of $$ (inherited attributes), the attributes of non-terminals in the production (synthesized attributes), or the values of terminals in the production. The value for an attribute can only be defined once for a particular production.

The following rule calculates the default attribute of the current production in terms of the first and second items of the production (a synthesized attribute):

$$ = $1 : $2

This rule calculates the length attribute of the first symbol in the production in terms of the length attribute of the current non-terminal (an inherited attribute):

$1.length = $$.length + 1
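
To picture the two directions of attribute flow, a non-terminal can be modelled as a function from the attributes it inherits to the attributes it synthesizes. The sketch below is plain Haskell and is not the code Happy generates. The productions in the comments and the names Node, emptyList, consList and example are assumptions made for illustration.

```haskell
-- Inherited attribute: the length counted so far, handed down by the parent.
-- Synthesized attributes: the list value, plus the length seen at the end.
type Node = Int -> (String, Int)

-- Models a hypothetical production  list : {- empty -}
--   $$ = []
emptyList :: Node
emptyList inhLen = ([], inhLen)

-- Models a hypothetical production  list : item list
--   $$ = $1 : $2                (synthesized: built from the children)
--   $2.length = $$.length + 1   (inherited: passed down to the child)
consList :: Char -> Node -> Node
consList item rest inhLen =
  let (restValue, finalLen) = rest (inhLen + 1)
  in (item : restValue, finalLen)

-- The start symbol inherits a length of 0.
example :: (String, Int)
example = consList 'a' (consList 'b' emptyList) 0
```

Here the value flows up the tree through the result pair while the length flows down through the function argument, which is the distinction between synthesized and inherited attributes described above.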

Conditional rules allow the rejection of strings due to context-sensitive properties. All conditional rules have the form:

where <Haskell expression>

For non-monadic parsers, all conditional expressions must be of the same (monomorphic) type. At the end of the parse, the conditionals will be reduced using seq, which gives the grammar an opportunity to call error with an informative message. For monadic parsers, all conditional statements must have type Monad m => m () where m is the monad in which the parser operates. All conditionals will be sequenced at the end of the parse, which allows the conditionals to call fail with an informative message.

The following conditional rule will cause the (non-monadic) parser to fail if the inherited length attribute is not 0.

where if $$.length == 0 then () else error "length not equal to 0"

This conditional is the monadic equivalent:

where unless ($$.length == 0) (fail "length not equal to 0")
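
The two styles of conditional can be approximated in plain Haskell as follows. This is only a sketch of the behaviour described above, not Happy's generated code. The helpers forceAll and runAllM are hypothetical, and Maybe stands in for whatever monad the parser runs in.

```haskell
import Control.Monad (unless)

-- Non-monadic style: every conditional has one fixed type (here ()).
checkLength :: Int -> ()
checkLength n = if n == 0 then () else error "length not equal to 0"

-- Hypothetical helper: force each conditional with seq, then yield the
-- parse result. A failing conditional calls error when it is forced.
forceAll :: [()] -> a -> a
forceAll conds result = foldr seq result conds

-- Monadic style: every conditional has type m () for the parser monad m.
-- In Maybe, fail discards the message and produces Nothing.
checkLengthM :: Int -> Maybe ()
checkLengthM n = unless (n == 0) (fail "length not equal to 0")

-- Hypothetical helper: sequence the conditionals, then return the result.
runAllM :: [Maybe ()] -> a -> Maybe a
runAllM conds result = sequence_ conds >> return result
```

In a monad that keeps the failure message, the text passed to fail would be available to the caller. Maybe is used here only to keep the sketch short.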


[3] Note that semantic rules must not rely on layout, because whitespace alignment is not guaranteed to be preserved.