Table of Contents
This chapter explains how to use the GLR parsing extension,
which allows Happy to parse ambiguous
grammars and produce useful results.
This extension is triggered with the --glr
flag,
which causes Happy
to use a different driver for the LALR(1) parsing
tables. The result of parsing is a structure which encodes compactly
all of the possible parses.
There are two options for how semantic information is combined with
the structural information.
This extension was developed by Paul Callaghan and Ben Medlock
(University of Durham). It is based on the structural parser
implemented in Medlock's undergraduate project, but significantly
extended and improved by Callaghan.
Bug reports, comments, questions etc should be sent to
<P.C.Callaghan@durham.ac.uk>
.
Further information can be found on Callaghan's
GLR parser
page.
Here's an ambiguous grammar. It has no information about the
associativity of +
, so for example,
1+2+3
can be parsed as
(1+(2+3))
or ((1+2)+3)
.
In conventional mode, Happy,
would complain about a shift/reduce
conflict, although it would generate a parser which always shifts
in such a conflict, and hence would produce only
the first alternative above.
E -> E + E E -> i -- any integer
GLR parsing will accept this grammar without complaint, and produce
a result which encodes both alternatives
simultaneously. Now consider the more interesting example of
1+2+3+4
, which has five distinct parses -- try to
list them! You will see that some of the subtrees are identical.
A further property of the GLR output is that such sub-results are
shared, hence efficiently represented: there is no combinatorial
explosion.
Below is the simplified output of the GLR parser for this example.
Root (0,7,G_E) (0,1,G_E) => [[(0,1,Tok '1'))]] (0,3,G_E) => [[(0,1,G_E),(1,2,Tok '+'),(2,3,G_E)]] (0,5,G_E) => [[(0,1,G_E),(1,2,Tok '+'),(2,5,G_E)] ,[(0,3,G_E),(3,4,Tok '+'),(4,5,G_E)]] (0,7,G_E) => [[(0,3,G_E),(3,4,Tok '+'),(4,7,G_E)] ,[(0,1,G_E),(1,2,Tok '+'),(2,7,G_E)] ,[(0,5,G_E),(5,6,Tok '+'),(6,7,G_E)]}] (2,3,G_E) => [[(2,3,Tok '2'))]}] (2,5,G_E) => [[(2,3,G_E),(3,4,Tok '+'),(4,5,G_E)]}] (2,7,G_E) => [[(2,3,G_E),(3,4,Tok '+'),(4,7,G_E)]} ,[(2,5,G_E),(5,6,Tok '+'),(6,7,G_E)]}] (4,5,G_E) => [[(4,5,Tok '3'))]}] (4,7,G_E) => [[(4,5,G_E),(5,6,Tok '+'),(6,7,G_E)]}] (6,7,G_E) => [[(6,7,Tok '4'))]}]
This is a directed, acyclic and-or graph.
The node "names" are of form (a,b,c)
where a
and b
are the start and end points (as positions in the input string)
and c
is a category (or name of grammar rule).
For example (2,7,G_E)
spans positions 2 to 7
and contains analyses which match the E
grammar rule.
Such analyses are given as a list of alternatives (disjunctions),
each corresponding to some use of a production of that
category, which in turn are a conjunction of sub-analyses,
each represented as a node in the graph or an instance of a token.
Hence (2,7,G_E)
contains two alternatives,
one which has (2,3,G_E)
as its first child
and the other with (2,5,G_E)
as its first child,
respectively corresponding to sub-analyses
(2+(3+4))
and ((2+3)+4)
.
Both alternatives have the token +
as their
second child, but note that they are difference occurrences of
+
in the input!
We strongly recommend looking at such results in graphical form
to understand these points. If you build the
expr-eval
example in the directory
examples/glr
(NB you need to use GHC for this,
unless you know how to use the -F
flag for Hugs),
running the example will produce a file which can be viewed with
the daVinci graph visualization tool.
(See http://www.informatik.uni-bremen.de/~davinci/
for more information. Educational use licenses are currently
available without charge.)
The GLR extension also allows semantic information to be attached to productions, as in conventional Happy, although there are further issues to consider. Two modes are provided, one for simple applications and one for more complex use. See Section 3.3, “Including semantic results”. The extension is also integrated with Happy's token handling, e.g. extraction of information from tokens.
One key feature of this implementation in Haskell is that its main result is a graph. Other implementations effectively produce a list of trees, but this limits practical use to small examples. For large and interesting applications, some of which are discussed in Section 3.4.3, “Some Applications of GLR parsing”, a graph is essential due to the large number of possibilities and the need to analyse the structure of the ambiguity. Converting the graph to trees could produce huge numbers of results and will lose information about sharing etc.
One final comment. You may have learnt through using yacc-style tools that ambiguous grammars are to be avoided, and that ambiguity is something that appears only in Natural Language processing. This is definitely not true. Many interesting grammars are ambiguous, and with GLR tools they can be used effectively. We hope you enjoy exploring this fascinating area!