lex, yacc, and xml

David Megginson (ak117@freenet.carleton.ca)
Mon, 22 Dec 1997 17:07:06 -0500


Ward Harold writes:

> <question name="why hand code parsers" class="potentially stupid">
> Why is it that all of the XML parsers/processors I've seen appear to be
> hand coded rather than generated via lex/yacc or flex/bison? I seem to
> recall seeing something to the effect that yacc/bison can't handle the
> class of grammar that XML falls into. Then again I'm not a compiler
> constructor, opted for the AI sequence in graduate school, so I may be
> imagining things. Even if there is a technical reason for eschewing
> parser generation surely the basic lexing and scanning could be done
> with lex/flex, no?
> </question>

This is actually a very good question, but I will second most of Tim's
comments. With Ælfred, I set out to produce a Java-based XML parser
under 20K (I missed by about 6K, but I'm still working on it). A
hand-crafted recursive-descent parser seemed like the only reasonable
choice, and it turned out to be very fast as well.

In fact, it is not much harder to write a recursive-descent parser
than it is to write out EBNF productions, at least not once you get
into a rhythm and write a few helper methods for lexical scanning
(like "readName()").
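
To illustrate how closely the code can follow the productions, here is a
minimal sketch in Java. It is not Ælfred's actual source, and it handles
only a tiny, made-up subset of XML (nested elements and character data, no
attributes, entities, or comments). Apart from readName(), every name in it
(TinyXmlParser, parseElement, parseContent, require) is hypothetical and
chosen just for this example. Each method is preceded by the production it
implements.

```java
// Illustrative sketch only; not Ælfred's actual source.
// Grammar handled (a simplified, hypothetical subset of XML):
//   element ::= '<' Name '>' content '</' Name '>'
//   content ::= (element | CharData)*
//   Name    ::= (Letter | '_') (Letter | Digit | '_' | '-' | '.')*
final class TinyXmlParser {
    private final String in;
    private int pos = 0;
    private final StringBuilder events = new StringBuilder();

    TinyXmlParser(String in) { this.in = in; }

    /** Parses one document and returns a space-separated trace of events. */
    static String parse(String text) {
        TinyXmlParser p = new TinyXmlParser(text);
        p.parseElement();
        if (p.pos != p.in.length()) throw p.error("trailing input");
        return p.events.toString().trim();
    }

    // element ::= '<' Name '>' content '</' Name '>'
    private void parseElement() {
        require('<');
        String name = readName();
        require('>');
        events.append("start:").append(name).append(' ');
        parseContent();
        require('<');
        require('/');
        String end = readName();
        if (!end.equals(name)) throw error("expected </" + name + ">");
        require('>');
        events.append("end:").append(name).append(' ');
    }

    // content ::= (element | CharData)*
    private void parseContent() {
        while (pos < in.length()) {
            if (in.charAt(pos) == '<') {
                // One character of lookahead separates an end tag from a child.
                if (pos + 1 < in.length() && in.charAt(pos + 1) == '/') return;
                parseElement();
            } else {
                int start = pos;
                while (pos < in.length() && in.charAt(pos) != '<') pos++;
                events.append("text:").append(in, start, pos).append(' ');
            }
        }
    }

    // Lexical helper for the Name production.
    private String readName() {
        int start = pos;
        if (pos < in.length()
                && (Character.isLetter(in.charAt(pos)) || in.charAt(pos) == '_')) {
            pos++;
            while (pos < in.length() && isNameChar(in.charAt(pos))) pos++;
        }
        if (pos == start) throw error("name expected");
        return in.substring(start, pos);
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.';
    }

    // Consumes exactly one expected character or fails.
    private void require(char c) {
        if (pos >= in.length() || in.charAt(pos) != c) {
            throw error("expected '" + c + "'");
        }
        pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at offset " + pos);
    }
}
```

Calling TinyXmlParser.parse("<a>hi<b>x</b></a>") walks the input once and
reports the start tags, text, and end tags in document order; a mismatched
end tag such as "<a></b>" raises an IllegalArgumentException.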

All the best,

David

-- 
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/