I would be grateful for some or all of the following:
- a Java-based library routine (I think this may be optimistic in 1997)
- an algorithm, or a pointer to one on the WWW
- some wise words about how much effort is involved in implementing one.
[Norbert solved this in NXP by including JACC - a Java-based yacc-like
beast - but it is cumbersome for just analysing single content models
against instances].
The operation seems to be somewhere in between a graph matching routine
(which I can do except for the optionality) and a BNF parser (e.g. yacc)
which I certainly can't. My recollection of regexps is that they use a
'maximal munch' of some sort, so I would try to match as many of the
early nodes as possible and then unwind the stack repeatedly if that
failed. However,
yacc throws up the 'shift-reduce' conflicts which I imagine still pertain
in XML. (This means there is more than one way of mapping a document onto
the content model, I assume.)
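
For what it is worth, here is a minimal sketch of one way to check a
sequence of child element names against a single content model without
a yacc-style grammar and without unwinding a stack. It treats the
content model as a regular expression over element names and takes its
"derivative" with respect to each child in turn (a technique usually
credited to Brzozowski), which in effect follows all possible mappings
at once. This is not how NXP or JACC work; every name below
(ContentModelSketch, Model, elem, seq, choice, star, opt, matches) is
hypothetical, and the sketch makes no attempt to stop the derived model
from growing for highly ambiguous content models.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: content-model matching by derivatives.
final class ContentModelSketch {

    /** A content-model particle (illustrative interface, not a real API). */
    interface Model {
        boolean nullable();         // can this model match zero children?
        Model derive(String name);  // model that remains after consuming one child
    }

    /** Matches nothing at all. */
    static final Model FAIL = new Model() {
        public boolean nullable() { return false; }
        public Model derive(String n) { return this; }
    };

    /** Matches only the empty sequence of children. */
    static final Model EMPTY = new Model() {
        public boolean nullable() { return true; }
        public Model derive(String n) { return FAIL; }
    };

    /** A single child element with the given name. */
    static Model elem(final String name) {
        return new Model() {
            public boolean nullable() { return false; }
            public Model derive(String n) { return name.equals(n) ? EMPTY : FAIL; }
        };
    }

    /** Sequence (a, b), simplified where one side is trivial. */
    static Model seq(final Model a, final Model b) {
        if (a == FAIL || b == FAIL) return FAIL;
        if (a == EMPTY) return b;
        if (b == EMPTY) return a;
        return new Model() {
            public boolean nullable() { return a.nullable() && b.nullable(); }
            public Model derive(String n) {
                Model first = seq(a.derive(n), b);
                // If a may be skipped, the child may also belong to b.
                return a.nullable() ? choice(first, b.derive(n)) : first;
            }
        };
    }

    /** Alternation (a | b). */
    static Model choice(final Model a, final Model b) {
        if (a == FAIL) return b;
        if (b == FAIL) return a;
        return new Model() {
            public boolean nullable() { return a.nullable() || b.nullable(); }
            public Model derive(String n) { return choice(a.derive(n), b.derive(n)); }
        };
    }

    /** Repetition a*. */
    static Model star(final Model a) {
        return new Model() {
            public boolean nullable() { return true; }
            public Model derive(String n) { return seq(a.derive(n), this); }
        };
    }

    /** Optional a?. */
    static Model opt(Model a) { return choice(a, EMPTY); }

    /** True if the child names, in order, satisfy the content model. */
    static boolean matches(Model m, List<String> children) {
        for (String c : children) {
            m = m.derive(c);
            if (m == FAIL) return false;  // early exit: no mapping can succeed
        }
        return m.nullable();              // whatever remains must be optional
    }

    public static void main(String[] args) {
        // Content model (title, para*, note?)
        Model m = seq(elem("title"), seq(star(elem("para")), opt(elem("note"))));
        System.out.println(matches(m, Arrays.asList("title", "para", "para"))); // true
        System.out.println(matches(m, Arrays.asList("title", "note", "para"))); // false
    }
}
```

Because choice() keeps both alternatives alive, a content model with
more than one possible mapping is still accepted or rejected correctly;
the sketch simply never reports which mapping was used.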
I'd really hate to have to hack this myself - maybe there is a mythical
grad student on this list who really loves writing parsers. If so, I'll
write to her supervisor with a glowing reference :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg