Validation algorithm/code wanted

Peter Murray-Rust (peter@ursus.demon.co.uk)
Wed, 03 Dec 1997 01:01:45


This may come as a shock to some, but I would actually like to use
DTD-based validation in JUMBO. The primary purpose is to be able to read in
a document and map the content of each ELEMENT onto its content model in
the DTD. This is so I can have a GUI-based authoring tool. [ATTLISTs are
relatively easy and I have already done them, I think].

I would be grateful for some or all of the following:
- a java-based library routine (I think this may be optimistic in 1997)
- an algorithm, or a pointer to one on the WWW
- some wise words about how much effort is involved in writing an algorithm.

[Norbert solved this in NXP by including JACC - a java-based yacc-like
beast - but it is cumbersome for just analysing single content models
against instances].
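One way to analyse a single content model against an instance without
pulling in a parser generator is to translate the content model into an
ordinary regular expression over element names and let a regex library do
the matching. The Java sketch below is only an illustration of that idea,
not how NXP or JACC work. It assumes a runtime with java.util.regex (Java
1.4 or later), element-only content models (no #PCDATA, EMPTY or ANY) and a
well-formed model string; the names ContentModelRegex, toRegex and matches
are hypothetical. Each child element is encoded as <name>, so a list of
children becomes a string such as <a><b><d>.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ContentModelRegex {

    // Translate an element-only content model such as "(a,(b|c)*,d?)" into a
    // java.util.regex pattern over child names encoded as "<a><b>...".
    // Assumption: the model is well-formed; #PCDATA, EMPTY, ANY are not handled.
    static String toRegex(String model) {
        StringBuilder re = new StringBuilder();
        int i = 0;
        while (i < model.length()) {
            char c = model.charAt(i);
            if (c == '(') {
                re.append("(?:");              // non-capturing group
                i++;
            } else if (c == ')' || c == '|' || c == '?' || c == '*' || c == '+') {
                re.append(c);                  // same meaning in DTD and regex syntax
                i++;
            } else if (c == ',' || Character.isWhitespace(c)) {
                i++;                           // a DTD sequence is plain concatenation
            } else {
                int start = i;                 // read one element name
                while (i < model.length()
                        && "(),|?*+".indexOf(model.charAt(i)) < 0
                        && !Character.isWhitespace(model.charAt(i))) {
                    i++;
                }
                // Quote the token so '.' or '-' in a name is matched literally.
                String token = "<" + model.substring(start, i) + ">";
                re.append("(?:").append(Pattern.quote(token)).append(')');
            }
        }
        return re.toString();
    }

    // True if the list of child element names satisfies the content model.
    static boolean matches(String model, List<String> children) {
        StringBuilder input = new StringBuilder();
        for (String name : children) {
            input.append('<').append(name).append('>');
        }
        return Pattern.matches(toRegex(model), input);
    }

    public static void main(String[] args) {
        String model = "(a, (b | c)*, d?)";
        System.out.println(matches(model, Arrays.asList("a", "b", "c", "d")));
        System.out.println(matches(model, Arrays.asList("a", "d", "b")));
    }
}
```

java.util.regex is itself a backtracking engine, so this only answers yes
or no; it does not say which part of the model each child was matched
against, which a GUI authoring tool would eventually need.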

The operation seems to be somewhere in between a graph matching routine
(which I can do except for the optionality) and a BNF parser (e.g. yacc),
which I certainly can't write. My recollection of regexps is that they use
a 'maximal munch' of some sort, so I would try to match as many of the
early nodes as possible and then unwind the stack repeatedly whenever the
match failed. However, yacc throws up 'shift-reduce' conflicts, which I
imagine still pertain in XML. (This means there is more than one way of
mapping a document onto the content model, I assume.)
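Both the "match greedily, then unwind" idea and the question of multiple
mappings can be explored with a small hand-written backtracking matcher.
The sketch below uses hypothetical names and needs Java 8 or later for
lambdas. It builds the content model as a tree rather than parsing DTD
syntax, tries every alternative, and, instead of stopping at the first
success, counts how many distinct derivations of the child list the model
allows: 0 means invalid, 1 means a unique mapping, and more than 1 means
the same children can be mapped onto the model in more than one way.

```java
import java.util.Arrays;
import java.util.List;

public class ContentModelCount {

    // Content-model tree: an element name, a sequence (a,b,...), a choice
    // (a|b|...), or an occurrence indicator ('?', '*' or '+') on a child.
    interface Node {}
    static final class Name implements Node {
        final String name;
        Name(String name) { this.name = name; }
    }
    static final class Seq implements Node {
        final Node[] parts;
        Seq(Node... parts) { this.parts = parts; }
    }
    static final class Choice implements Node {
        final Node[] alts;
        Choice(Node... alts) { this.alts = alts; }
    }
    static final class Repeat implements Node {
        final Node body;
        final char op;
        Repeat(Node body, char op) { this.body = body; this.op = op; }
    }

    // "The rest of the model": given the next unconsumed child position,
    // returns the number of ways the remaining model can finish.
    interface Cont { long apply(int pos); }

    // Count derivations of kids[pos..] that start with node n and finish via k.
    // Exhaustive backtracking: exponential in the worst case.
    static long count(Node n, List<String> kids, int pos, Cont k) {
        if (n instanceof Name) {
            String want = ((Name) n).name;
            return pos < kids.size() && kids.get(pos).equals(want) ? k.apply(pos + 1) : 0;
        }
        if (n instanceof Seq) {
            return seq(((Seq) n).parts, 0, kids, pos, k);
        }
        if (n instanceof Choice) {
            long total = 0;
            for (Node alt : ((Choice) n).alts) {
                total += count(alt, kids, pos, k);   // try every alternative
            }
            return total;
        }
        Repeat r = (Repeat) n;
        switch (r.op) {
            case '?':  // either skip the body or match it once
                return k.apply(pos) + count(r.body, kids, pos, k);
            case '*':
                return star(r.body, kids, pos, k);
            default:   // '+' is one mandatory body followed by body*
                return count(r.body, kids, pos, p -> star(r.body, kids, p, k));
        }
    }

    static long seq(Node[] parts, int i, List<String> kids, int pos, Cont k) {
        if (i == parts.length) {
            return k.apply(pos);
        }
        return count(parts[i], kids, pos, p -> seq(parts, i + 1, kids, p, k));
    }

    // Zero or more iterations; each extra iteration must consume at least one
    // child, otherwise a model like (a?)* would recurse forever.
    static long star(Node body, List<String> kids, int pos, Cont k) {
        long stopHere = k.apply(pos);
        long goOn = count(body, kids, pos, p -> p > pos ? star(body, kids, p, k) : 0);
        return stopHere + goOn;
    }

    // Number of ways the whole child list can be derived from the model.
    static long countMappings(Node model, List<String> kids) {
        return count(model, kids, 0, p -> p == kids.size() ? 1 : 0);
    }

    public static void main(String[] args) {
        Node a = new Name("a");
        Node ambiguous = new Seq(new Repeat(a, '*'), new Repeat(a, '*'));  // (a*, a*)
        System.out.println(countMappings(ambiguous, Arrays.asList("a")));
        System.out.println(countMappings(ambiguous, Arrays.asList("a", "a")));
    }
}
```

Continuation-passing provides the stack unwinding: when a continuation
returns 0, control simply returns to the caller, which then tries its next
alternative or iteration count. The model (a*, a*) is ambiguous because a
run of a children can be split between the two stars in several ways,
whereas (a, (b|c)*, d?) gives a count of exactly 1 for any valid child
list.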

I'd really hate to have to hack this myself - maybe there is a mythical
grad student on this list who really loves writing parsers. If so, I'll
write to her supervisor with a glowing reference :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg