NXP.
----
NXP has an interface Esis, with functions such as open_tag, close_tag,
process_instruction, etc.  [I think they would be more properly called
start_element??].  JUMBO uses this to build up a Vector representing the
ESIS event stream, something like:
"_START_TAG" "CML"  AttributeList "_START_TAG" "MOL" ... "_END_TAG" "MOL"...
JUMBO then builds a tree out of this, adding attributes, etc.
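As a rough illustration of that step (not JUMBO's actual code), a flat
stream like this can be turned into a tree by keeping a stack of the
currently open elements.  The class and method names below are invented
for the sketch, and attribute lists are left out, so the stream is assumed
to be plain marker/name pairs:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: none of these names come from NXP or JUMBO.
class EventStreamTree {

    // Minimal tree node: an element name plus its child elements.
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<Node>();

        Node(String name) { this.name = name; }

        // Render as nested parentheses so the shape is easy to inspect.
        @Override public String toString() {
            if (children.isEmpty()) return name;
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Walk the marker/name pairs; the top of the stack is the open element.
    static Node build(List<String> events) {
        Deque<Node> open = new ArrayDeque<Node>();
        Node root = null;
        for (int i = 0; i + 1 < events.size(); i += 2) {
            String marker = events.get(i);
            String name = events.get(i + 1);
            if (marker.equals("_START_TAG")) {
                Node node = new Node(name);
                if (open.isEmpty()) root = node;          // first element is the root
                else open.peek().children.add(node);      // attach to current parent
                open.push(node);
            } else if (marker.equals("_END_TAG")) {
                if (open.isEmpty() || !open.peek().name.equals(name)) {
                    throw new IllegalStateException("Unbalanced end tag: " + name);
                }
                open.pop();
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalStateException("Unclosed element: " + open.peek().name);
        }
        return root;
    }
}
```

A real version would also have to consume the AttributeList objects that
sit between the tags in the stream and hang them on the node being opened.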
NXP has a class XML which is built by JACC.  This contains, inter alia,
an Esis_Stdout object (implements Esis).  There are several objects in XML
which are private and therefore not easily accessed - I think they should
have accessors, but at present I have subclassed it as PMRXML, which has
the requisite accessors.
My test program then creates a PMRXML object, and extracts the event stream
which is then passed to JUMBO's existing tree object:
    NXP.PMRXML xml = new PMRXML(NXP.Streams.load_File(file, true));
    pmr.chemime.ChemTree chemTree = new ChemTree(xml.getStreamVector());
    pmr.sgml.GeneralTOC toc = chemTree.createGeneralTOC(3);
Comments:  I have still to work out what whitespace NXP creates - there seems
to be a lot of content which is purely whitespace.  Maybe we have to address
COLLAPSE and KEEP at this stage?  Also it isn't easy to extract certain
info - for example I had to hack XML.java to get the doctype - this isn't a good
idea and we need an accessor.  I am also still not clear how NXP does (or should)
behave with:
<!DOCTYPE CML>
and <!DOCTYPE CML SYSTEM "cml.dtd">
(the default on the latter is to try to validate, I think, even if validate
is set to false.  I'd prefer to be able to turn off validation, but I may have
missed something).
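On the whitespace question, one possible reading of COLLAPSE and KEEP is
sketched below: KEEP passes every content chunk through untouched, while
COLLAPSE drops chunks that are purely whitespace and squeezes runs of
whitespace in the rest.  This is only my guess at sensible semantics; the
class, enum and method names are invented and nothing here is NXP's API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a whitespace policy applied to content chunks.
class WhitespacePolicy {

    enum Mode { KEEP, COLLAPSE }

    // True if the chunk is empty or holds only space, tab, CR or LF.
    static boolean isBlank(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') return false;
        }
        return true;
    }

    static List<String> apply(List<String> chunks, Mode mode) {
        if (mode == Mode.KEEP) {
            return new ArrayList<String>(chunks);   // unchanged copy
        }
        List<String> out = new ArrayList<String>();
        for (String chunk : chunks) {
            if (isBlank(chunk)) continue;           // drop whitespace-only content
            // Squeeze each run of whitespace down to a single space ...
            String s = chunk.replaceAll("[ \\t\\r\\n]+", " ");
            // ... then remove the single leading/trailing space, if any.
            if (s.startsWith(" ")) s = s.substring(1);
            if (s.endsWith(" ")) s = s.substring(0, s.length() - 1);
            out.add(s);
        }
        return out;
    }
}
```

Whether this belongs in the parser, in the Esis handler or in the tree
builder is exactly the open question.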
	In general I'd like to be able to treat NXP as a black box, and subclass
my Esis object.  That could mean passing it as an argument to XML, e.g.:
   
public class PMREsis implements Esis {
    public void open_tag(String name) {
...
    }
}
    PMREsis esis = new PMREsis();
    NXP.XML xml = new NXP.XML(esis, NXP.Streams.load_File(file, true));
    pmr.sgml.SGMLTree tree = new pmr.sgml.SGMLTree(xml);
and so on.
NXP is a validating parser, but my DTDs are still struggling with Parameter
Entities so I have no experience here.
Lark
----
	Lark creates a tree (called Lark) and provides a handler for 
the user to pick up a variety of events (e.g. doDoctype(), doPI()).  The
tree contains Elements ('Nodes') which have Attributes and a type (String).
Rather than subclassing these elements, I process Lark by iterating through
the Elements and creating a JUMBO SGMLTree (this can be delayed if required).
The tree seems complete, but I am not sure I have got all the doFOO routines
working correctly.  I have also had problems with PIs (if the ?> delimiter
is used) - these may be mine.
Lark does not validate.  However it is easy to interface and is fast.
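The conversion described above (walking the parser's elements and building
a separate tree rather than subclassing) amounts to a recursive copy.  A
minimal sketch follows; SourceElement stands in for a parser-side element
and AppNode for an application-side node, and both are invented here and
are not Lark's or JUMBO's real classes:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: copy a parser's element tree into an application tree.
class TreeCopy {

    // Stand-in for an element produced by the parser.
    static class SourceElement {
        final String type;
        final Map<String, String> attributes = new LinkedHashMap<String, String>();
        final List<SourceElement> children = new ArrayList<SourceElement>();

        SourceElement(String type) { this.type = type; }
    }

    // Stand-in for the application's own node class.
    static class AppNode {
        final String name;
        final Map<String, String> attributes;
        final List<AppNode> children = new ArrayList<AppNode>();

        AppNode(String name, Map<String, String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }
    }

    // Depth-first copy: the type becomes the name, attributes are copied
    // into a fresh map, and each child is converted in document order.
    static AppNode convert(SourceElement src) {
        AppNode node = new AppNode(src.type,
                new LinkedHashMap<String, String>(src.attributes));
        for (SourceElement child : src.children) {
            node.children.add(convert(child));
        }
        return node;
    }
}
```

Because the copy shares nothing with the source tree, the parser's tree can
be discarded afterwards, and the conversion can equally be postponed until
the application tree is first needed.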
General
-------
I do not use PIs myself though I shall start to do so.  If they are
kept in the document tree, is there a convention where they live?  (The last
opened element?  What if they occur in PCDATA?).
I intend to make JUMBO available with both Lark and NXP but it's a bit creaky
at present and the interface is a bit slow.  I have been told that the larger
the number of classes, the slower the program - any comments?  Also I don't
know whether I should be deliberately garbage-collecting at this stage.
Any general thoughts would be welcome.  I intend to bolt a crude search tool
into JUMBO along the TEI lines.  I shall also see whether I can extract the
bits of NXP that do the validating, because then we have a crude validating
editor.  
Any feedback from current JUMBO users would be appreciated.  (I already know
it's slow, and the graphics creak in several places :-)
P.
--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/