In CoST Joe English supported both eventStreams and trees (I'm sure Joe will
have some wisdom on this one). I started off using the event mechanism
and switched to a tree-based one, but I suspect that this reflected the nature
of the application.
My current problem may highlight this. A CML document is highly
tree-structured and contains no mixed content, so that eventStreams don't
contribute much. BUT it also includes chunks of HTML where a tree structure
is quite inappropriate. If I take a Lark-based approach (or my own
parser) the HTML gets rendered into a tree. I am now hacking this
back into an event stream to render the hypertext. Not only does it
take more effort, but I'm sure that holding HTML as a tree has a
memory hit. Ideally, when I'm parsing CML and come to the
tag <XHTML> (sic), which contains <BODY>, I'd like to tell the parser
'stop parsing as a tree and just hold a hypertext string until </XHTML>'.
We *could* do this with a PI, but we would all have to agree.
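
A minimal sketch of that hybrid idea, assuming Python's standard expat
binding rather than Lark or any parser discussed in this thread. The names
Node, parse_hybrid and the verbatim parameter are hypothetical, chosen only
for illustration. Everything is built as a tree except the content of one
designated element, which is kept as a single markup string:

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr


class Node:
    """Hypothetical tree node; not part of any existing API."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs
        self.children = []  # child Node objects (tree mode)
        self.text = ""      # character data (assumes no mixed content)
        self.raw = None     # markup string held for a verbatim element


def parse_hybrid(document, verbatim="XHTML"):
    """Build a tree, but hold the content of `verbatim` as a string."""
    root = None
    stack = []      # open elements in tree mode
    raw_parts = []  # pieces of the string being collected
    depth = 0       # nesting depth inside the verbatim element; 0 = tree mode

    def start(name, attrs):
        nonlocal root, depth
        if depth:  # inside the verbatim element: re-serialize, build no nodes
            depth += 1
            attr_text = "".join(
                " %s=%s" % (key, quoteattr(value)) for key, value in attrs.items()
            )
            raw_parts.append("<%s%s>" % (name, attr_text))
            return
        node = Node(name, attrs)
        if stack:
            stack[-1].children.append(node)
        else:
            root = node
        stack.append(node)
        if name == verbatim:
            depth = 1

    def end(name):
        nonlocal depth
        if depth > 1:  # closing an element nested inside the verbatim one
            depth -= 1
            raw_parts.append("</%s>" % name)
            return
        if depth == 1:  # closing the verbatim element itself
            depth = 0
            stack[-1].raw = "".join(raw_parts)
            raw_parts.clear()
        stack.pop()

    def chars(data):
        if depth:
            raw_parts.append(escape(data))
        elif stack:
            stack[-1].text += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(document, True)
    return root


doc = (
    '<CML><MOL ID="m1"><ATOM>C</ATOM></MOL>'
    "<XHTML><BODY><P>Some <B>bold</B> text</P></BODY></XHTML></CML>"
)
root = parse_hybrid(doc)
print(root.children[1].raw)  # <BODY><P>Some <B>bold</B> text</P></BODY>
```

The string is re-serialized from events, so it is not byte-identical to the
input: an empty element written as <BR/> would come back as <BR></BR>.
Here the switch is keyed on the element name; a PI could trigger the same
mode change if a convention were agreed.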
>
> Some questions that will affect the API are whether one sees empty
> elements as elements containing nothing, or as elements unable to
Yes.
> contain anything, and whether entity/attribute type information needs
> to be passed across the API.
I have been convinced that entity information needs to be preserved and I
assume there are people who are concerned about attribute_type. If
nothing else, this is probably critical for ID/IDREF.
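
To make the ID/IDREF point concrete, here is a rough sketch of an event
stream whose start events carry each attribute's declared type. It assumes
Python's standard expat binding and an internal DTD subset; the function
name events_with_attr_types and the tuple layout are hypothetical, and
undeclared attributes are treated as CDATA by assumption:

```python
import xml.parsers.expat


def events_with_attr_types(document):
    """Return a list of events; start events carry (name, type, value) triples."""
    attr_types = {}  # (element name, attribute name) -> declared type
    events = []

    def attlist(elname, attname, type_, default, required):
        # Called once per attribute declared in an <!ATTLIST ...>.
        attr_types[(elname, attname)] = type_

    def start(name, attrs):
        typed = [
            (key, attr_types.get((name, key), "CDATA"), value)  # CDATA is an assumed fallback
            for key, value in attrs.items()
        ]
        events.append(("start", name, typed))

    def end(name):
        events.append(("end", name))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = attlist
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(document, True)
    return events


doc = (
    "<!DOCTYPE CML ["
    "<!ATTLIST ATOM ID ID #IMPLIED>"
    "<!ATTLIST BOND REF IDREF #IMPLIED>"
    "]>"
    '<CML><ATOM ID="a1"/><BOND REF="a1"/></CML>'
)
events = events_with_attr_types(doc)
for event in events:
    print(event)
```

With the type attached, an application can tell that REF on BOND points at
the ATOM whose ID is "a1" without hard-coding attribute names.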
>
> What do people think? How much information must the parser pass
> along?
At least what comes out of sgmls/ESIS, probably with general entities
added. We also need to know the DOCTYPE info.
[...]
P.
-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/