Re: XML vs the Dreaded Whitespace

David Megginson (ak117@freenet.carleton.ca)
Thu, 11 Dec 1997 06:41:45 -0500


Peter Murray-Rust writes:

> As a corollary: Is anyone testing the ESIS output of the current crop of
> XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model
> or the value of xml:space they should all produce identical ESIS (right?)
> If not, then one or more is wrong. And all applications should (IMO) be
> prepared to work with ESIS which I think is isomorphous with a WF XML
> document.

There are quite a few more XML parsers out there, including at least
one in TCL -- see

http://www.sil.org/sgml/XML.html#xmlSoftware

As for ESIS, there are some problems that we'd have to overcome first:

1) How should empty elements be represented? Right now, Ælfred
generates a startElement event immediately followed by an endElement
event.

2) How should the XML declaration be represented? Should it appear as
a processing instruction, or should it be ignored?

3) How should space in element content be handled? According to the
spec, a DTD-aware parser should handle whitespace in element
content differently from whitespace in mixed content (Ælfred just
ignores whitespace in element content right now).

4) DTD-aware and non-DTD-aware parsers will handle whitespace in
attribute values differently. Non-DTD-aware parsers will treat all
attributes as CDATA, but DTD-aware parsers will treat tokenised
attributes specially, by stripping all leading and trailing
whitespace, and normalising internal whitespace to single spaces.
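
To make points 1 and 4 concrete, here is a minimal sketch in Python.
It is only an illustration, not how Ælfred or any other parser
mentioned here is implemented. It uses the standard xml.sax module
(a non-validating, expat-based parser) to record element events, and
two helper functions, normalise_cdata and normalise_tokenised, which
are hypothetical names introduced for this example. The helpers
assume that literal tabs and line ends in an attribute value are
first turned into spaces, and that tokenised values are then trimmed
and collapsed.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records element events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def element_events(document: str):
    """Parse a document and return its list of start/end events."""
    recorder = EventRecorder()
    xml.sax.parseString(document.encode("utf-8"), recorder)
    return recorder.events


def normalise_cdata(value: str) -> str:
    """Hypothetical helper: treat the value as CDATA.

    Each tab, carriage return, or line feed becomes one space;
    nothing is trimmed or collapsed.
    """
    return value.translate({9: " ", 10: " ", 13: " "})


def normalise_tokenised(value: str) -> str:
    """Hypothetical helper: treat the value as a tokenised type.

    After the CDATA step, strip leading and trailing spaces and
    collapse internal runs of spaces to a single space.
    """
    tokens = [t for t in normalise_cdata(value).split(" ") if t]
    return " ".join(tokens)


if __name__ == "__main__":
    # An empty-element tag and an explicit start/end pair.
    print(element_events("<doc><e/><e></e></doc>"))

    raw = "  a \t b\n c "
    print(repr(normalise_cdata(raw)))
    print(repr(normalise_tokenised(raw)))
```

With this parser, the empty-element tag and the explicit start/end
pair produce the same two events, so the distinction is lost in the
event stream. The two helpers return different strings for the same
raw attribute value, which is the kind of divergence between
DTD-aware and non-DTD-aware parsers that would show up when
comparing ESIS output.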

All the best,

David

-- 
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/