SAX: Whitespace Handling (question 5 of 10)

David Megginson (ak117@freenet.carleton.ca)
Sat, 3 Jan 1998 13:02:33 -0500


[SAX is a proposal for a simple, event-based XML API, using
callbacks. This is one in a series of ten design questions that we
need to answer to implement the API.]

Should SAX allow DTD-driven parsers to distinguish ignorable
whitespace from other character data?

public void ignorableWhitespace (char ch[], int length);

(We have already had some discussion on this topic.)

CON

---

- this method would make SAX slightly larger;

- parsers that use the DTD will return different results from parsers that do not (though it would be trivial to map one onto the other by treating ignorable whitespace as ordinary character data);

- the concept of ignorable whitespace can be confusing for non-specialists.
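The mapping mentioned in the second point could look roughly like the sketch below. It is only an illustration: the interface name WhitespaceAwareHandler and the class WhitespaceFoldingHandler are hypothetical, and the two-argument charData signature is an assumption modelled on the proposed ignorableWhitespace callback.

```java
// Hypothetical handler interface. Only the ignorableWhitespace signature is
// taken from the proposal; the two-argument charData is an assumption.
interface WhitespaceAwareHandler {
    void charData(char[] ch, int length);
    void ignorableWhitespace(char[] ch, int length);
}

// Folds ignorable whitespace back into ordinary character data, so the
// application sees the same text a parser without DTD knowledge would report.
class WhitespaceFoldingHandler implements WhitespaceAwareHandler {
    private final StringBuilder text = new StringBuilder();

    public void charData(char[] ch, int length) {
        text.append(ch, 0, length);
    }

    public void ignorableWhitespace(char[] ch, int length) {
        // Treat it exactly like any other character data.
        charData(ch, length);
    }

    public String text() {
        return text.toString();
    }
}
```

An application that wanted the opposite behaviour would simply leave ignorableWhitespace empty.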

PRO

---

- the PR (the XML Proposed Recommendation) requires "validating" parsers to flag ignorable whitespace for the application;

- there would be no need to implement anything here for most applications;

- whitespace in element content is almost never significant for formatting or database applications (if it were significant, then the element type would have mixed content).
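To make the distinction concrete, here is a rough sketch of the decision a DTD-driven parser might make before calling the application. All names (DispatchTarget, WhitespaceDispatcher, inElementContent) are hypothetical, and the sketch assumes the parser already knows from the DTD whether the current element type has element content or mixed content.

```java
// Hypothetical callback target with the two candidate methods.
interface DispatchTarget {
    void charData(char[] ch, int length);
    void ignorableWhitespace(char[] ch, int length);
}

class WhitespaceDispatcher {
    // True when each of the first `length` characters is XML whitespace
    // (space, tab, carriage return, or line feed).
    static boolean isAllWhitespace(char[] ch, int length) {
        for (int i = 0; i < length; i++) {
            char c = ch[i];
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                return false;
            }
        }
        return true;
    }

    // inElementContent: assumed to be true when the DTD declares the current
    // element type with element content (child elements only, no #PCDATA).
    static void dispatch(DispatchTarget target, boolean inElementContent,
                         char[] ch, int length) {
        if (inElementContent && isAllWhitespace(ch, length)) {
            target.ignorableWhitespace(ch, length);
        } else {
            // Mixed content, or anything that is not pure whitespace.
            target.charData(ch, length);
        }
    }
}
```

A parser that does not read the DTD would have no value for inElementContent and would report everything through charData.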

MY RECOMMENDATION

-----------------

Qualified no.

As someone who has worked with SGML for many years, I would rather not see the ignorable whitespace at all; however, the PR requires parsers to report all whitespace.

Tim Bray's recent comments on this list imply that a validating parser using SAX could report ignorable whitespace as regular character data and still be conforming; if I have inferred correctly, then I am willing to omit this callback.

OTHER CONSIDERATIONS

--------------------

It would also be possible to implement this in the charData callback itself:

public void charData (char ch[], int length, boolean isIgnorable);

However, given that charData will probably be the most heavily implemented handler, and that very few applications will care about ignorable whitespace, I would prefer not to complicate things unnecessarily. If we need to distinguish it to be conforming, then ignorable whitespace should probably be shuffled off to its own callback, to make it easier to ignore.
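A small sketch may help compare the two designs from the application's side. Everything here except the two method signatures is hypothetical: FlaggedHandler, EmptyHandlerBase, and the two collector classes are illustrative names, and the do-nothing base class is an assumption about how a separate callback might be made easy to ignore.

```java
// Design 1: the flag rides along on charData.
interface FlaggedHandler {
    void charData(char[] ch, int length, boolean isIgnorable);
}

// Design 2: a separate callback, here with a hypothetical do-nothing base
// class so applications override only what they need.
class EmptyHandlerBase {
    public void charData(char[] ch, int length) { }
    public void ignorableWhitespace(char[] ch, int length) { }
}

// With the flag, every charData implementation has to remember the check.
class FlaggedTextCollector implements FlaggedHandler {
    final StringBuilder text = new StringBuilder();

    public void charData(char[] ch, int length, boolean isIgnorable) {
        if (isIgnorable) {
            return; // skip ignorable whitespace explicitly
        }
        text.append(ch, 0, length);
    }
}

// With the separate callback, ignoring the whitespace takes no code at all:
// ignorableWhitespace is inherited as a no-op.
class SeparateTextCollector extends EmptyHandlerBase {
    final StringBuilder text = new StringBuilder();

    @Override
    public void charData(char[] ch, int length) {
        text.append(ch, 0, length);
    }
}
```

Both collectors end up with the same text; the difference is only in how much each application has to write to get there.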

All the best,

David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/
