Re: [Q] How should SAX support Namespaces?

John Cowan (cowan@locke.ccil.org)
Fri, 24 Jul 1998 16:36:43 -0400


Toby Speight scripsit:

> David> How should SAX support namespaces? I can think of three
> David> options:
>
> I'd like to add a fourth: define namespace support as a layer above
> SAX, which can interface with any SAX parser, and produce output
> similar to that of SAX with additional information. Then this
> "namespace library" can do one of your proposed actions:

I absolutely agree with this. In fact, I think that this simple
species of namespace support should be provided as part of the
SAX helper library, so that everyone has it readily available.
That way, SAX-compliant XML parsers can just do XML and leave
namespaces up to common code.

> David> 1. Simply ignore them, and require the XML application to do
> David> the work (all of the necessary information is passed on by
> David> SAX).

This corresponds to not using a helper class at all.

> David> 2. Use the current interface, but allow namespace-aware SAX
> David> processors to prepend namespace URIs to element type and
> David> attribute names, as in
> David> startElement("urn:www.megginson.com:doc", ...)
> David> endElement("urn:www.megginson.com:doc", ...)

This corresponds to a helper class that is itself a SAX-compliant
parser. Just initialize it with the object representing some other
(real) parser, and it decodes namespace PIs and alters the
startElement, endElement etc. calls.

I must point out, however, that since ":" is valid in URIs
it should *not* be used as the separator between URIs and Names,
and instead some character not valid in either should be used.
Since we are in a Java/Unicode context, I propose '\u2020' DAGGER
as the separator: both Arial Sans Unicode and Bitstream Cyberbit
have glyphs for it, which assists debugging.

> David> 3. Revised org.xml.sax.AttributeList and org.xml.sax.DocumentHandler
> David> to include the namespace as a separate (possibly-null) argument:
> David>
> David> startElement("urn:www.megginson.com", "doc", ...)
> David> endElement("urn:www.megginson.com", "doc", ...)

And this would be a helper class that exports an interface related
to, but different from, the SAX 1.0 interface.

> 4. Apply an application-specified re-mapping to the names. So the
> above could be initialised with
> ns_processor.setPrefix("urn:www.megginson.com", "davids-ns");
> and have its document handler called with
> startElement("davids-ns:doc", ...)
> endElement("davids-ns:doc", ...)

This would be another helper class, also a SAX 1.0 compliant parser.

> I don't particularly like (2) above, since it means that different SAX
> parsers may return different values for the same document.

I don't understand this comment.

> I do feel that namespace handling is orthogonal to syntactic parsing
> of XML, and that SAX itself should confine itself to the latter. I'm
> not sure whether namespace processing should happen between SAX and
> the application (layered approach) or separately "at the side" of the
> application (application developers' utility class).

Layered, layered, says I. Option #2's class would be IMHO the
most useful, since you can plug it into an existing SAX-using
application with zero changes to the interface, and just start
testing for names like "http://purl.oclc.org/NET/xschema\u2020XSchema".

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)