Re: SAX: String Internalisation and a CORBA/DCOM Question

James Clark (jjc@jclark.com)
Sun, 19 Apr 1998 12:28:28 +0700


David Megginson wrote:
>
> Here's another last-minute SAX question: should org.xml.sax.Parser
> expose a method for internalising strings?
>
> public abstract String intern (String s);

Absolutely not.

> Most Java-based parsers, at least, already use some type of
> internalisation (but not, usually, the inefficient
> java.lang.String.intern() method) for names -- the SAX driver could
> expose this functionality if support is already there, or do its own
> internalising if support is absent.

That would be a significant performance hit on SAX use with parsers that
don't do internalisation. XP does not do this sort of internalisation
because it would make it slower.

> As someone has already pointed out, internalised strings will make a
> dramatic difference for the speed of applications, since applications
> can use a simple '==' operator (or the local equivalent) to test for
> equality rather than a slow subroutine like java.lang.String.equals().

Doing lots of comparisons on the type of each element, whether using
equals or ==, is not a good way to write an efficient application. It's
typically better to have a hash table that associates each element type
with either an integer (which you can then use in a switch statement) or
an object (which you then make a method call on).
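Here is a minimal sketch of the integer variant. It assumes hypothetical element types doc, para and emph. The table is built once, and each startElement() call then costs one lookup followed by a switch. The class and method names are illustrative and are not SAX API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: map each element type to an int code once, then dispatch with a
// switch instead of chaining equals() or == comparisons per element.
// The element names (doc, para, emph) are hypothetical.
public class ElementDispatch {
    static final int DOC = 1, PARA = 2, EMPH = 3;

    private static final Map<String, Integer> CODES = new HashMap<>();
    static {
        CODES.put("doc", DOC);
        CODES.put("para", PARA);
        CODES.put("emph", EMPH);
    }

    private final StringBuilder log = new StringBuilder();

    // Would be called from a SAX startElement() callback: one hash lookup per element.
    public void startElement(String name) {
        Integer code = CODES.get(name);
        switch (code == null ? 0 : code) {
            case DOC:  log.append("[doc]");  break;
            case PARA: log.append("[para]"); break;
            case EMPH: log.append("[emph]"); break;
            default:   log.append("[?").append(name).append("]"); break;
        }
    }

    public String log() { return log.toString(); }

    public static void main(String[] args) {
        ElementDispatch d = new ElementDispatch();
        for (String name : new String[] {"doc", "para", "emph", "note"}) {
            d.startElement(name);
        }
        System.out.println(d.log());  // prints [doc][para][emph][?note]
    }
}
```

The object variant works the same way. Replace the Integer values with handler objects and call a method on whichever one the lookup returns.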

This could be done a little more efficiently with help from the parser.
For example, you could have a method on SAXParser

setElementTypeUserData(String elementType, Object userData);

Then startElement() and endElement() in SAXDocumentHandler could have an
additional Object userData argument.

This would allow apps to do something like:

void startElement(String name, Object userData, SAXAttributeList atts) {
    switch (((Integer)userData).intValue()) {
    ...
    }
}

or

void startElement(String name, Object userData, SAXAttributeList atts) {
    ((ElementHandler)userData).start();
}

I don't think it's worth the complexity.
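To make the trade-off concrete, the parser side of the proposal might look roughly like the sketch below. It simplifies by assuming the parser already keeps one table entry per element type name. Attaching the user data to that entry spares the application its own lookup. ToyParser, NameEntry, the two-argument startElement() and setElementTypeUserData() are all hypothetical, and none of them is part of SAX.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposal: the parser keeps user data in the
// same per-name entry it already uses, so the application receives it
// with no extra lookup. None of these types are part of SAX.
public class UserDataSketch {
    interface ElementHandler {
        void start();
    }

    interface DocumentHandler {
        void startElement(String name, Object userData);
    }

    // Stand-in for a parser's name-table entry.
    static final class NameEntry {
        final String name;
        Object userData;
        NameEntry(String name) { this.name = name; }
    }

    static final class ToyParser {
        private final Map<String, NameEntry> names = new HashMap<>();
        private DocumentHandler handler;

        private NameEntry entry(String name) {
            return names.computeIfAbsent(name, NameEntry::new);
        }

        void setDocumentHandler(DocumentHandler h) { handler = h; }

        void setElementTypeUserData(String elementType, Object userData) {
            entry(elementType).userData = userData;
        }

        // Simulates reporting start tags; a real parser would find the
        // entry while scanning the tag name.
        void parse(String... startTags) {
            for (String tag : startTags) {
                NameEntry e = entry(tag);
                handler.startElement(e.name, e.userData);
            }
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        ToyParser parser = new ToyParser();
        parser.setElementTypeUserData("para", (ElementHandler) () -> out.append("P"));
        parser.setElementTypeUserData("emph", (ElementHandler) () -> out.append("E"));
        parser.setDocumentHandler((name, userData) -> {
            if (userData != null) {
                ((ElementHandler) userData).start();
            } else {
                out.append("?");  // element type with no registered user data
            }
        });
        parser.parse("para", "emph", "note");
        System.out.println(out);  // prints PE?
    }
}
```

This saves only one hash lookup per element. The cost is an extra parser method and an extra argument on every callback.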

> By the way, here's the minimum list of what should be internalised in
> the callbacks from the SAX parser:

SAX should not require the internalization of anything.

James