On Case and Performance

Tim Bray (tbray@textuality.com)
Sat, 06 Sep 1997 14:06:53 -0700


Recently, I whined:

>What we need is a truly good profiler. Anyone with a good Java profiler
>experience to share? I speeded Lark up substantially with 0.91, just by
>code-walking and guessing. This is not the right way to do it. -T.

Disgusted with myself, I went and found the Java Workshop Beta from
java.sun.com, downloaded it (16M!) and ran its profiler. Well well,
surprise, Lark was spending 91% of its time in this little routine
that looks up a GI to see if we've seen it before. And in that
routine, it was spending most of its time in Character.toUpperCase.

Ouch. The code used to be:

    for (i = 0; i < name.length; i++)
        name[i] = Character.toUpperCase(name[i]);

Now it says:

    for (i = 0; i < name.length; i++)
        if (name[i] < 127)
            name[i] = sToUpper[name[i]]; // 127-entry upcasing table
        else
            name[i] = Character.toUpperCase(name[i]);

Note that toUpperCase is now called only when non-ASCII characters
show up in GI/Attribute/Entity names.
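
For anyone who wants to try the same trick, here is a minimal
self-contained sketch of the table-lookup fast path. The class name
UpcaseSketch, the table name TO_UPPER and the way the table is filled
are assumptions made for illustration; this is not Lark's actual code.

```java
public class UpcaseSketch {
    // Hypothetical stand-in for sToUpper: a 127-entry table covering
    // chars 0..126, built once when the class is loaded.
    private static final char[] TO_UPPER = new char[127];

    static {
        for (char c = 0; c < TO_UPPER.length; c++) {
            // Only a-z change; every other entry maps to itself.
            TO_UPPER[c] = (c >= 'a' && c <= 'z') ? (char) (c - ('a' - 'A')) : c;
        }
    }

    // Upcases name in place and returns the same array.
    public static char[] upcase(char[] name) {
        for (int i = 0; i < name.length; i++) {
            if (name[i] < 127)
                name[i] = TO_UPPER[name[i]];               // fast path: array index
            else
                name[i] = Character.toUpperCase(name[i]);  // slow path: rare chars
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(new String(upcase("para".toCharArray())));
        System.out.println(new String(upcase("\u00e9l\u00e9ment".toCharArray())));
    }
}
```

The point of the design is that the common case costs one comparison
and one array index, and the library call is paid only for the rare
character outside the table.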

Resulting performance improvement in Lark, in processing the XML spec:
a factor of 11.9.

The Sun profiler is not quite as slick as gprof of yore, but it's
not bad at all. -T.