Re: SAX: Byte Streams and Character Streams

Tim Bray (tbray@textuality.com)
Sun, 19 Apr 1998 15:15:18 -0700


At 05:45 PM 4/19/98 -0400, David Megginson wrote:
>The Lark driver in the current pre-release of SAX feeds a character
>stream to Lark as an InputStream of UTF-8 bytes, using a surprisingly
>inefficient algorithm that I can fix when I have time. Will the next
>version of Lark support character streams?

Well, the current version of Lark really doesn't support anything
*but* character streams... that and synchronization, if my measurements
are correct, amount to >50% of the difference between XP and Lark.
It is clear and (sigh) not surprising that method-dispatch-per-char
is, well, less than optimal. Thus my plan had been to move to
the three-argument read(buf, off, len) call.
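
To make the contrast concrete, here is a rough sketch of the two reading
styles against java.io.Reader. It is not Lark's code: the class name, the
method names, the 4096-char buffer, and the toy task (counting '<'
characters) are all made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadStyles {
    // Style 1: one read() method dispatch per character.
    static int countPerChar(Reader in, char target) throws IOException {
        int n = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == target) {
                n++;
            }
        }
        return n;
    }

    // Style 2: one read(buf, off, len) dispatch per buffer; the inner
    // loop is a plain array walk with no method calls.
    static int countBulk(Reader in, char target) throws IOException {
        char[] buf = new char[4096]; // buffer size is an arbitrary choice
        int n = 0;
        int got;
        while ((got = in.read(buf, 0, buf.length)) != -1) {
            for (int i = 0; i < got; i++) {
                if (buf[i] == target) {
                    n++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        String doc = "<doc>hello</doc>";
        System.out.println(countPerChar(new StringReader(doc), '<'));
        System.out.println(countBulk(new StringReader(doc), '<'));
    }
}
```

Both methods return the same count; the only difference is how many
method dispatches it takes to get there.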

As a result of this, I'm a bit conflicted about James' suggestion that
we lose the int read() methods. While they are a surefire way
to run slow, I spent enough years in Unix that doing things via
getc() feels natural and I appreciate its advantages; assuming
of course that getc() is a macro with buffering, which of course
a Java method dispatch, uh, isn't. Nice thing about stdio is it made
it easy for the programmer to pretend to do character streams without
having to really do serious per-char work. -T.