Why do you need this?
> public abstract int read (char ch[], int start, int count)
> throws SAXException;
> }
>
> (Where SAXException is, in the Java version, a direct and unmodified
> subclass of java.io.IOException). The result of either method is -1
> if there are no characters left to read; otherwise, it is a UTF-16
> character value for the first, and the number of characters read for
> the second.
>
> The advantage of using SAXCharacterStream is that behaviour over CORBA
> (or, I suppose, DCOM) is now well-defined. The disadvantage is
> another bloody interface.
>
> I had also written a SAXByteStream, but then I started wondering why
> we really need it -- information coming from a database, for example,
> or from a buffer should already be in characters, not in raw bytes
> (and in Java, at least, it is simple to wrap a Reader around any
> InputStream when necessary -- I expect that other languages will have
> good internationalisation support soon).
>
> Can anyone put forward a convincing case for having a standard SAX
> method parsing from a raw byte stream (remembering that
> implementations can always extend the SAXParser interface themselves
> for special requirements)?
You would be biasing SAX towards implementations that work internally by
converting into UTF-16 and then parsing. Not all parsers work like this
and it is not the most efficient way to write a parser. My parsers work
directly on a stream of bytes and don't convert to a character stream
first. That's one reason why they are faster than other parsers. In
fact the way I would implement support for a SAXCharacterStream is to
wrap an InputStream around it to turn it into a sequence of bytes.
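
A minimal sketch of such a wrapper, assuming java.io.Reader stands in for SAXCharacterStream and that the characters are re-encoded as big-endian UTF-16 (two bytes per code unit). The class name Utf16ByteView and its encode helper are hypothetical, chosen only for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical adapter: presents a character source as big-endian UTF-16 bytes.
// java.io.Reader stands in for SAXCharacterStream here (an assumption).
class Utf16ByteView extends InputStream {
    private final Reader source;
    private int pendingLowByte = -1; // low byte of the current code unit, or -1 if none

    Utf16ByteView(Reader source) {
        this.source = source;
    }

    @Override
    public int read() throws IOException {
        if (pendingLowByte >= 0) {
            // Second half of a code unit that was already fetched.
            int b = pendingLowByte;
            pendingLowByte = -1;
            return b;
        }
        int unit = source.read(); // one UTF-16 code unit, or -1 at end of input
        if (unit < 0) {
            return -1;
        }
        pendingLowByte = unit & 0xFF;
        return (unit >> 8) & 0xFF; // high byte first (big-endian)
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    // Demonstration helper: drain a string through the adapter.
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new Utf16ByteView(new StringReader(text))) {
            for (int b = in.read(); b >= 0; b = in.read()) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

With this sketch, Utf16ByteView.encode("<a") yields the four bytes 00 3C 00 61. Surrogate pairs pass through unchanged because each code unit is split independently.
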

XML implementations may well provide their own machinery for converting
from bytes to characters. The system-provided facilities (as in Java)
are in practice often slow, buggy (lacking surrogate support, for
example), and inconsistent between platforms. By providing only
SAXCharacterStream you would be preventing users from taking advantage
of this machinery when not reading from a URL.

Another reason is that the XML-defined mechanisms for specifying the
encoding (the encoding declaration and auto-detection of encodings)
would not be available when reading from a character stream.
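
To illustrate why auto-detection needs the raw bytes, here is a rough sketch of sniffing the first few bytes of an entity. It covers only the byte order marks and the `<?` patterns for UTF-8, UTF-16 and EBCDIC, and deliberately omits the UCS-4 cases. The class name, method name and returned labels are hypothetical:

```java
// Hypothetical helper: guesses an encoding family from the leading bytes.
// Simplified; UCS-4 patterns are not handled.
class EncodingSniffer {
    static String guess(byte[] head) {
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";             // byte order mark
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";             // byte order mark
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";          // UTF-8 signature
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE"; // "<?" without a mark
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE"; // "<?" without a mark
        if (startsWith(head, 0x4C, 0x6F, 0xA7, 0x94)) return "EBCDIC";   // "<?xm" in EBCDIC
        if (startsWith(head, 0x3C, 0x3F, 0x78, 0x6D)) {
            // "<?xm" in an ASCII-compatible encoding: the encoding
            // declaration must be read to find the exact encoding.
            return "ASCII-compatible";
        }
        return "UTF-8"; // default when there is no mark and no declaration
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

None of these checks is possible once the input has already been decoded into characters, since the byte patterns are gone by then.
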

Yet another issue is that the XML spec specifies how to parse byte
streams, not character streams. When you try to infer from it how to
parse character streams, issues arise, such as the treatment of the byte
order mark and the encoding declaration, which the XML spec does not
define.

I think SAX is getting way too complicated and these should be left out
for now. If you are going to have only one, it should be SAXByteStream,
not SAXCharacterStream.

James