RE: SDATA or UNICODE

Paul Prescod (papresco@technologist.com)
Wed, 28 Jan 1998 17:29:39 -0500 (EST)


On Wed, 28 Jan 1998, Gavin McKenzie wrote:
>
> XML provides a way for specifying the encoding of an entity with the
> ?XML pi encoding declaration. Why wouldn't this be sufficient? If the
> euro or florin symbol is available in some non-Unicode character
> encoding scheme, isn't it sufficient to encode the text which requires
> the symbol in the appropriate scheme and use the encoding declaration?

No, for the reason Tim points out. On the other hand, you might be on the
right track. A processing instruction would serve as a hack to tell the
application where to insert the euro: <?EURO?>
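
A minimal sketch of that hack, assuming the application uses a SAX-style
parser and has some character to substitute for the euro (U+20AC is
assumed here). The EuroHandler class and the text_with_euro function are
illustrative names, not part of any proposal:

```python
import xml.sax
from xml.sax.handler import ContentHandler

EURO = "\u20ac"  # assumed substitute character: EURO SIGN


class EuroHandler(ContentHandler):
    """Collects character data, replacing each <?EURO?> PI with EURO."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)

    def processingInstruction(self, target, data):
        # Only the hypothetical EURO target is handled; other PIs are dropped.
        if target == "EURO":
            self.parts.append(EURO)


def text_with_euro(xml_text):
    """Return the text content of xml_text with <?EURO?> expanded."""
    handler = EuroHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.parts)


print(text_with_euro("<price>Cost: <?EURO?>25</price>"))
```

The catch is that every application has to know about the EURO target;
the parser itself just passes the instruction through.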

> On a related note...I have felt that it should be possible to attach the
> encoding declaration to any element in a manner similar to xml:lang.
> Typically our customers (who often are not able to make use of Unicode)
> require the ability to switch from one character encoding scheme to
> another on the fly within the same physical document (e.g. switching
> from Shift-JIS to Latin-1 and back). Referencing an external entity
> makes it possible, but not acceptable for our customers.

Egad. This is one of those things that is a good idea at the user level,
but would make implementation prohibitively hard. Imagine the poor
desperate Python hacker trying to "grep" through that!

I think you should implement a language that allows this and is preprocessed
into XML. If I were you I would use marked sections and not attributes to
describe the boundaries. Marked sections are really easy to scan for.
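
A rough sketch of such a preprocessor, under loudly stated assumptions:
the <![ENCODING name [ ... ]]> syntax is invented here for illustration,
text outside a marked section is assumed to be Latin-1, and the output
is a single Unicode string that can then be written out as ordinary XML.
The function name preprocess is likewise hypothetical:

```python
import re

# Hypothetical marked-section syntax: <![ENCODING name [ raw bytes ]]>
# The delimiters are plain ASCII, so they can be scanned for at the byte
# level before anything is decoded.
SECTION = re.compile(
    rb"<!\[ENCODING\s+([A-Za-z0-9._-]+)\s*\[(.*?)\]\]>", re.DOTALL
)


def preprocess(raw, default="latin-1"):
    """Decode a mixed-encoding byte string into one Unicode string.

    Bytes outside marked sections are decoded with `default`; bytes
    inside a section are decoded with the encoding the section names.
    Limitation: Shift-JIS trail bytes can be 0x5D (']'), so a byte-level
    scan for "]]>" can in rare cases close a section too early.
    """
    out = []
    pos = 0
    for match in SECTION.finditer(raw):
        out.append(raw[pos:match.start()].decode(default))
        encoding = match.group(1).decode("ascii")
        out.append(match.group(2).decode(encoding))
        pos = match.end()
    out.append(raw[pos:].decode(default))
    return "".join(out)


sample = (
    b"<p>caf\xe9 "
    + b"<![ENCODING shift_jis ["
    + "\u65e5\u672c\u8a9e".encode("shift_jis")
    + b"]]>"
    + b" fin</p>"
)
result = preprocess(sample)
```

The result can be serialized with result.encode("utf-8") and handed to
any XML parser, so the encoding switches never reach the parser at all.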

Paul Prescod