I'm not sure how your observation argues against my proposed hack to
insert a non-Unicode character into a Unicode document. This is not an
issue of encodings, but of character sets.
> If your customers
> require multiple encodings, then they have to source each one from a separate
> external entity. These entities can be bundled up or interleaved in any
> fashion you like, but this is a *PRE* XML storage management issue, not
....
> But once you have changed encodings, do you scan for the end of the
> marked section using the old or the new encoding? These kinds of ISO 2022
> mode changing are what we are trying to get rid of from XML (and from
> SGML).
It is exactly *because* the issues do not belong in XML, and are "*PRE*
XML" that I advised a preprocessor. I don't see anything that argues
against that here. As for the signalling of mode switches, it depends
on the encodings in question.
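
To make the preprocessor idea concrete, here is a minimal sketch in
Python. The marker syntax, file names, and encodings are my own
illustration, not part of any spec, and it only works if the marker
lines themselves are ASCII-safe in every encoding involved -- which is
the "it depends on the encodings" caveat above.

# Hypothetical pre-XML normalizer. Lines of the form
# "#encoding shift_jis" switch the decoder for the lines that follow;
# the output is a single UTF-8 stream, so the XML parser never sees a
# mode switch.
def normalize(infile, outfile, default="latin-1"):
    decoder = default
    with open(infile, "rb") as src, open(outfile, "wb") as dst:
        for raw in src:
            if raw.startswith(b"#encoding "):
                decoder = raw.split(None, 1)[1].strip().decode("ascii")
                continue                    # marker consumed pre-parser
            text = raw.decode(decoder)      # old encoding, before XML
            dst.write(text.encode("utf-8")) # one encoding, after

# e.g. turn a mixed-encoding working file into a parser-ready document
normalize("mixed-source.txt", "document.xml")
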
> So you can have multiple encodings before the parser, but not being presented
> to the parser. The other choice is multiple encodings after the parser: e.g.
> embedded the SJIS encoded in a latin-1-safe way. This is the same as Dave's
> comment about transliteration using notation. You can have a document like
>
> <?XML version="1.0" encoding="8859-1"?>
> <!DOCTYPE x SYSTEM "x.dtd"
> [
> <!NOTATION sjis-Qencoded SYSTEM "SjisQ.pl">
> <!ELEMENT SJIS-SECTION ( #PCDATA ) >
> <!ATTLIST SJIS-SECTION
> I-need-decoding NOTATION ( sjis-Qencoded ) >
> ]>
> <x>
> ...
>
> <SJIS-SECTION><![CDATA[
> smdkfjhhjwfnnweofijslkdm
> ]]></SJIS-SECTION>
> ...
> </x>
You had better hope that CDEnd (the "]]>" string that terminates the
CDATA section) does not appear in the encoded data!
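
If the notation really is a quoted-printable-style "Q" encoding, the
encoder can be made to guarantee that: escape "]" (along with "=" and
every non-ASCII byte) and CDEnd can never be produced. A rough sketch
of that idea follows; the =XX escaping rule is my own illustration,
not anything SjisQ.pl is known to do.

# Sketch of a CDATA-safe, latin-1-safe "Q"-style byte encoding.
def q_encode(data):
    out = []
    for b in data:
        # Escaping "]" means the three-character CDEnd sequence "]]>"
        # can never appear in the encoded text.
        if b >= 0x80 or b in (0x3D, 0x5D):   # non-ASCII, "=", "]"
            out.append("=%02X" % b)
        else:
            out.append(chr(b))
    return "".join(out)

def q_decode(text):
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "=":
            out.append(int(text[i+1:i+3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

sjis = "漢字".encode("shift_jis")            # raw Shift-JIS bytes
assert q_decode(q_encode(sjis)) == sjis
assert "]]>" not in q_encode(sjis)
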
Paul Prescod