I'm not sure how your observation argues against my proposed hack to
insert a non-Unicode character into a Unicode document. This is not an
issue of encodings, but of character sets.
> If your customers
> require multiple encodings, then they have to source each one from a separate
> external entity. These entities can be bundled up or interleaved in any
> fashion you like, but this is a *PRE* XML storage management issue, not
....
> But once you have changed encodings, do you scan for the end of the
> marked section using the old or the new encoding? These kinds of ISO 2022
> mode changing are what we are trying to get rid of from XML (and from
> SGML).
It is exactly *because* the issues do not belong in XML, and are "*PRE*
XML" that I advised a preprocessor. I don't see anything that argues
against that here. As for the signalling of mode switches, it depends
on the encodings in question.
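
To make the preprocessor idea concrete, here is a minimal sketch in
Python. The marker syntax, file names, and encodings are my own
illustration, not part of any spec, and it only works if the marker
lines themselves are ASCII-safe in every encoding involved -- which is
the "it depends on the encodings" caveat above.

# Hypothetical pre-XML normalizer. Lines of the form
# "#encoding shift_jis" switch the decoder for the lines that follow;
# the output is a single UTF-8 stream, so the XML parser never sees a
# mode switch.
def normalize(infile, outfile, default="latin-1"):
    decoder = default
    with open(infile, "rb") as src, open(outfile, "wb") as dst:
        for raw in src:
            if raw.startswith(b"#encoding "):
                decoder = raw.split(None, 1)[1].strip().decode("ascii")
                continue                    # marker consumed pre-parser
            text = raw.decode(decoder)      # old encoding, before XML
            dst.write(text.encode("utf-8")) # one encoding, after

# e.g. turn a mixed-encoding working file into a parser-ready document
normalize("mixed-source.txt", "document.xml")
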
> So you can have multiple encodings before the parser, but not being presented
> to the parser. The other choice is multiple encodings after the parser: e.g.
> embedded the SJIS encoded in a latin-1-safe way. This is the same as Dave's
> comment about transliteration using notation. You can have a document like
>
> <?XML version="1.0" encoding="8859-1"?>
> <!DOCTYPE x SYSTEM "x.dtd"
> [
> <!NOTATION sjis-Qencoded SYSTEM "SjisQ.pl">
> <!ELEMENT SJIS-SECTION ( #PCDATA ) >
> <!ATTLIST SJIS-SECTION
> I-need-decoding NOTATION ( sjis-Qencoded ) >
> ]>
> <x>
> ...
>
> <SJIS-SECTION><![CDATA[
> smdkfjhhjwfnnweofijslkdm
> ]]></SJIS-SECTION>
> ...
> </x>
You had better hope that CDEnd (the "]]>" string that terminates the
CDATA section) does not appear in the encoded data!
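
If the notation really is a quoted-printable-style "Q" encoding, the
encoder can be made to guarantee that: escape "]" (along with "=" and
every non-ASCII byte) and CDEnd can never be produced. A rough sketch
of that idea follows; the =XX escaping rule is my own illustration,
not anything SjisQ.pl is known to do.

# Sketch of a CDATA-safe, latin-1-safe "Q"-style byte encoding.
def q_encode(data):
    out = []
    for b in data:
        # Escaping "]" means the three-character CDEnd sequence "]]>"
        # can never appear in the encoded text.
        if b >= 0x80 or b in (0x3D, 0x5D):   # non-ASCII, "=", "]"
            out.append("=%02X" % b)
        else:
            out.append(chr(b))
    return "".join(out)

def q_decode(text):
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "=":
            out.append(int(text[i+1:i+3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

sjis = "漢字".encode("shift_jis")            # raw Shift-JIS bytes
assert q_decode(q_encode(sjis)) == sjis
assert "]]>" not in q_encode(sjis)
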
Paul Prescod