Re: Mix encodings in a document?

Tony Graham (tgraham@mulberrytech.com)
Mon, 28 Sep 1998 15:31:52 -0400 (EDT)


At 23 Sep 1998 16:21 -0400, John Cowan wrote:
> Deke Smith wrote:
> > And what is the implications of this (if any) for XML rendering? I'm not
> > sure of what you mean by "surrogates are correctly processed."
>
> Essentially it means that the two 16-bit values that form a
> surrogate-pair (representing a Unicode character on the Astral
> Plane) is always treated as a single character.
>
> In XML, surrogate-pairs can appear only in attribute values, #PCDATA
> content, PIs, and comments; they are not allowed in element GIs,
> attribute names, or the like.

Surrogate pairs are not allowed in parsed entities. The production
for Char excludes the surrogate blocks (#xD800-#xDFFF):

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
             | [#x10000-#x10FFFF]
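
To make the effect of the production concrete, here is a rough sketch in
Python of a check against the ranges of production [2]. The function name
is_xml_char is my own invention for illustration; it is not taken from the
XML Recommendation or from any particular parser.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches production [2] Char.

    Hypothetical helper: it simply transcribes the ranges of the
    production, so the surrogate blocks (0xD800-0xDFFF) fall through
    to False, as do 0xFFFE and 0xFFFF.
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )
```

With this sketch, is_xml_char(0xD800) is False, while is_xml_char(0x10000)
is True.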

You can include non-BMP/non-UCS-2 characters by making numeric
character references to their Unicode Scalar Values (for example,
&#x10000;) or by using UCS-4.
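
If all you have is the two 16-bit surrogate values, the scalar value for
the reference can be computed from them. The following is only a sketch,
assuming the standard UTF-16 surrogate arithmetic; the function names
scalar_from_surrogates and numeric_reference are hypothetical.

```python
def scalar_from_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair into a Unicode scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def numeric_reference(scalar: int) -> str:
    """Format a scalar value as a hexadecimal numeric character reference."""
    return f"&#x{scalar:X};"
```

For instance, numeric_reference(scalar_from_surrogates(0xD800, 0xDC00))
gives the string "&#x10000;".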

Regards,

Tony Graham
======================================================================
Tony Graham                            mailto:tgraham@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9632
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================