RE: ASCII control characters in XML

David Brownell (David.Brownell@Eng.Sun.COM)
Tue, 28 Apr 1998 10:52:08 -0700


> Also, you could define appropriate unparsed entities and then refer to them
> by attribute <char val="x0007"/> (having previously defined x0007 to be an
> unparsed entity and val to be an entity attribute).

Or just define the semantics of the DTD (as usual, in a natural
language) so that the "val" attribute of the "char" element is a
numeric string giving a UCS-4 character value. Either way you'll
need to add a semantic interpretation: either that the entity is a
single-character entity, or that the value is a number denoting a
character. I used the latter, since it's a simpler rule to implement.
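
To illustrate the "number denoting a character" rule, here is a minimal
sketch in Python. It is not the code I used, only one way an application
could apply the rule after parsing. The names char_from_val and
expand_text are hypothetical, and the notation for "val" (a leading "x"
for hex, otherwise decimal) is an assumption; the DTD's documentation
would have to say what the real notation is.

```python
import xml.etree.ElementTree as ET


def char_from_val(val: str) -> str:
    """Turn a numeric string into the character it denotes.

    Assumption: a leading "x" marks hexadecimal, anything else is decimal.
    """
    text = val.strip()
    code = int(text[1:], 16) if text[:1] in ("x", "X") else int(text, 10)
    # Python strings stop at U+10FFFF, so larger UCS-4 values are rejected
    # here; that limit belongs to this sketch, not to the rule itself.
    if not 0 <= code <= 0x10FFFF:
        raise ValueError(f"not a representable character value: {val!r}")
    return chr(code)


def expand_text(elem: ET.Element) -> str:
    """Flatten an element to text, replacing each <char val="..."/> child."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "char":
            parts.append(char_from_val(child.get("val", "")))
        else:
            parts.append(expand_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


# The BEL character never appears in the XML text itself, only its number.
doc = ET.fromstring('<msg>ring<char val="x0007"/>bell<char val="9"/>end</msg>')
result = expand_text(doc)
```

The parser only ever sees ordinary attribute text, so the document stays
well-formed; the control character comes into existence in the
application, after parsing.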

You get a similar need if you must transmit a Unicode surrogate, which
can't appear in XML (although it can appear in a UTF-16 or UTF-8
encoding of XML, as part of a pair) or any other character outside the
range allowed by XML (which maxes out at hex 00.10.FF.FF).
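
To decide when the numeric escape is needed, an application can test a
character value against the Char production of the XML 1.0
recommendation. A rough sketch in Python follows; the function name
is_xml_char is hypothetical, and the ranges are the ones given by that
production.

```python
def is_xml_char(code: int) -> bool:
    """Return True if code may appear literally in an XML 1.0 document."""
    return (
        code in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= code <= 0xD7FF         # stops just below the surrogates
        or 0xE000 <= code <= 0xFFFD       # excludes U+FFFE and U+FFFF
        or 0x10000 <= code <= 0x10FFFF    # the supplementary planes
    )


# Values that would have to travel as numbers rather than as characters:
# BEL, a lone surrogate, and a UCS-4 value beyond the XML range.
needs_escape = [c for c in (0x07, 0x41, 0xD800, 0x110000) if not is_xml_char(c)]
```

Anything that fails the test would be sent through the "char" element
instead of as literal text.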

- Dave