Re: Character encodings

John Cowan (cowan@locke.ccil.org)
Mon, 07 Dec 1998 16:05:54 -0500


Chris von See scripsit:

> In section 4.3.3, the XML spec implies
> that support of ISO 10646 UCS-2 encoding (i.e. Unicode)

Unicode = UTF-16, not UCS-2. In practice there is no difference
at the moment, because no 10646 planes other than the BMP (plane 0)
contain any characters. UTF-16 uses surrogate pairs (two 16-bit code
units) to reach planes 1 through 16, whereas UCS-2 simply cannot
represent the Astral Planes.
Therefore, UTF-16 encoding should be used instead of UCS-2.
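
To make the surrogate mechanism concrete, here is a minimal sketch of
how UTF-16 maps a code point outside the BMP onto two 16-bit code
units. The function name to_surrogate_pair is made up for illustration
and is not taken from any spec or library; the arithmetic is the
standard UTF-16 mapping.

```python
def to_surrogate_pair(code_point):
    """Split a code point above the BMP into UTF-16 surrogates.

    Hypothetical helper for illustration only. Returns a
    (high, low) tuple of 16-bit code units.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in planes 1 through 16")
    # Subtracting 0x10000 leaves a 20-bit offset.
    offset = code_point - 0x10000
    # Top 10 bits go into the high surrogate (0xD800..0xDBFF).
    high = 0xD800 + (offset >> 10)
    # Bottom 10 bits go into the low surrogate (0xDC00..0xDFFF).
    low = 0xDC00 + (offset & 0x3FF)
    return high, low


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x10000)
    print("U+10000 -> %04X %04X" % (high, low))
```

UCS-2 has no such mapping: it is a fixed-width 16-bit form, so a code
point of 0x10000 or above has no representation in it at all.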

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)