> Under Unicode version 2.0,
> what I should've said is:
>
> Unicode == ISO-10646-UCS-2 != UTF-16
>
> as Unicode and 10646 in UCS-2 format should be identical, but UTF-16
> differs from both of these in that it allows the use of surrogate
> pairs to encode the BMP and the next 16 planes of UCS-4. From
> what I can see at Unicode's home page, it now looks like Unicode is
> dropping UCS-2 character encoding and now only endorses UTF-8 and
> UTF-16, so that the situation now is:
>
> Unicode != ISO-10646-UCS-2
>
> and Unicode sometimes does/sometimes does not equal UTF-16. Is that
> more or less the case at the moment?

"Unicode 2.0" and "Unicode 2.1" always mean UTF-16. UCS-2 proper
(that is, the encoding that does not allow references to what
10646 calls Planes 01 to 10 hex, the 16 planes beyond the BMP) has
never been Unicode since the distinction between UCS-2 and UTF-16
was invented. Before that, there was only UCS-2 and Unicode = UCS-2.

So Unicode = UTF-16 != UCS-2, but the distinction is usually
trivial: UCS-2 per se does not define any meaning for surrogate
characters, so text confined to the BMP is encoded identically
in both.
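
For anyone who wants to see the difference concretely, here is a
rough sketch in Python of how UTF-16 maps a code point to 16-bit
code units. The function names are my own invention for
illustration, not taken from any standard or library. A BMP
character comes out as one unit, exactly as in UCS-2; anything in
the planes beyond the BMP comes out as a surrogate pair, which
UCS-2 proper cannot express.

```python
def to_utf16_units(code_point):
    """Return the UTF-16 code units (as a list of ints) for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range UTF-16 can address")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters by themselves")
    if code_point < 0x10000:
        # BMP: a single 16-bit unit, identical to the UCS-2 form.
        return [code_point]
    # Beyond the BMP: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


def fits_in_ucs2(code_point):
    """True if the code point needs only one 16-bit unit (hypothetical helper)."""
    return len(to_utf16_units(code_point)) == 1


if __name__ == "__main__":
    for cp in (0x0041, 0xFFFD, 0x10000, 0x1D11E, 0x10FFFF):
        units = " ".join("%04X" % u for u in to_utf16_units(cp))
        print("U+%04X -> %s (UCS-2 ok: %s)" % (cp, units, fits_in_ucs2(cp)))
```

The upper bound of 0x10FFFF falls out of the arithmetic: two
10-bit halves give 0x100000 values above 0xFFFF, which is exactly
16 planes.
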

-- 
John Cowan    http://www.ccil.org/~cowan    cowan@ccil.org
    You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
    You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
        Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)