RE: CDATA by any other name... (was The raw and the cooked)

David Megginson (david@megginson.com)
Tue, 3 Nov 1998 07:00:04 -0500 (EST)


Rick Jelliffe writes:

> A CDATA marked section is not only a way to prevent delimiter
> recognition. It is also a way to declare that the characters in
> that section are limited to ones available in the direct document
> encoding of the originating system. (SGML has a CDATA keyword you
> can use instead of content models: XML was felt not to need it
> because you could use <![CDATA[, however that perhaps shows the
> mind of the XML WG at that time, in that they were down-playing the
> need for schemas.) It declares "this section does not use character
> references or entities or subelements". So, conceptually, it could
> sometimes be markup, not merely delimiter recognition.

While I agree that there are always interesting new uses for markup
constructions, I think that we're straining here. My basic rule in
system design is to keep things as simple and obvious as possible; if
I wanted to signal to my application that an element contained only a
certain type of information (such as a limited character repertoire), I
would use an attribute that made that point clear, either a NOTATION
attribute or a simple CDATA attribute named something like
"character-encoding".

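As a rough sketch of the attribute approach (Python, standard library
only): the attribute name "character-encoding" is the one suggested
above, but the element names, the sample values, and the helper
fits_declared_repertoire() are invented for illustration. It assumes
the attribute value is a codec name and simply asks whether the
element's content can be encoded in it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the attribute name "character-encoding"
# comes from the discussion; element names and values are made up.
DOC = """<doc>
  <para character-encoding="us-ascii">plain text only</para>
  <para character-encoding="iso-8859-1">caf\u00e9</para>
  <para character-encoding="us-ascii">caf\u00e9</para>
</doc>"""


def fits_declared_repertoire(elem):
    """Return True if the element's text can be encoded in the encoding
    named by its character-encoding attribute (True if none is declared)."""
    declared = elem.get("character-encoding")
    if declared is None:
        return True
    text = "".join(elem.itertext())
    try:
        text.encode(declared)
    except UnicodeEncodeError:
        return False
    return True


root = ET.fromstring(DOC)
results = [fits_declared_repertoire(p) for p in root.findall("para")]
print(results)  # [True, True, False]
```

The third paragraph fails because it claims us-ascii but contains an
e-acute. An attribute value that Python does not recognise as a codec
name would raise LookupError, which this sketch does not handle.
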
That said, I don't see the usefulness of limiting content to a
specific character repertoire arbitrarily; I *do* see the usefulness in
combination with an "xml:lang" or "mime-type" attribute, though. An
intelligent editor could already act on xml:lang to limit character
selection, if such a thing were desirable.
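
To make the xml:lang idea concrete, here is a minimal sketch of the
kind of check an editor might run (Python, standard library only). The
language-to-script table and the helper disallowed_letters() are
assumptions for illustration, not part of any existing editor, and
matching on Unicode character names is a deliberately crude stand-in
for a real repertoire definition.

```python
import unicodedata
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed mapping from language tag to the script word that starts the
# Unicode names of its letters; a real tool would need a fuller table.
SCRIPT_FOR_LANG = {"en": "LATIN", "el": "GREEK", "ru": "CYRILLIC"}


def disallowed_letters(elem):
    """Return the letters in elem's text that fall outside the script
    assumed for its xml:lang value (empty list if the language is
    missing or not in the table)."""
    script = SCRIPT_FOR_LANG.get(elem.get(XML_LANG))
    if script is None:
        return []
    text = "".join(elem.itertext())
    return [
        ch for ch in text
        if ch.isalpha() and not unicodedata.name(ch, "").startswith(script)
    ]


root = ET.fromstring(
    '<doc><p xml:lang="en">hello</p>'
    '<p xml:lang="el">\u03b1\u03b2c</p></doc>'
)
results = [disallowed_letters(p) for p in root.findall("p")]
print(results)  # [[], ['c']]
```

Here the Greek paragraph is flagged for the stray Latin "c". Digits,
punctuation and whitespace are ignored because only alphabetic
characters are tested.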

All the best,

David

-- 
David Megginson                 david@megginson.com
           http://www.megginson.com/