Instead of having to consult a DTD to find out whether an element (type) is
CData or not, I feel it would be better if this information were explicit in
the document instance.
For instance, a special CData symbol could be added at the end of the
start-tag whenever one wants the element's content treated as CData.
If the symbol is present, the content is plain, unstructured text (CData);
if it is not, the content is tagged, structured text (Mixed/PCData).
Example (here the symbol '*' indicates CData):
<MyText>This is not CData, <MyText*>but this is! Here I can write &, <,
<element> or whatever I want without being concerned about clashing with
other "magic" symbols until the end-tag.</MyText></MyText>
(Perhaps it would also be natural to have a special signal at the end-tag,
not just at the start-tag.)
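To illustrate why this would be easy for a parser, here is a minimal sketch of a tokenizer for the proposed syntax. It assumes the '*' sits directly before the '>' of the start-tag, that attributes, comments and entity references can be ignored, that CData content ends at the first matching end-tag, and that a stray '<' outside CData is an error. The function name `tokenize` and its token format are hypothetical, chosen only for this illustration. The point is that the CData/PCData decision is made from the start-tag alone, without reading any DTD.

```python
import re

# Hypothetical syntax: <name> opens structured content, <name*> opens CData.
# Attributes, comments and entity references are deliberately not handled.
START = re.compile(r"<([A-Za-z][\w.-]*)(\*?)>")
END = re.compile(r"</([A-Za-z][\w.-]*)>")


def tokenize(text):
    """Split text into ('start', name), ('end', name), ('text', s) and
    ('cdata', s) tokens, treating everything after <name*> as raw text."""
    tokens = []
    pos = 0
    while pos < len(text):
        lt = text.find("<", pos)
        if lt == -1:
            # No more markup: the rest is ordinary character data.
            tokens.append(("text", text[pos:]))
            break
        if lt > pos:
            tokens.append(("text", text[pos:lt]))
        m = START.match(text, lt)
        if m:
            name, star = m.group(1), m.group(2)
            tokens.append(("start", name))
            pos = m.end()
            if star:
                # CData mode: everything up to the first </name> is raw text,
                # so '&', '<' and tag-like strings are not interpreted.
                close = "</" + name + ">"
                end = text.find(close, pos)
                if end == -1:
                    raise ValueError("unterminated CData element " + name)
                tokens.append(("cdata", text[pos:end]))
                tokens.append(("end", name))
                pos = end + len(close)
            continue
        m = END.match(text, lt)
        if m:
            tokens.append(("end", m.group(1)))
            pos = m.end()
            continue
        # Outside CData, a '<' must start a tag.
        raise ValueError("unexpected '<' at offset %d" % lt)
    return tokens
```

Run on a shortened version of the example above, the inner element yields a single `cdata` token containing the literal `&`, `<` and `<element>`, and the first `</MyText>` closes it. Ending CData at the first matching end-tag keeps the scanner trivial, but it also means the CData content itself can never contain that end-tag.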
Pro:
* Easy for the user: no need to remember or look up in the DTD whether an
element is CData or PCData when reading or writing a document without
DTD-aware tools.
* Easy for the parser, for the same reason: it can tell CData from PCData
from the start-tag alone.
Con:
* Incompatible with SGML
Sorry if suggestions like these have already been debated (perhaps decades
ago!) and turned down.
Cheers,
Jarle Stabell