Re: PCDATA vs CDATA

Richard Tobin (richard@cogsci.ed.ac.uk)
Tue, 30 Jun 1998 23:35:26 +0100


> Hmm, is that the only case where an XML parser might do the "wrong thing" if
> it came across a document without a supporting DTD?

Yes. There are some things an XML parser can't do without a DTD:

- validating (obviously)
- determining which whitespace is ignorable
- normalising attributes and inserting default values
- expanding entity references (other than the predefined ones
  such as &lt; and &amp;, and character references)

but despite those constraints it can parse the document and determine
whether it is well-formed.
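
To make that concrete, here is a minimal sketch using Python's standard
xml.etree.ElementTree module, which sits on top of the non-validating
expat parser. The helper name is_well_formed is made up for this
illustration; no DTD is supplied anywhere.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b>text</b></a>"))  # properly nested
print(is_well_formed("<a><b>text</a></b>"))  # overlapping tags
print(is_well_formed("<a>&lt;</a>"))         # predefined entity
print(is_well_formed("<a>&chap1;</a>"))      # entity only a DTD could declare
```

The first and third documents are accepted and the other two are
rejected: nesting can be checked from the document alone, but a
reference to an entity that was never declared cannot be expanded.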

> It seems to me that if
> a document comes through without a DTD, and an element contained data not
> explicitly escaped, then it would not be unreasonable to assume PCDATA and
> try to parse it. However, if a DTD is there to provide more info, then use
> it. I am not sure I see how it is significantly different than validating
> that an element may, or may not, be a child of another element.

If the parser doesn't know that the content of an element is CDATA it
will very likely parse a correct document wrongly. This is not the
case if it merely doesn't know which children are allowed: the parse
is still correct, it just can't be validated.

For example, if c were declared CDATA and the parser didn't have the
DTD, it would report a syntax error for

<c><</c>

because it would take the bare < in the content as the start of markup.
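
Since XML has no CDATA declared content, the author has to escape
such data explicitly in the document itself. A rough sketch of the
difference, again with Python's xml.etree.ElementTree and no DTD (the
helper content_of and the sample strings are only illustrative):

```python
import xml.etree.ElementTree as ET

def content_of(text):
    """Parse a one-element document and return its character data,
    or None if the parser rejects the document."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        return None

# A bare "<" in content is read as the start of markup and rejected.
print(content_of("<c>if (a < b) x = 1;</c>"))

# Escaped in the document itself, the same data parses without a DTD.
print(content_of("<c>if (a &lt; b) x = 1;</c>"))
print(content_of("<c><![CDATA[if (a < b) x = 1;]]></c>"))
```

The first call returns None; the other two return the same string,
"if (a < b) x = 1;".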

Various other features of SGML have been omitted for the same reason,
in particular start- and end-tag omission. Similarly a new syntax has
been created for empty elements, because without the DTD a parser
can't tell that an element must be empty.
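
The same kind of sketch shows why these had to go. It assumes Python's
xml.etree.ElementTree with no DTD; the helper child_tags and the
element names are hypothetical, chosen only to look familiar.

```python
import xml.etree.ElementTree as ET

def child_tags(text):
    """Return the tags of the root's children, or None if the
    document is rejected as not well-formed."""
    try:
        return [child.tag for child in ET.fromstring(text)]
    except ET.ParseError:
        return None

# The explicit empty-element syntax needs no DTD.
print(child_tags("<p>one<br/>two</p>"))

# SGML style: without a DTD, <br> just looks like an unclosed element.
print(child_tags("<p>one<br>two</p>"))

# End-tag omission fails for the same reason.
print(child_tags("<ul><li>one<li>two</ul>"))
print(child_tags("<ul><li>one</li><li>two</li></ul>"))
```

The first and last calls return ['br'] and ['li', 'li']; the two
SGML-style documents return None.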

-- Richard