Re: Mixed Content Models

Chris Maden (crism@oreilly.com)
Mon, 21 Sep 1998 15:04:34 -0400 (EDT)


[Jerome McDonough]
> In looking over the XML spec (3.2.2) on mixed content models,
> something isn't clear to me. I'm hoping someone here can enlighten
> me.
>
> I've inherited a DTD for development that was originally
> intended to be an SGML DTD, and has been converted to XML.
> Contained within it is the following:
>
> <!ELEMENT qstn (#PCDATA | (preQTxt?, qstnLit?, postQTxt?, forward?,
> backward?, ivuInstr*))*
>
> Is this a legitimate content model under XML section 3.2.2?

No. See production [51]. A mixed content declaration must take one of
these two forms:

<!ELEMENT e-type1 (#PCDATA | sub1 | sub2 | sub3)*>
<!ELEMENT e-type2 (#PCDATA)>

This is not optional: #PCDATA must come first, and the alternatives
must be plain element names, not nested groups.
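
If you want to screen a DTD for this before handing it to a parser,
something like the following rough Python sketch may help. It is only
an approximation of production [51]: the NAME pattern is ASCII-only
(the real Name production allows more characters), and is_mixed is a
name I made up for illustration, not part of any parser's API.

```python
import re

# Optional white space, per the S production (space, tab, CR, LF).
S = r"[ \t\r\n]*"

# ASCII-only approximation of an XML Name; an assumption of this sketch.
NAME = r"[A-Za-z_:][A-Za-z0-9._:\-]*"

# Production [51]:
#   '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'  |  '(' S? '#PCDATA' S? ')'
MIXED = re.compile(
    r"\(" + S + r"#PCDATA(?:" + S + r"\|" + S + NAME + r")*" + S + r"\)\*"
    r"|\(" + S + r"#PCDATA" + S + r"\)"
)

def is_mixed(model: str) -> bool:
    """Return True if the content model text fits the mixed-content form."""
    return MIXED.fullmatch(model.strip()) is not None

print(is_mixed("(#PCDATA | sub1 | sub2 | sub3)*"))
print(is_mixed("(#PCDATA | (preQTxt?, qstnLit?, postQTxt?, forward?, "
               "backward?, ivuInstr*))*"))
```

The first call should accept the model and the second should reject
it, because of the nested group.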

> Msxml doesn't have a problem with it, and nsgmls using the -wxml
> flag also happily parses the DTD. IBM's xml4j, however, complains:
> "Codebook.dtd: 1256, 33: This content model is not matched with the
> mixed model '(#PCDATA|FOO|BAR|. . .|BAZ)*': '(#PCDATA|(preQTxt?,
> qstnLit?, postQTxt?,forward?,backward?,ivuInstr*))*".

I'm a little surprised that nsgmls doesn't catch this, but the -wxml
option only warns about some (perhaps most) XML errors, not all of
them.

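Fortunately, every member of the inner group is optional and the whole
model repeats, so any sequence of text and these elements already
matches. Your content model is therefore equivalent to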

<!ELEMENT qstn (#PCDATA | preQTxt | qstnLit | postQTxt | forward |
backward | ivuInstr)*>

so this isn't a real problem. In other cases, content models do need
to be either tightened or loosened to be expressed in XML (notably
models involving SGML inclusion or exclusion exceptions).
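
If you'd rather not take the equivalence for qstn on faith, here is a
small brute-force check in Python. It is a sketch under assumptions,
not a proof: it treats content as a tuple of tokens, uses "#PCDATA"
as a stand-in for a run of text, adds a made-up element "other" so
that both models have something to reject, and only compares
sequences up to a chosen length. All function names are mine.

```python
from itertools import product

CHILDREN = ["preQTxt", "qstnLit", "postQTxt", "forward", "backward", "ivuInstr"]
TEXT = "#PCDATA"

def matches_group(chunk):
    """Match (preQTxt?, qstnLit?, postQTxt?, forward?, backward?, ivuInstr*)."""
    i = 0
    for name in CHILDREN:
        if name == "ivuInstr":
            # ivuInstr* : consume any number of occurrences
            while i < len(chunk) and chunk[i] == name:
                i += 1
        elif i < len(chunk) and chunk[i] == name:
            # name? : consume at most one occurrence
            i += 1
    return i == len(chunk)

def matches_original(seq):
    """Match (#PCDATA | (group))* by splitting seq into valid chunks."""
    seq = tuple(seq)
    reachable = [True] + [False] * len(seq)
    for start in range(len(seq)):
        if not reachable[start]:
            continue
        for end in range(start + 1, len(seq) + 1):
            chunk = seq[start:end]
            if chunk == (TEXT,) or matches_group(chunk):
                reachable[end] = True
    return reachable[-1]

def matches_flattened(seq):
    """Match (#PCDATA | preQTxt | ... | ivuInstr)*."""
    return all(tok == TEXT or tok in CHILDREN for tok in seq)

def equivalent_up_to(max_len):
    """Compare both models on every token sequence up to max_len long."""
    alphabet = CHILDREN + [TEXT, "other"]  # "other" is a hypothetical outsider
    for length in range(max_len + 1):
        for seq in product(alphabet, repeat=length):
            if matches_original(seq) != matches_flattened(seq):
                return False
    return True

print(equivalent_up_to(4))
```

The same approach would flag a model that is not equivalent to its
flattened form, for example if one member of the group were required
rather than optional.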

-Chris

-- 
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>