Norbert's answers agree with what I got and also with the consensus
of the group. It's clear that a well-formed (WF) file can give *different*
data from the same file with some or all of its ELEMENT declarations. I do not find the
behaviour intuitive and believe we have to address it in some manner.
I am sympathetic to trashing the whitespace PCDATA elements, but there is
no clear idea of how to do it. An application processing a fragment like:
<PRE>
<IMG SRC="dot1">
<IMG SRC="dot2">
</PRE>
may wish the result to have 3 newlines as children (i.e. 5 elements in
all). But equally an app may be frustrated by the extra elements. It's
easy to ask for the TEI pointer "DESCENDANT(1,PRE)CHILD(1,*)" and expect to get
the dot1. This can be criticised as bad style but it's as likely to arise
from ignorance as from sloppiness.
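To make the point concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it uses the standard library's xml.dom.minidom as a stand-in for a parser that sees no ELEMENT declarations, and it writes the IMG elements with XML empty-element syntax (<IMG .../>) so that the fragment is well-formed. It does not evaluate a TEI pointer; it only shows what "first child" means with and without the whitespace children.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Assumption: empty-element syntax (<IMG .../>) so the fragment is well-formed XML.
SOURCE = '<PRE>\n<IMG SRC="dot1"/>\n<IMG SRC="dot2"/>\n</PRE>'

pre = parseString(SOURCE).documentElement

# With no ELEMENT declarations the parser cannot know the newlines are
# insignificant, so each one is reported as a text child of PRE.
all_children = list(pre.childNodes)
kinds = ["text" if n.nodeType == Node.TEXT_NODE else n.tagName
         for n in all_children]

# The literal first child, versus the first *element* child that the
# author of the pointer probably had in mind.
first_child = all_children[0]
element_children = [n for n in all_children if n.nodeType == Node.ELEMENT_NODE]
first_element = element_children[0]

print(len(all_children), kinds)
print(repr(first_child.data))
print(first_element.getAttribute("SRC"))
```

Counting every child gives the 5 items mentioned above, and the literal first child is a newline rather than dot1. Filtering to element children is one possible way of "trashing" the whitespace, but it is the application that has to decide to do it.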
There has rightly been concern about the conformance of parsers (esp. their
reaction to errors). This is an area where I suspect conformance is
non-trivial.
P.
-- Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/