> I feel strongly that this project will need an implementation, though I also
> fear that I'm not a good enough programmer to execute it. I'd like to see the
> implementation built on SAX if possible, to continue the tradition of openness
> it began. I can see something like a 'validating SAX' (vSAX?): a program
> which uses the SAX API to parse a DTD (or whatever we call it) and then uses
> SAX again to parse the document, validating it against the DTD. vSAX would
> then use the same SAX API to pass the information to the routine which called
> it in the first place. Applications already using SAX could call vSAX without
> having to make many changes.
>
> This may go beyond the capabilities of the event-driven model. Building this
> project in such a way that the vSAX parser could validate documents without
> having to build an entire tree would likely warp the DTDs dramatically. That
> could be interesting, but I suspect vSAX would have to build a tree
> internally.
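The vSAX idea in the quote can be sketched as a SAX filter: it sits between an ordinary parser and the application, checks each element, and forwards every event unchanged, so the application's existing handler needs no changes. This is a minimal Python sketch, not a design from the thread: `ValidatingFilter`, `vsax_parse`, `Recorder`, and the content-model format (a regular expression over the space-separated child element names, or `None` for text-only content) are all hypothetical, and the models are passed in by hand rather than parsed from a DTD. It keeps only a stack of the currently open elements; whether every DTD constraint can be checked that way is the open question raised in the quote.

```python
import io
import re
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class ValidatingFilter(XMLFilterBase):
    """Hypothetical 'vSAX': validate element structure, then pass every
    SAX event through to the application's handler unchanged."""

    def __init__(self, parent, models):
        super().__init__(parent)
        # Assumed model format: element name -> regex over the space-joined
        # child element names, or None for text-only (#PCDATA) content.
        self._models = {name: None if m is None else re.compile(m)
                        for name, m in models.items()}
        self._stack = []  # [name, child_names] for each open element

    def startElement(self, name, attrs):
        if name not in self._models:
            raise xml.sax.SAXException(f"undeclared element <{name}>")
        if self._stack:
            self._stack[-1][1].append(name)
        self._stack.append([name, []])
        super().startElement(name, attrs)

    def characters(self, content):
        # Non-whitespace text is only allowed in text-only elements.
        if self._stack and content.strip():
            if self._models[self._stack[-1][0]] is not None:
                raise xml.sax.SAXException(
                    f"text not allowed in <{self._stack[-1][0]}>")
        super().characters(content)

    def endElement(self, name):
        _, children = self._stack.pop()
        model = self._models[name]
        if model is None:
            valid = not children
        else:
            valid = model.fullmatch(" ".join(children)) is not None
        if not valid:
            raise xml.sax.SAXException(f"invalid content in <{name}>: {children}")
        super().endElement(name)


def vsax_parse(text, models, handler):
    """Parse text with a standard SAX reader wrapped in the filter."""
    vfilter = ValidatingFilter(xml.sax.make_parser(), models)
    vfilter.setContentHandler(handler)
    vfilter.parse(io.BytesIO(text.encode("utf-8")))


class Recorder(ContentHandler):
    """Stand-in for an existing SAX application: it just records events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))
```

For the `a`/`b` example discussed below, the models would be `{"a": r"b", "b": None}`. An application would hand its own `ContentHandler` to `vsax_parse` instead of to the parser; a real vSAX would build the models from the DTD itself.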
I might be getting a bit ahead of the game here, so please bear with me; these
thoughts are in my head now and I'd like to get them down.
Trees vs. Events
----------------
It seems like we need to decide early on whether we want to get the DTD as
events or as a tree. Arguing in favor of events is the fact that it is more
reasonable to build a tree from events than vice versa (less memory usage,
since producing events from a tree requires the whole tree to be in memory
first), so events are the more basic form. However, I also think that what is
returned really depends on the intended usage.
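The asymmetry shows up in a few lines of Python's standard SAX module. The sketch uses document events, because Python's `ContentHandler` interface does not report `<!ELEMENT>` declarations, but the principle carries over: `TreeBuilder` turns a stream of events into a tree and holds only the tree it is building, while `walk` goes the other way and can only start once a complete tree exists. `Node`, `TreeBuilder`, `build_tree`, and `walk` are hypothetical names used for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Node:
    """A minimal tree node: element name, child nodes, and text."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.text = ""


class TreeBuilder(ContentHandler):
    """Events -> tree: each event extends the tree under construction."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._open = []  # path from the root to the current element

    def startElement(self, name, attrs):
        node = Node(name)
        if self._open:
            self._open[-1].children.append(node)
        else:
            self.root = node
        self._open.append(node)

    def endElement(self, name):
        self._open.pop()

    def characters(self, content):
        if self._open:
            self._open[-1].text += content


def build_tree(text):
    builder = TreeBuilder()
    xml.sax.parseString(text.encode("utf-8"), builder)
    return builder.root


def walk(node):
    """Tree -> events: needs the complete tree before the first event."""
    yield ("start", node.name)
    for child in node.children:
        yield from walk(child)
    yield ("end", node.name)
```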
In my limited imagination, events are mostly useful for display: read in the
DTD definition by definition and display it. This is a common operation with
the text in an XML document and is presumably why SAX returns events. Apart
from displaying a DTD or building a tree, how else would DTD events be used?
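DTD events do exist at the parser level in some tools. Python's `xml.parsers.expat`, for instance, calls an `ElementDeclHandler` once per `<!ELEMENT>` declaration and passes the content model as a nested `(type, quantifier, name, children)` tuple. The sketch below uses that hook for the display case just described: each declaration is rendered as it arrives and nothing is kept afterward. `describe` and `stream_declarations` are hypothetical helpers, and the rendered text approximates the original declaration rather than copying it byte for byte.

```python
from xml.parsers import expat
from xml.parsers.expat import model

DOC = """<!DOCTYPE a [
<!ELEMENT a (b)>
<!ELEMENT b (#PCDATA)>
]>
<a><b>text</b></a>"""

QUANTIFIERS = {model.XML_CQUANT_NONE: "", model.XML_CQUANT_OPT: "?",
               model.XML_CQUANT_REP: "*", model.XML_CQUANT_PLUS: "+"}


def describe(m):
    """Render an expat content-model tuple back into DTD-like syntax."""
    ctype, quant, name, children = m
    if ctype == model.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == model.XML_CTYPE_ANY:
        return "ANY"
    if ctype == model.XML_CTYPE_NAME:
        text = name
    elif ctype == model.XML_CTYPE_MIXED:
        text = "(" + "|".join(["#PCDATA"] + [describe(c) for c in children]) + ")"
    else:
        sep = "," if ctype == model.XML_CTYPE_SEQ else "|"
        text = "(" + sep.join(describe(c) for c in children) + ")"
    return text + QUANTIFIERS[quant]


def stream_declarations(doc, on_decl):
    """Call on_decl(name, text) for each <!ELEMENT> as the parser reports it."""
    parser = expat.ParserCreate()
    parser.ElementDeclHandler = lambda name, m: on_decl(name, describe(m))
    parser.Parse(doc, True)


if __name__ == "__main__":
    # Display each declaration as soon as it is read.
    stream_declarations(DOC, lambda name, text: print(f"<!ELEMENT {name} {text}>"))
```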
The two prime uses of DTDs that I can think of are validation and exploration.
Both of these require the information to stay in memory and be accessed
randomly, which (to me) implies a tree, hash table, or similar structure. Are
there any common uses of DTDs that require serial access?
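As a rough illustration of the hash-table option, the declarations can be loaded into a dictionary keyed by element name, after which any definition is one lookup away regardless of declaration order. In this sketch the DTD extends the `a`/`b` example with a hypothetical element `c`, and `load_dtd`, `may_contain`, and `parents_of` are illustrative names: `may_contain` is the check a validator might make for each child element, and `parents_of` is the kind of question an explorer might ask.

```python
from xml.parsers import expat
from xml.parsers.expat import model

# Hypothetical DTD: the a/b example plus an element c that also holds b.
DOC = """<!DOCTYPE a [
<!ELEMENT a (b, c)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT c (b*)>
]>
<a><b/><c/></a>"""


def child_names(m):
    """All element names mentioned in an expat content-model tuple.
    Simplification: EMPTY and ANY both yield an empty set here."""
    ctype, _quant, name, children = m
    names = {name} if ctype == model.XML_CTYPE_NAME else set()
    for c in children:
        names |= child_names(c)
    return names


def load_dtd(doc):
    """Build a dict: element name -> set of element names it may contain."""
    table = {}

    def on_decl(name, m):
        table[name] = child_names(m)

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_decl
    parser.Parse(doc, True)
    return table


def may_contain(table, parent, child):
    """Random access: a single dict lookup per question."""
    return child in table.get(parent, set())


def parents_of(table, name):
    """Reverse lookup: every element whose content model mentions name."""
    return sorted(p for p, kids in table.items() if name in kids)
```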
Flat Trees vs. Tree Trees
-------------------------
If trees are used, another question is what form the tree takes. XML-Data
currently defines a tree that uses XML's hierarchy as a way to group
information about individual elements. However, the relation between those
elements is actually flat. For example, the following DTD converts to the
following XML-Data structure:
DTD:
<!DOCTYPE a [
<!ELEMENT a (b)>
<!ELEMENT b (#PCDATA)>
]>
XML-Data:
<schema id="a">
  <elementType id="a">
    <element type="#b"/>
  </elementType>
  <elementType id="b">
    <string/>
  </elementType>
</schema>
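The flatness is easy to check by loading the XML-Data example with Python's `xml.etree.ElementTree` and recording how deep each `elementType` sits. `definition_depths` is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

XML_DATA = """<schema id="a">
  <elementType id="a">
    <element type="#b"/>
  </elementType>
  <elementType id="b">
    <string/>
  </elementType>
</schema>"""


def definition_depths(root):
    """Map each elementType id to its depth in the document tree."""
    depths = {}

    def visit(node, depth):
        if node.tag == "elementType":
            depths[node.get("id")] = depth
        for child in node:
            visit(child, depth + 1)

    visit(root, 0)
    return depths


schema = ET.fromstring(XML_DATA)
print(definition_depths(schema))  # {'a': 1, 'b': 1}: siblings, not nested
```

The only link from `a` to `b` is the string `"#b"` in the `type` attribute; the document tree itself says nothing about it.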
Notice that the definitions of a and b are at the same level. That is, when I
build a DOM tree from this XML, a and b are siblings, not parent and child.
When exploring a DTD, a parent-child relationship is far nicer: I move up and
down the DOM tree and get the metadata I need at each level. On the other
hand, such a tree complicates the DTD (sorry ;) for XSchema/XTD/etc., and I'm
not sure if representing children with multiple parents would even be
possible, given the strict nesting requirements of XML. Comments?
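One way to probe the multiple-parent question is to derive a nested view from the flat one by replacing each `<element type="#...">` reference with a copy of the definition it points to. In this sketch, `nest` is a hypothetical helper, and the schema adds an element `c` and a self-referencing `r` that are not in the example above. A child with two parents simply gets copied under each of them, and a recursive definition has to be cut off with a marker, since XML nesting cannot express a cycle.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat schema: b is used by both a and c; r contains itself.
FLAT = """<schema id="a">
  <elementType id="a"><element type="#b"/><element type="#c"/></elementType>
  <elementType id="b"><string/></elementType>
  <elementType id="c"><element type="#b"/></elementType>
  <elementType id="r"><element type="#r"/></elementType>
</schema>"""


def nest(schema, root_id):
    """Build a parent-child tree of elementTypes starting at root_id."""
    defs = {et.get("id"): et for et in schema.findall("elementType")}

    def build(type_id, ancestors):
        node = ET.Element("elementType", id=type_id)
        if type_id in ancestors:
            # A cycle cannot be nested; mark it instead of recursing forever.
            node.set("recursive", "yes")
            return node
        for ref in defs[type_id].findall("element"):
            child_id = ref.get("type").lstrip("#")
            # Each reference gets its own copy, so shared children repeat.
            node.append(build(child_id, ancestors | {type_id}))
        return node

    return build(root_id, frozenset())


schema = ET.fromstring(FLAT)
print(ET.tostring(nest(schema, "a"), encoding="unicode"))
```

In the nested view `b` appears twice, once under `a` and once under `c`: the navigation gets easier, but the definition is no longer shared.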
-- Ron Bourret