Yesterday evening I converted a typical chemical manuscript into CML,
including RDF and DC metadata, images, spectra, molecules, bibliography,
XML-LINKs to several related XML and non-XML documents, and so on. I found
NOT having a 'conventional' DTD very liberating. I
believe that (with the latest JUMBO) it displays quite attractively and
meaningfully to human readers.
So what is the formal value of the document to *non-human* readers? I can
see at least the following:
- TEI 'searches' of the document (especially with STRING) are very
powerful. [BTW, the fact that TEI defines substrings in PCDATA but not in
attribute values means that I now favour using subelements rather than
attributes. To that extent I think the XML-specs tilt the balance.] I
should like to 'extend' the TEI approach to search for more complex
fragments (early drafts suggested a FOREIGN keyword, which would allow any
algorithm to be tacked on). I'd like to keep in step with others here: is
there any consensus on a formalised search language for XML documents?
- many 'readers' will not need to access all the data in the document, and
can reasonably extract small fragments, e.g.
DESCENDANT(ALL,PERSON)CHILD(1,VAR,BUILTIN,EMAIL)
will locate all the people who have e-mail addresses.
- XML-STYLE looks like being extremely valuable for many document
transformations. [In the early days of JUMBO I wrote a lot of horrible code
to process and display specific elements, and I now realise this should be
done in XML-STYLE. Is anyone else hacking a Java version of XML-STYLE or do
I have to do it myself?].
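To make the extended-pointer example concrete, here is a minimal sketch in Python (not Java, and not JUMBO code) using the standard xml.etree.ElementTree module. It assumes the TEI reading of CHILD(n,type,attribute,value) as "the n-th child of that element type whose attribute has that value". The sample document, the PERSON/VAR/BUILTIN layout and the helper names (descendant_all, child, string_hits) are all hypothetical, chosen for illustration. The string_hits helper mimics a STRING-style search over PCDATA only, which shows why anything held in an attribute value is invisible to such a search.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the PERSON/VAR/BUILTIN layout is an assumption.
DOC = """<CML>
  <PERSON><VAR BUILTIN="NAME">P. Murray-Rust</VAR>
          <VAR BUILTIN="EMAIL">pmr@example.org</VAR></PERSON>
  <PERSON><VAR BUILTIN="NAME">A. N. Other</VAR></PERSON>
</CML>"""

def descendant_all(root, gi):
    """DESCENDANT(ALL,gi): every element of type gi below root, in document order."""
    return [el for el in root.iter(gi) if el is not root]

def child(el, n, gi, attr=None, value=None):
    """CHILD(n,gi,attr,value): the n-th (1-based) child of type gi whose
    attribute attr equals value, or None if there is no such child."""
    matches = [c for c in el
               if c.tag == gi and (attr is None or c.get(attr) == value)]
    return matches[n - 1] if len(matches) >= n else None

def string_hits(root, needle):
    """STRING-like search: elements whose own PCDATA contains needle.
    Attribute values are never examined."""
    return [el for el in root.iter() if el.text and needle in el.text]

root = ET.fromstring(DOC)

# Evaluate DESCENDANT(ALL,PERSON)CHILD(1,VAR,BUILTIN,EMAIL)
emails = []
for person in descendant_all(root, "PERSON"):
    hit = child(person, 1, "VAR", "BUILTIN", "EMAIL")
    if hit is not None:
        emails.append(hit.text)

print(emails)                       # ['pmr@example.org']
print(string_hits(root, "EMAIL"))   # [] : "EMAIL" only occurs in an attribute
```

The second PERSON simply yields no match, so the pointer behaves as a filter for people with e-mail addresses. A FOREIGN-style extension would amount to passing an arbitrary predicate in place of the fixed tag/attribute test inside child.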
The most common operations on a generic CML document look like being:
- display this attractively to a human
- search document(s) for particular chunks of information and <do
something useful with them>
I then see a role for more specific DTDs for those people who need their
documents to conform to specific formats (e.g. regulatory submissions,
safety sheets, pharmacopeias, etc.) Hopefully they will pick features out
of CML so that the semantics of elements is consistent throughout the
community. I am an idealist :-)
BTW I am particularly interested in actual implementations of things
discussed on this list, or in people who are interested in developing them
collaboratively. Although XML has come a long way, we are nowhere near
having enough examples of tools to convince the rest of the world :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg