Consider the following macaronic document (about as much as I can manage :-):
<!DOCTYPE bier [
<!ELEMENT bier (rot)>
<!ATTLIST bier xml:lang NMTOKEN #IMPLIED>
<!ELEMENT rot (#PCDATA)>
]>
<bier xml:lang="de"><rot>gut</rot></bier>
What are the attributes of <rot>? Correct: there aren't any, but the
"intent" of xml:lang="de" applies to <rot>. I regard this as a subtle form
of minimisation which may create considerable problems further downstream.
I think this will be particularly problematical for software which relies
on attributes to identify or process parts of a document.
It's reasonable to assume that generic text-aware XML software might be
asked "please find all elements in a document which are expressed in
German." It can't look for all elements with xml:lang="de" because they
don't have this explicitly. So general mechanisms such as XPointers can't
be used, and bespoke software must be written. This software (presumably)
finds all elements which *do* have the attribute and continues recursively.
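A minimal sketch of such bespoke software, in Python with the standard xml.etree.ElementTree module. The function name elements_in_language is hypothetical, and I assume an exact, case-insensitive match on the language code (so "de-CH" would not match "de"); a real tool would need a policy for subcodes.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def elements_in_language(root, wanted):
    """Return the elements whose effective xml:lang equals `wanted`.

    Hypothetical helper: the effective value is the element's own
    xml:lang if present, otherwise the nearest ancestor's.
    """
    found = []

    def visit(elem, inherited):
        # An explicit xml:lang overrides whatever the ancestors declared.
        lang = elem.get(XML_LANG, inherited)
        if lang is not None and lang.lower() == wanted.lower():
            found.append(elem)
        for child in elem:
            visit(child, lang)

    visit(root, None)
    return found


doc = ET.fromstring('<bier xml:lang="de"><rot>gut</rot></bier>')
german = [elem.tag for elem in elements_in_language(doc, "de")]
print(german)  # ['bier', 'rot']
```

Note that <rot> is reported even though it carries no attribute of its own, which is exactly what a search for xml:lang="de" on the element itself would miss.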
At this point an application developer has to ask: "[how] should I include
support for xml:lang in my *application*?". The spec makes it clear that
the *parser* does not add an attribute xml:lang, i.e. the above document is
not equivalent to:
<!DOCTYPE bier [
<!ELEMENT bier (rot)>
<!ATTLIST bier xml:lang NMTOKEN #IMPLIED>
<!ELEMENT rot (#PCDATA)>
]>
<bier xml:lang="de"><rot xml:lang="de">gut</rot></bier>
(which is well-formed but *invalid*, because xml:lang is not declared for
<rot>), or to:
<!DOCTYPE bier [
<!ELEMENT bier (rot)>
<!ATTLIST bier xml:lang NMTOKEN #IMPLIED>
<!ELEMENT rot (#PCDATA)>
<!ATTLIST rot xml:lang NMTOKEN #IMPLIED>
]>
<bier xml:lang="de"><rot xml:lang="de">gut</rot></bier>
which explicitly shows the author's intent.
The three documents will presumably *behave* similarly (I hope identically).
The same concern applies to xml:space.
XLink is similar but uses a different phrasing (4.3): "If any such
[semantic] attribute [such as role] is omitted from a locator element, the
value provided on the containing linking element is to be used". Example:
<reaction xml:link="extended" role="reactant">
<molecule xml:link="locator" href="mol1.xml"/>
<molecule xml:link="locator" href="mol2.xml"/>
</reaction>
The locator elements *behave* *as if* they were represented as:
<molecule xml:link="locator" href="mol1.xml" role="reactant"/>
<molecule xml:link="locator" href="mol2.xml" role="reactant"/>
Note: this would be well-formed but *invalid* unless <molecule> was
declared with
<!ATTLIST molecule role CDATA #IMPLIED>
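To make the "as if" concrete, here is a rough Python sketch that computes the attributes each locator behaves as if it had. The name effective_locators and the INHERITED tuple are my own assumptions, not anything defined by XLink; I list only role because it is the one attribute named above, and I assume locators are direct children of the linking element, as in the example.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LINK = "{http://www.w3.org/XML/1998/namespace}link"

# Assumption: the attributes a locator may inherit. Only "role" is
# named in the text; extend this to match the draft you implement.
INHERITED = ("role",)


def effective_locators(root):
    """Return (tag, attributes) for each locator, with omitted
    attributes filled in from the containing linking element."""
    result = []
    for linking in root.iter():
        if linking.get(XML_LINK) != "extended":
            continue
        for child in linking:
            if child.get(XML_LINK) != "locator":
                continue
            attrs = dict(child.attrib)  # copy; the tree is not modified
            for name in INHERITED:
                if name not in attrs and name in linking.attrib:
                    attrs[name] = linking.attrib[name]
            result.append((child.tag, attrs))
    return result


doc = ET.fromstring(
    '<reaction xml:link="extended" role="reactant">'
    '<molecule xml:link="locator" href="mol1.xml"/>'
    '<molecule xml:link="locator" href="mol2.xml"/>'
    '</reaction>'
)
roles = [attrs["role"] for _, attrs in effective_locators(doc)]
print(roles)  # ['reactant', 'reactant']
```

The sketch returns copies rather than writing role onto the <molecule> elements, so the tree itself stays as the author wrote it and cannot become invalid against a DTD that lacks the declaration.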
----------------------------------
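The problem is that *all* applications dealing with XML and XLink now have
to consider three separate non-trivial questions: how to apply xml:lang,
xml:space, and the XLink semantic attributes to elements which do not carry
them explicitly. They can ignore the first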
two and assume that humans can deal with the resulting fuzziness (though
the automation of the language aspect is important for machine-based
translation and terminology). If XLink is to work, some mechanism *must* be
provided and I am struggling with this at present :-)
Since the three mechanisms seem to be supportable by a single software
module, it seems to make sense to discuss whether this is possible. If so,
perhaps we can devise a communal mechanism/interface. Otherwise we shall
have semantic fragmentation, where some tools support parts of some of these,
others support other parts, and some behave inconsistently w.r.t. others.
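As a starting point for discussion only, here is one shape such a module might take in Python. Everything here (the name resolve, the rules table, the idea of an inheritance depth) is my own assumption and not taken from any spec. The sketch also simplifies XLink: it lets every child inherit role, without checking that the parent is a linking element and the child a locator.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def resolve(root, rules):
    """Return (tag, effective attributes) for every element, in
    document order.

    `rules` maps an attribute name to how many generations it is
    inherited across: None for all descendants, 1 for children only.
    """
    out = []

    def visit(elem, inherited):
        attrs = dict(elem.attrib)  # copy; the tree is not modified
        passed_down = {}
        for name, depth in rules.items():
            if name in elem.attrib:
                # Explicit value: restart the inheritance from here.
                value, remaining = elem.attrib[name], depth
            elif name in inherited:
                value, remaining = inherited[name]
                attrs[name] = value
            else:
                continue
            if remaining is None:
                passed_down[name] = (value, None)
            elif remaining > 0:
                passed_down[name] = (value, remaining - 1)
        out.append((elem.tag, attrs))
        for child in elem:
            visit(child, passed_down)

    visit(root, {})
    return out


# xml:lang and xml:space reach all descendants; role reaches children only.
RULES = {XML_NS + "lang": None, XML_NS + "space": None, "role": 1}

bier = resolve(ET.fromstring('<bier xml:lang="de"><rot>gut</rot></bier>'), RULES)
reaction = resolve(
    ET.fromstring(
        '<reaction xml:link="extended" role="reactant">'
        '<molecule xml:link="locator" href="mol1.xml"/>'
        '</reaction>'
    ),
    RULES,
)
print(bier[1][1][XML_NS + "lang"], reaction[1][1]["role"])  # de reactant
```

The point of the rules table is that an application (or a later specification) could register another inherited attribute without new tree-walking code.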
I have voiced these concerns before - are there others who see this as
needing addressing at an implementation level? If so, we should tackle this
quickly before the fuzziness increases [and certainly before other drafters
of new specifications adopt this approach, thus increasing the load on the
implementers.]
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg