Attributes with Intent

Peter Murray-Rust (peter@ursus.demon.co.uk)
Mon, 04 May 1998 09:05:31

Previous message: Peter Murray-Rust: "Re: comp.text.xml"

The XML Recommendation creates a category of attributes which have intent.
(2.10 xml:space, 2.12 xml:lang). I believe that this raises implementation
problems for which we may wish to devise a common protocol. The same
problem arises in implementing XLink which, although not final, appears
likely to contain the same (or closely related) constructs.

Consider the following macaronic document (about as much as I can manage :-):

<!DOCTYPE bier [
<!ELEMENT bier (rot)>
<!ATTLIST bier xml:lang NMTOKEN #IMPLIED>
<!ELEMENT rot (#PCDATA)>
]>
<bier xml:lang="de"><rot>gut</rot></bier>

What are the attributes of <rot>? Correct: there aren't any, but the
"intent" of xml:lang="de" applies to <rot>. I regard this as a subtle form
of minimisation which may create considerable problems further downstream.
I think this will be particularly problematical for software which relies
on attributes to identify or process parts of a document.

It's reasonable to assume that generic text-aware XML software might be
asked "please find all elements in a document which are expressed in
German." It can't just look for all elements with xml:lang="de", because
many of them don't carry the attribute explicitly. So general mechanisms
such as XPointers can't be used, and bespoke software must be written. This
software (presumably) finds all elements which *do* have the attribute and
then works recursively through their descendants, stopping wherever a
descendant specifies a different xml:lang.

At this point an application developer has to ask: "[how] should I include
support for xml:lang in my *application*?". The spec makes it clear that
the *parser* does not add an attribute xml:lang, i.e. the above document is
not equivalent to:

<!DOCTYPE bier [
<!ELEMENT bier (rot)>
<!ATTLIST bier xml:lang NMTOKEN #IMPLIED>
<!ELEMENT rot (#PCDATA)>
]>
<bier xml:lang="de"><rot xml:lang="de">gut</rot></bier>

(which is well-formed but *invalid*, since no xml:lang attribute is
declared for <rot>), nor to:

<!DOCTYPE bier [
<!ELEMENT bier (rot)>
<!ATTLIST bier xml:lang NMTOKEN #IMPLIED>
<!ELEMENT rot (#PCDATA)>
<!ATTLIST rot xml:lang NMTOKEN #IMPLIED>
]>
<bier xml:lang="de"><rot xml:lang="de">gut</rot></bier>

which explicitly shows the author's intent.
The three documents will presumably *behave* similarly (I hope identically).
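For illustration, here is a minimal sketch of the bespoke "find all German elements" search, written in Python with the standard-library xml.etree.ElementTree. The language, library and function names (effective_langs, elements_in_language) are my own assumptions, not anything the spec prescribes. The application carries the inherited value down the tree itself, since the parser adds nothing. ElementTree does not validate, so the instances of all three documents above give the same answer.

```python
import xml.etree.ElementTree as ET

# With namespace processing, ElementTree stores xml:lang under the URI
# reserved for the "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_langs(el, inherited=None):
    """Yield (element, language) pairs in document order. An element's own
    xml:lang wins; otherwise it inherits the value in scope on its parent."""
    lang = el.get(XML_LANG, inherited)
    yield el, lang
    for child in el:
        yield from effective_langs(child, lang)

def elements_in_language(root, code):
    """Tags of all elements whose language is `code`, whether the attribute
    is explicit or inherited. Assumption: exact string match only (no case
    folding, and no subtag matching such as "de" against "de-CH")."""
    return [el.tag for el, lang in effective_langs(root) if lang == code]

implicit = '<bier xml:lang="de"><rot>gut</rot></bier>'
explicit = '<bier xml:lang="de"><rot xml:lang="de">gut</rot></bier>'

for doc in (implicit, explicit):
    print(elements_in_language(ET.fromstring(doc), "de"))
```

Both documents print ['bier', 'rot']. An inner xml:lang="en" on <rot> would stop the inheritance there, so only 'bier' would be reported.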

The same concern applies to xml:space.

XLink is similar but uses a different phrasing (4.3): "If any such
[semantic] attribute [such as role] is omitted from a locator element, the
value providing on the containing linking element is to be used". Example:
<reaction xml:link="extended" role="reactant">
<molecule xml:link="locator" href="mol1.xml"/>
<molecule xml:link="locator" href="mol2.xml"/>
</reaction>

The locator elements *behave* *as if* they were represented as:
<molecule xml:link="locator" href="mol1.xml" role="reactant"/>
<molecule xml:link="locator" href="mol2.xml" role="reactant"/>

Note: this would be well-formed but *invalid* unless <molecule> was
declared with
<!ATTLIST molecule role CDATA #IMPLIED>
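The "as if" behaviour can be computed by the application without rewriting the document. The following is a minimal Python sketch using xml.etree.ElementTree. The helper names and the SEMANTIC_ATTRS tuple are hypothetical, and treating only role as a semantic attribute is an assumption taken from the example above.

```python
import xml.etree.ElementTree as ET

# With namespace processing, the reserved "xml:" prefix maps to this URI,
# so xml:link="..." is stored under XML_LINK.
XML_LINK = "{http://www.w3.org/XML/1998/namespace}link"

# Assumption: only "role" is treated as a semantic attribute here, because
# it is the one named in the example; extend the tuple as the draft requires.
SEMANTIC_ATTRS = ("role",)

def locator_view(linking, locator, names=SEMANTIC_ATTRS):
    """Return the attributes the locator behaves *as if* it had: its own,
    plus any semantic attribute it omits, copied from the linking element.
    The parsed tree itself is left untouched."""
    attrs = dict(locator.attrib)
    for name in names:
        if name not in attrs and name in linking.attrib:
            attrs[name] = linking.get(name)
    return attrs

def effective_locators(root):
    """Yield (locator element, effective attributes) for every locator that
    is a direct child of an extended linking element."""
    for linking in root.iter():
        if linking.get(XML_LINK) == "extended":
            for child in linking:
                if child.get(XML_LINK) == "locator":
                    yield child, locator_view(linking, child)

doc = """<reaction xml:link="extended" role="reactant">
<molecule xml:link="locator" href="mol1.xml"/>
<molecule xml:link="locator" href="mol2.xml"/>
</reaction>"""

for el, attrs in effective_locators(ET.fromstring(doc)):
    print(attrs["href"], attrs.get("role"))
```

This prints "mol1.xml reactant" and "mol2.xml reactant", while each <molecule> element still has no role attribute of its own. Returning a separate dictionary keeps the parsed tree exactly as the author wrote it, which mirrors the way the parser itself adds nothing.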

----------------------------------

The problem is that *all* applications dealing with XML and XLink now have
to consider three separate non-trivial questions: xml:lang, xml:space and
XLink's inherited semantic attributes. They can ignore the first two and
assume that humans can deal with the resulting fuzziness (though the
automation of the language aspect is important for machine-based
translation and terminology). If XLink is to work, however, some mechanism
*must* be provided, and I am struggling with this at present :-)

Since the three mechanisms seem to be supportable by a single software
module, it seems to make sense to discuss whether this is possible. If so,
perhaps we can devise a communal mechanism/interface. Otherwise we shall
have semantic fragmentation, where some tools support parts of some of
these, others support other parts, and some behave inconsistently w.r.t.
others.
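As one possible shape for such a communal interface, here is a hypothetical Python sketch built on xml.etree.ElementTree. The class name IntentResolver, its method and its two scope rules are my assumptions, not an agreed design. xml:lang and xml:space are looked up through the ancestors, while role is inherited only by a locator from its immediately containing extended linking element, following the reading of XLink 4.3 quoted earlier.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

class IntentResolver:
    """Hypothetical shared module: answers "which value of this attribute
    applies to this element?" for attributes whose intent reaches beyond the
    element carrying them. It never modifies the parsed tree."""

    # Inherited from the nearest ancestor that specifies them.
    ANCESTOR_SCOPED = {XML_NS + "lang", XML_NS + "space"}
    # Assumption: XLink semantic attributes, inherited only by a locator
    # from its immediately containing extended linking element.
    LOCATOR_SCOPED = {"role"}

    def __init__(self, root):
        # ElementTree elements have no parent pointer, so build a map once.
        self.parent = {child: el for el in root.iter() for child in el}

    def value(self, el, name):
        if name in el.attrib:                 # explicit value always wins
            return el.get(name)
        if name in self.ANCESTOR_SCOPED:
            anc = self.parent.get(el)
            while anc is not None:
                if name in anc.attrib:
                    return anc.get(name)
                anc = self.parent.get(anc)
            return None
        if name in self.LOCATOR_SCOPED:
            link = self.parent.get(el)
            if (el.get(XML_NS + "link") == "locator" and link is not None
                    and link.get(XML_NS + "link") == "extended"):
                return link.get(name)
        return None

doc = """<doc xml:lang="de" xml:space="preserve">
<reaction xml:link="extended" role="reactant">
<molecule xml:link="locator" href="mol1.xml"/>
</reaction>
</doc>"""

root = ET.fromstring(doc)
resolver = IntentResolver(root)
mol = root.find(".//molecule")
print(resolver.value(mol, XML_NS + "lang"),
      resolver.value(mol, XML_NS + "space"),
      resolver.value(mol, "role"))
```

This prints "de preserve reactant". Keeping the scope rules in one table is the point of the sketch: tools sharing the module would all give the same answers for the three mechanisms.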

I have voiced these concerns before - are there others who see this as
needing addressing at an implementation level? If so, we should tackle this
quickly before the fuzziness increases [and certainly before other drafters
of new specifications adopt this approach, thus increasing the load on the
implementers.]

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
