Re: Why XML data typing is hard (was Re: Internal subset equivalent in new schema proposals?)

David Megginson (david@megginson.com)
Fri, 27 Nov 1998 08:47:59 -0500 (EST)


Ketil Z Malde writes:

> Catching illegal values early on - in validation of the document -
> instead of relying on some obscure run-time error in some program,
> is a *feature*.

Agreed -- this is a very good choice, especially if you have human
authors.

The real question, though, is how constraints could be enforced.
Let's start with an extremely simple example:

<value xml:type="float"></value>

What are the allowed contents? Certainly, +, -, and the digits 0-9
should be allowed, as well as the letter 'e', but which of the
following should throw an error?

<value xml:type="float">1,5</value>
<value xml:type="float">1.5</value>

There are three obvious answers:

1. Both are accepted.
2. Only one is accepted, and everyone learns to use that format.
3. Only the correct one for the current locale is accepted.

Option #2 is politically unworkable (either France or the U.S. would
take up arms), and option #1 seriously weakens validation (what if an
English author had mistakenly intended to use the comma to specify a
range?). Option #3 looks OK on the surface, but it is actually the
worst of the three because it destroys interoperability: the same XML
document may be considered correct by some parsers and erroneous by
others, depending on what locale the user happened to choose.
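
To make the trade-off concrete, here is a minimal sketch of the three
options in Python. Everything in it is an assumption made for
illustration: the function names, the policy numbers (which simply
mirror the list above), and the lexical form itself (optional sign,
digits, one decimal separator, optional exponent). None of it is taken
from any schema proposal.

```python
import re

def _float_pattern(separators):
    # Hypothetical lexical form: sign, digits, one separator, exponent.
    sep = "[" + re.escape(separators) + "]"
    return re.compile(r"[+-]?\d+(?:" + sep + r"\d+)?(?:[eE][+-]?\d+)?")

def is_valid_float(text, policy, locale_separator="."):
    """Return True if text is an acceptable float under the given policy.

    policy 1: accept either '.' or ',' as the decimal separator
    policy 2: accept only '.' everywhere
    policy 3: accept only the separator of the validating user's locale
    """
    if policy == 1:
        pattern = _float_pattern(".,")
    elif policy == 2:
        pattern = _float_pattern(".")
    elif policy == 3:
        pattern = _float_pattern(locale_separator)
    else:
        raise ValueError("policy must be 1, 2, or 3")
    return pattern.fullmatch(text) is not None
```

Under policy 3, is_valid_float("1,5", 3, ",") is true while
is_valid_float("1,5", 3, ".") is false: the same content passes or
fails depending only on the separator the validating side assumes.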

This is a very simple example; after you've worked this out, you can
start worrying about how to count combining characters with
field-length restrictions, etc.
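
The combining-character problem can be sketched the same way. The
example below uses Python's standard unicodedata module; the two
counting functions are hypothetical helpers, and skipping characters
with a non-zero combining class is only a rough approximation of what
a reader would call one character.

```python
import unicodedata

precomposed = "\u00e9"   # e with acute accent as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

def count_code_points(text):
    # Counts every code point, including combining marks.
    return len(text)

def count_without_combining(text):
    # Skips code points whose canonical combining class is non-zero.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)
```

Both strings display as the same letter, yet count_code_points gives 1
for the first and 2 for the second, so a field limited to one
character accepts or rejects the value depending on how it was typed
or normalized.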

All the best,

David

-- 
David Megginson                 david@megginson.com
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)