identify them in namespaces. If they want to use FPIs, fine. But they
should make it clear which the FPIs *are* and use them consistently. If
they want to use URLs instead - fine. But they shouldn't encourage the
use of both simultaneously."
Ok, so I'm searching the 300,000 acronyms in my head and FPI is not
popping up :-) Did I miss class that day? What does FPI stand for?
On the issue of URNs, I, like you I guess, had in my mind as I read the
namespace specs that there would eventually be some sort of NSNS
(Namespace Naming Server) scheme out there. It seems like the obvious
way to go over the long haul, though I can see that it would be a big,
big step and involve a lot of overhead.
Would not another possibility be to take another cue from the object
oriented world of components? COM objects and other components have a
globally unique identifier that indicates a particular version of a
particular interface. Could you not come up with a scheme where each
creator of a DTD or Schema generates a unique id (using a well published
algorithm for which public domain tools are easily available) and
publishes some canonical name of the DTD, the versions that exist, the
namespace names each one defines and any gotchas, and the unique id of
each DTD version?
Then the parser could see, even if a DTD came in from different sources
via different URIs, that in effect they were the exact same version, and
that subsequent instances of the DTD could just be ignored and the
current content used. It would involve some statement in the DTD that
must come first and which identifies the canonical name and the unique
id that represents the particular version. The same could be applied to
Schemas and XML document instances in general as well, I assume.
It could also allow a parser to recognize that two or more versions of
the same DTD/Schema were being used simultaneously and warn about it.
Really smart systems could use this to automatically build a map of
synonyms I guess, though that's kind of a scary thought in some ways. It
could let particular applications ensure that they were getting only
particular versions of particular DTDs, etc...
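
To make the parser-side idea concrete, here is a rough sketch of the
bookkeeping a parser might do. Everything in it is an assumption for
illustration: the DtdRegistry class, the register() method, and its
three result strings are made-up names, and it presumes the parser has
already read the (hypothetical) leading statement that carries the
canonical name and unique id.

```python
class DtdRegistry:
    """Tracks DTD versions seen during one parse (hypothetical sketch)."""

    def __init__(self):
        self._seen_ids = {}     # unique id -> URI it was first loaded from
        self._ids_by_name = {}  # canonical name -> set of unique ids seen

    def register(self, canonical_name, unique_id, uri):
        """Record a DTD and report how it relates to those already seen.

        Returns "duplicate" if this exact version was already loaded (the
        new copy can be ignored), "version-conflict" if a different
        version of the same canonical name was already loaded (worth a
        warning), and "new" otherwise.
        """
        if unique_id in self._seen_ids:
            return "duplicate"
        self._seen_ids[unique_id] = uri
        ids = self._ids_by_name.setdefault(canonical_name, set())
        ids.add(unique_id)
        return "version-conflict" if len(ids) > 1 else "new"
```

A parser would call register() once per DTD it encounters, skip the
content on "duplicate", and issue a warning on "version-conflict".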
Unfortunately it's probably a bit late to be suggesting something like
this, since it will require some new verbiage in the file format. But,
if there is no real likelihood of getting some registration mechanism
out there (and/or such a mechanism would be too much of a burden), then
some such unique identification mechanism could be a decent second
choice, don't you think?
Something like MD5 generates 128 bit hashes. It's well known, free,
etc... All you need is a simple algorithm to feed it a semi-consistent
input buffer made up of the canonical name, current time in milliseconds
on your system, version string of the particular version, author name,
etc... and the likelihood of it generating a clash with 2^128
possibilities (about 3.4 x 10^38) is extraordinarily low. The likelihood
of two files with a clash being used by the same document is probably
not even worth thinking about.
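
As a minimal sketch of that id generation, assuming Python's standard
hashlib module: the function name make_dtd_id, its parameters, and the
choice of field separator are my own inventions for illustration, not
part of any spec.

```python
import hashlib
import time


def make_dtd_id(canonical_name, version, author, timestamp_ms=None):
    """Return a 128-bit id (32 hex digits) for one version of a DTD."""
    if timestamp_ms is None:
        # Current time in milliseconds on the local system.
        timestamp_ms = int(time.time() * 1000)
    # Join the fields with a separator that should not occur inside them,
    # so ("ab", "c") and ("a", "bc") do not produce the same input buffer.
    buffer = "\x1f".join([canonical_name, version, author, str(timestamp_ms)])
    return hashlib.md5(buffer.encode("utf-8")).hexdigest()
```

The id would be generated once, when a DTD version is published, and
then written into the DTD itself. Nobody ever needs to recompute it, so
the timestamp does not have to be recorded anywhere.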
Of course you could argue that you could just do a DOMHash on both files
and consider them the same if they hash the same, but that seems to be
unreasonable since it would leave out any non-DOM parsers from the fun.
The scheme above would let everyone play and push the overhead to
'compile' time instead of runtime.
So is this a totally dumb idea, or does it make some sense? It was
basically an off-the-cuff, unencumbered-by-the-thought-process
suggestion, but it makes some sense on the face of it.
----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
roddey@us.ibm.com