RE: Namespaces and URNs

Dean Roddey (roddey@us.ibm.com)
Thu, 6 Aug 1998 14:43:49 -0400

Previous message: David G. Durand: "Re: Namespace Comments (and dtd encoding)"

"So if we are serious about making namespaces work we need to start using
the same strings to refer to the same namespace. I agree that that means
that DTD owners/maintainers need to be involved. Since most of the common
DTDs are within the remit of the W3C, they should be thinking of how to
identify them in namespaces. If they want to use FPIs, fine. But they
should make it clear which the FPIs *are* and use them consistently. If
they want to use URLs instead - fine. But they shouldn't encourage the use
of both simultaneously."

Ok, so I'm searching the 300,000 acronyms in my head and FPI is not popping
up :-) Did I miss class that day? What does FPI stand for?

On the issue of URNs, I, like you I guess, had in my mind as I read the
namespace specs that there would eventually be some sort of NSNS (Namespace
Naming Server) scheme out there. It seems like the obvious way to go over
the long haul, though I can see that it would be a big, big step and involve
a lot of overhead.

Would not another possibility be to take another cue from the
object-oriented world of components? COM objects and other components have a
globally unique identifier that indicates a particular version of a
particular interface. Could you not come up with a scheme where each creator
of a DTD or Schema generates a unique id (using a well-published algorithm
for which public domain tools are easily available) and publishes some
canonical name of the DTD, the versions that exist, the namespace names each
one defines and any gotchas, and the unique id of each DTD version?

Then the parser could see, even if a DTD came in from different sources via
different URIs, that in effect they were the exact same version, and that
subsequent instances of the DTD could just be ignored and the current
content used. It would involve some statement in the DTD that must come
first and which identifies the canonical name and the unique id that
represents the particular version. The same could be applied to Schemas and
XML document instances in general as well, I assume.
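
To make that concrete, here is a rough Python sketch of the parser-side
bookkeeping. It assumes the parser has already read the unique id from the
leading statement of the DTD; the class name DtdCache, its load method and
the fetch callback are all made up for illustration and come from no real
parser.

```python
class DtdCache:
    """Hypothetical parser-side cache keyed by a DTD's declared unique id."""

    def __init__(self):
        self._by_id = {}

    def load(self, unique_id, uri, fetch):
        """Return the content for unique_id, fetching it only the first time.

        fetch is a caller-supplied function that takes a URI and returns
        the DTD content (an assumption of this sketch).
        """
        if unique_id in self._by_id:
            # Same version already seen, possibly via a different URI:
            # ignore this instance and reuse the current content.
            return self._by_id[unique_id]
        content = fetch(uri)
        self._by_id[unique_id] = content
        return content


# Usage: the same version arrives through two different (made-up) URIs.
fetched = []

def fake_fetch(uri):
    fetched.append(uri)
    return "<!-- DTD content from " + uri + " -->"

cache = DtdCache()
first = cache.load("id-aaaa", "http://a.example/x.dtd", fake_fetch)
second = cache.load("id-aaaa", "http://b.example/mirror/x.dtd", fake_fetch)
```

Only the first URI is actually fetched; the second request is answered from
the content already in hand.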

It could also allow a parser to recognize that two or more versions of the
same DTD/Schema were being used simultaneously and warn about it. Really
smart systems could use this to automatically build a map of synonyms I
guess, though that's kind of a scary thought in some ways. It could let
particular applications ensure that they were getting only particular
versions of particular DTDs, etc...
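
A minimal Python sketch of the version warning, assuming the parser collects
a (canonical name, unique id) pair from each DTD it loads. The function name
and the sample names and ids are invented for illustration.

```python
from collections import defaultdict


def find_version_conflicts(declarations):
    """Map each canonical name seen with more than one unique id to its ids.

    declarations is an iterable of (canonical_name, unique_id) pairs, one
    per DTD the parser loaded (a hypothetical input shape).
    """
    ids_by_name = defaultdict(set)
    for name, unique_id in declarations:
        ids_by_name[name].add(unique_id)
    return {
        name: sorted(ids)
        for name, ids in ids_by_name.items()
        if len(ids) > 1
    }


seen = [
    ("Example-DTD", "id-aaaa"),
    ("Other-DTD", "id-cccc"),
    ("Example-DTD", "id-bbbb"),
    ("Example-DTD", "id-aaaa"),  # same version again, not a conflict
]
conflicts = find_version_conflicts(seen)
for name, ids in conflicts.items():
    print(f"warning: {len(ids)} versions of {name} in use: {', '.join(ids)}")
```

The same table could back the "only particular versions" check: an
application would compare each incoming id against the ids it is willing to
accept.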

Unfortunately it's probably a bit late to be suggesting something like this,
since it will require some new verbiage in the file format. But, if there is
no real likelihood of getting some registration mechanism out there (and/or
such a mechanism would be too much of a burden), then some such unique
identification mechanism could be a decent second choice, don't you think?

Something like the MD5 hash generates 128 bit hashes. It's well known, free,
etc... All you need is a simple algorithm to feed it a semi-consistent input
buffer made up of the canonical name, current time in milliseconds on your
system, version string of the particular version, author name, etc... and
the likelihood of it generating a clash with 2^128 possibilities (about
3.4 x 10^38) is extraordinarily low. The likelihood of two files with a
clash being used by the same document is probably not even worth thinking
about.
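
As a rough sketch of the id generation described above, in Python with the
standard hashlib module. The function name make_dtd_id, the field order, the
newline separator and the sample values are my own assumptions, not part of
any spec.

```python
import hashlib
import time


def make_dtd_id(canonical_name, version, author, timestamp_ms=None):
    """Return a 128-bit id (32 hex digits) for one version of a DTD.

    Hypothetical helper: the field order and the newline separator are
    assumptions made for illustration only.
    """
    if timestamp_ms is None:
        # Current time in milliseconds on the publisher's system.
        timestamp_ms = int(time.time() * 1000)
    buffer = "\n".join([canonical_name, str(timestamp_ms), version, author])
    return hashlib.md5(buffer.encode("utf-8")).hexdigest()


# Example with a fixed timestamp so the result is repeatable.
dtd_id = make_dtd_id("Example-DTD", "1.0", "Some Author",
                     timestamp_ms=902429029000)
print(dtd_id)
```

The id is computed once by whoever publishes the DTD version and then written
into the DTD; a parser only compares ids and never recomputes them.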

Of course you could argue that you could just do a DOMHash on both files and
consider them the same if they hash the same, but that seems to be
unreasonable since it would leave out any non-DOM parsers from the fun. The
scheme above would let everyone play and push the overhead to 'compile' time
instead of runtime.

So is this a totally dumb idea, or does it make some sense? It was basically
an off-the-cuff, unencumbered-by-the-thought-process suggestion, but it makes
some sense on the face of it.

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
roddey@us.ibm.com