Re: SAX: New Idea for Entity Resolution

Alex Milowski (lex@www.copsol.com)
Wed, 15 Apr 1998 23:42:38 -0500 (CDT)


> Alex Milowski writes:
>
> > In effect, although the above interface is useful, it reduces
> > interchange in that I can make a document with broken system
> > identifiers work on my system. Essentially, I can make an
> > *invalid* document valid!
>
> You can do this in any case, though -- you can intercept URIs in the
> system libraries (Java, for example, lets you register your own
> schemes), or you can redirect them with a proxy server.
>
> With URLs, file:// will almost always break on exchange, as will http:
> system identifiers that refer to hostnames visible only within a
> private network.

Yes, but then if you do this, don't expect it to work elsewhere. ;-)

Why would you use absolute URLs? Bad author, bad! Ok, maybe you would
use them for a standard DTD. ;-) (This is where I beat the URN drum)

<SGMLRANT type='mild'>
In the SGML world, I could come up with a scheme that made location
orthogonal to my documents. I *never* put a system identifier in my
documents. In XML, this is much harder.
</SGMLRANT>

<URNRANT>
Now, if URN support were *standard*, I could at least put a URN in the
place of every system identifier I needed and then my document would be
quite portable. The key phrase here is *standard*.
</URNRANT>

Of course, we could also fix public identifiers and forget about the
URN stuff. ...but, then we would have to come up with
yet-another-resolution-mechanism... which sounds too much like URNs.

> Your other points (which I omitted above) are well taken -- public
> identifiers are a bit of a muddle right now, but since they're in XML
> 1.0, it makes sense to support them. The interface is not only for
> public identifiers, however -- users can also remap URIs to
> local/secure equivalents, and they can even screen out certain URIs if
> necessary. I'd better copyright "XML-Nanny" before someone else
> thinks of it.

Well, a further point I was making off-line is that this kind
of mapping could lead people down the wrong road. I have run into
so many SGML users over the years who didn't know how to use, or *couldn't*
use, public identifiers without system identifiers. In an SGML world, I see
this as bad practice. Likewise, I see mapping system identifiers in XML as
bad practice.
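
To be concrete about the kind of mapping I mean, here is a minimal
sketch. It uses Python's xml.sax classes purely for illustration, not
the Java interface under discussion, and the class name, the exception,
and every URL in it are hypothetical. It remaps a system identifier to
a local equivalent and screens out unwanted schemes:

```python
from urllib.parse import urlsplit
from xml.sax.handler import EntityResolver


class BlockedEntityError(Exception):
    """Raised when a system identifier uses a scheme we refuse to fetch."""


class RemappingResolver(EntityResolver):
    """Hypothetical resolver: remap system identifiers, then screen them."""

    def __init__(self, remap, blocked_schemes=()):
        self.remap = dict(remap)
        self.blocked = {s.lower() for s in blocked_schemes}

    def resolveEntity(self, publicId, systemId):
        # Swap the identifier for its local equivalent if we know one.
        target = self.remap.get(systemId, systemId)
        # Refuse anything whose scheme is on the blocked list.
        scheme = urlsplit(target).scheme.lower()
        if scheme in self.blocked:
            raise BlockedEntityError(target)
        return target


# Assumed example data: one broken identifier that only this site can fix.
resolver = RemappingResolver(
    remap={"file:///home/alex/dtds/memo.dtd":
           "http://www.example.com/dtds/memo.dtd"},
    blocked_schemes=["file"],
)
```

The document that names file:///home/alex/dtds/memo.dtd now parses
here, and nowhere else, which is exactly the interchange problem.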

Two general rules I can recommend:

1. Use an internal resolution system inside your production
systems. Locations will change even inside your own system.

2. Use a fairly static naming system (URN/Public identifier) when
you exchange documents.
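
As a rough sketch of how the two rules fit together (nothing SAX
itself defines): keep a table from stable names to wherever the files
live today, and consult it when the parser asks for an entity. This
again uses Python's xml.sax classes for illustration; the catalog
entries, the urn:x-example namespace, and the CatalogResolver name are
all made up.

```python
from xml.sax.handler import EntityResolver
from xml.sax.xmlreader import InputSource

# Hypothetical catalog: stable names on the left, current locations on
# the right. Only this table changes when files move.
CATALOG = {
    "-//Example//DTD Memo 1.0//EN": "file:///usr/local/dtds/memo.dtd",
    "urn:x-example:dtd:memo:1.0": "file:///usr/local/dtds/memo.dtd",
}


class CatalogResolver(EntityResolver):
    """Resolve public identifiers or URNs through a local lookup table."""

    def __init__(self, catalog):
        self.catalog = dict(catalog)

    def resolveEntity(self, publicId, systemId):
        # Try the public identifier first, then a URN used as the
        # system identifier.
        for name in (publicId, systemId):
            location = self.catalog.get(name)
            if location is not None:
                source = InputSource(location)
                source.setPublicId(publicId)
                return source
        # Unknown name: hand the system identifier back unchanged.
        return systemId


resolver = CatalogResolver(CATALOG)
```

The exchanged document carries only the stable name; the location
stays a private detail of each site's table.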

One thing XML has over SGML is that it is tied more closely to a location
mechanism. If you add in URN ability, there is no issue of "configuring"
your local system to know about mappings--you just do a URN lookup.

(Obviously, URNs can be misconfigured or not available. Ever had
problems on the Internet with DNS names? Same idea, same problem, same
frustration when it is wrong!)

==============================================================================
R. Alexander Milowski http://www.copsol.com/ alex@copsol.com
Copernican Solutions Incorporated (612) 379 - 3608