RE: XML Wishes (</>, quotes and entity resolution)

Jarle Stabell (jarle.stabell@dokpro.uio.no)
Wed, 1 Oct 1997 22:23:37 +0200


Jarle Stabell wrote:
> 1. Please incorporate the </> tag. It would take a parser writer 5
> minutes to implement, and it would save bandwidth, disk space and
> typing, and in some cases make documents easier to read. (It could
> also be used to write hard-to-understand/hard-to-maintain documents,
> but that's up to the user.)

Murray Altheim writes:
As a document type designer, a parser writer, and a document author,
I think one of the main advantages of XML is the requirement of
explicitly named end tags.

[JS] Agree.

[MA] The save-typing argument is moot in that most people will
probably not hand-edit tags.

[JS] Maybe. But I know I will. Therefore I would like it. :-)

I really think

<LASTNAME>Doe</>
<FIRSTNAME>John</>

are faster/easier to read than

<LASTNAME>Doe</LASTNAME>
<FIRSTNAME>John</FIRSTNAME>

and I keep seeing lots of things like this.
Having this possibility would perhaps also prevent people from using
"cryptic" abbreviations as element type names/IDs.
I agree that closing an element that has subelements with a </> would
be a "bad thing" for a document writer to do.

[MA] For those that do, having the explicit end tags is probably a
Very Good Thing, in that it saves confusion. And while it may take only
'5 minutes' (NOTHING takes five minutes) to add to a parser, suddenly a
simple parser must build a document tree in order to know which element
is being closed by '</>', which turns simple parsers into much more
complicated ones. This is not a benefit.

[JS] OK, I didn't think of the possibility of anyone building XML
parsers without building the document tree. (I won't disclose any
estimate for building the document tree... :-) )
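
To make the bookkeeping behind this point concrete, here is a minimal
Python sketch of what resolving '</>' requires. It is only an
illustration, not any real parser's code: the function name and the tag
pattern are hypothetical, and attributes, comments and empty-element
tags are deliberately ignored. In this simplified setting the parser
has to remember at least the names of the currently open elements.

```python
import re

# Hypothetical, simplified tag pattern: either a start tag <NAME>,
# or an end tag </NAME> whose name may be omitted (</>).
TAG = re.compile(r"<(?:([A-Za-z][\w.-]*)|/([A-Za-z][\w.-]*)?)>")


def resolve_empty_end_tags(text):
    """Return text with every '</>' replaced by the explicit end tag."""
    open_elements = []  # names of the elements that are currently open
    out = []
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])  # copy character data unchanged
        pos = m.end()
        start_name, end_name = m.group(1), m.group(2)
        if start_name is not None:
            # Start tag: remember the name so a later '</>' can find it.
            open_elements.append(start_name)
            out.append(m.group(0))
            continue
        # End tag, explicit (</NAME>) or empty (</>).
        if not open_elements:
            raise ValueError("end tag with no open element")
        expected = open_elements.pop()
        if end_name is not None and end_name != expected:
            raise ValueError(
                "expected </%s>, found </%s>" % (expected, end_name))
        out.append("</%s>" % expected)
    out.append(text[pos:])
    if open_elements:
        raise ValueError("unclosed element: %s" % open_elements[-1])
    return "".join(out)


print(resolve_empty_end_tags("<LASTNAME>Doe</><FIRSTNAME>John</>"))
```

A parser that only reports tags as it meets them, without checking that
they match, needs none of this state, which is the extra cost being
discussed here.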

> [JS] 2. Allow non-quoted attribute values. I guess support for this
> is also a 5-minute project for the parser writer.

[MA] We're up to ten minutes. Actually, this makes the parser more
complicated, since knowing that attribute values are delimited allows a
simple 'scan-literal' approach, i.e., if the first character after the
equals sign is a single quote, one scans to the next single quote. If a
double, scan to the next double. If they are optional, things get much
more complicated, and we now must care about what type of characters
are in the content of the literal. Options and minimization features
generally add a lot of work for parser writers.
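
The 'scan-literal' approach described above can be sketched in a few
lines of Python. This is a rough illustration under stated assumptions,
not code from any actual parser: the function names are made up, and
the rule used for unquoted values (stop at whitespace or '>') is a
hypothetical choice, since the question of which characters may appear
in an unquoted value is exactly what would have to be specified.

```python
def scan_quoted_value(text, pos):
    """Scan a quoted attribute value that starts at text[pos].

    Returns (value, index just past the closing quote).
    """
    quote = text[pos]
    if quote not in ("'", '"'):
        raise ValueError("attribute value must start with a quote")
    # The content is irrelevant: just find the matching delimiter.
    end = text.find(quote, pos + 1)
    if end == -1:
        raise ValueError("unterminated attribute value")
    return text[pos + 1:end], end + 1


# Assumption for illustration only: an unquoted value ends at
# whitespace or at '>'.
UNQUOTED_STOP = set(" \t\r\n>")


def scan_value_quotes_optional(text, pos):
    """Scan an attribute value whose quotes may be omitted."""
    if text[pos] in ("'", '"'):
        return scan_quoted_value(text, pos)
    # Without delimiters the scanner must inspect every character.
    end = pos
    while end < len(text) and text[end] not in UNQUOTED_STOP:
        end += 1
    if end == pos:
        raise ValueError("empty unquoted attribute value")
    return text[pos:end], end


print(scan_quoted_value('a="x y">', 2))
print(scan_value_quotes_optional("width=100>", 6))
```

The quoted case never looks at the content of the value, while the
unquoted case depends entirely on the chosen set of stop characters.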

[JS] I think the complexity this adds for parser writers is negligible;
it's a very local thing, typically confined to a single method/routine.
If the possibility of omitting the quotes would benefit users, perhaps
by making XML more SGML compatible, I definitely think one should allow
it. I've already seen documents on the web described as XML documents
that lack the quotes. If some parsers allow it (I don't know!), then
the other parsers would seem unnecessarily "stubborn" from a user's
perspective.

> [JS] 3. Add a paragraph to the XML standard document explaining why
> character references should be resolved before storing the string as
> the value of the entity.

[MA] I believe we would lose an enormous amount of expressive power and
impose unnecessary restrictions.

[JS] This may very well be true. I'm not an SGML expert.
I'd love to see an example of this. I think a good example would make
XML parser writers much more motivated when implementing it! :-)

[MA] Recursive entity resolution is not programmatically
that much extra work

[JS] Perhaps not the resolution itself.
But making it possible to give the user good error messages (and to
display the location(s) where the error occurs) is, I assume, quite a
lot of work. Perhaps not so much the direct coding as coming up with
the necessary architecture/design.
I also think this simpler model would make for simpler APIs for tool
builders, at least for tools that need to know where entities were
invoked in the original document (e.g. tools which update/synchronize
documents need this info in order not to "flatten it out").

[MA] and allows for various important SGML facilities. And
remember that one of the explicit goals for XML is SGML compatibility.

[JS] Yes. But it would be very sad if this made XML substantially more
complex (without any other benefit than compatibility), both for users
and tool vendors.

Cheers,
Jarle Stabell