Re: XML vs the Dreaded Whitespace

Tim Bray (tbray@textuality.com)
Sat, 13 Dec 1997 14:57:07 -0800

Next message: David Megginson: "XML Architectural Forms"
Previous message: Tim Bray: "Re: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal"
Maybe in reply to: David Megginson: "XML vs the Dreaded Whitespace"

At 03:00 AM 11/12/97 -0500, Chris Smith wrote:
>Part of this work requires that these documents carry document
>authentication information. This, in turn, requires that some regions
>of an XML document must be transported *exactly*, and must be received
>and checked identically so that the message authentication actually
>works. The fact that we are considering the idea of including email
>as a transport mechanism doesn't help matters.

So your proposal is:
(1) transcode into UTF-16 if necessary
(2) digitally sign what you get after (1).
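
A minimal sketch of those two steps, under some assumptions that are not in the original proposal: big-endian UTF-16 without a byte-order mark is picked so the byte sequence is deterministic, and HMAC-SHA256 with a shared key stands in for a real public-key signature. The names `sign_as_utf16` and `verify` are hypothetical.

```python
import hashlib
import hmac


def sign_as_utf16(text: str, key: bytes) -> str:
    """Transcode to UTF-16 and return a keyed digest of the resulting bytes."""
    # Step (1): transcode. "utf-16-be" is an assumption; it avoids the BOM and
    # the platform-dependent byte order that plain "utf-16" would introduce.
    data = text.encode("utf-16-be")
    # Step (2): "sign". HMAC-SHA256 is only a stand-in for a digital signature.
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(text: str, key: bytes, signature: str) -> bool:
    """Recompute the digest on the received text and compare in constant time."""
    return hmac.compare_digest(sign_as_utf16(text, key), signature)
```

Any change to the characters, including a single added space, makes `verify` return False, which is exactly the property the proposal relies on and also the source of the anomalies discussed next.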

I think this is a sensible way to go. Obviously, there are
anomalies:

<a foo='1' bar="2"/>
will not be the same as
<a
foo="1"
bar='2'
></a>

which is surprising, but trying to find solutions may well not be
cost-effective.
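
The anomaly can be reproduced with a short sketch. It assumes Python's standard `xml.etree.ElementTree` as the XML processor and a plain SHA-256 digest over the UTF-16 bytes as a stand-in for the signature; `digest` and `parsed_view` are hypothetical helper names.

```python
import hashlib
import xml.etree.ElementTree as ET

# The two forms from the example above.
doc_a = "<a foo='1' bar=\"2\"/>"
doc_b = "<a\nfoo=\"1\"\nbar='2'\n></a>"


def digest(text: str) -> str:
    """Digest of the characters as they sit, after transcoding to UTF-16."""
    return hashlib.sha256(text.encode("utf-16-be")).hexdigest()


def parsed_view(text: str):
    """What a processor reports: element type, attributes, and text content."""
    root = ET.fromstring(text)
    return (root.tag, sorted(root.attrib.items()), root.text or "")


same_signature = digest(doc_a) == digest(doc_b)          # the raw characters differ
same_content = parsed_view(doc_a) == parsed_view(doc_b)  # the parsed view does not
```

Here `same_signature` comes out False while `same_content` comes out True: the processor sees the same element, but a signature over the raw characters does not.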

You *might* want to consider losing the prologue and starting the check
just at the root element.

You *might* want to consider normalizing namespace prefixes.

You *might* want to normalize whitespace in markup.

You *might*, etc etc etc etc; unless you are willing to commit to
a full grove/property-set model a la SGML's extended facilities, you
may well be better off signing the instance as it sits.

In particular, I think there are lots of things that would be easier
and less trouble-prone to work around than line-breaking, which is well
known to be highly error-prone. For example, in the line-break HERE->
how many space characters that you can't see follow the ">"?
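
As a rough illustration of why this is hard to check by eye, the sketch below counts trailing spaces and tabs on each line. The function name `trailing_blanks` is hypothetical, and treating only spaces and tabs as "invisible" characters is an assumption.

```python
def trailing_blanks(text: str):
    """Return (line_number, count) for every line that ends in spaces or tabs."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.rstrip(" \t")
        if stripped != line:
            # The difference in length is the number of invisible characters.
            found.append((number, len(line) - len(stripped)))
    return found


sample = "in the line-break HERE->   \nhow many spaces follow?\n"
report = trailing_blanks(sample)
```

A transport that silently strips or adds such characters would change the signed bytes without any visible change to the text.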

There might be a useful halfway point, as follows: run the document
through an XML processor and sign just the combination of element types,
attribute name-value pairs, and textual content that the processor emits.
This allows you to finesse a lot of quoting/white-space/line-end issues;
it also allows authors to use tricks like default attributes and
internal entities that don't "really" change the content.
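
A minimal sketch of that halfway point, with several assumptions that are not part of the suggestion itself: Python's `xml.etree.ElementTree` plays the XML processor, attributes are sorted by name so their order in the source does not matter, each emitted item is serialized as one JSON line, and SHA-256 stands in for the signature. `content_tokens` and `content_digest` are hypothetical names. Whether default attributes and internal entities are resolved depends on the processor used.

```python
import hashlib
import json
import xml.etree.ElementTree as ET


def content_tokens(elem):
    """Yield element types, attribute name-value pairs, and text in document order."""
    yield ("start", elem.tag)
    # Sorting is an assumption: it makes attribute order irrelevant.
    for name, value in sorted(elem.attrib.items()):
        yield ("attr", name, value)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from content_tokens(child)
        if child.tail:
            # Text that follows a child element belongs to the parent's content.
            yield ("text", child.tail)
    yield ("end", elem.tag)


def content_digest(xml_text: str) -> str:
    """Digest what the processor emits rather than the characters as they sit."""
    h = hashlib.sha256()
    for token in content_tokens(ET.fromstring(xml_text)):
        # One JSON array per line keeps token boundaries unambiguous.
        h.update(json.dumps(token).encode("utf-8") + b"\n")
    return h.hexdigest()
```

With this approach the two forms of the `<a>` element shown earlier produce the same digest, while a change to an attribute value or to the text still changes it.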

On the other hand, I'd say that off the top, just digitally signing the
UTF-i-fied characters as they sit is a reasonable way to go. -Tim
