Firstly, to paraphrase some earlier comments, the "whitespace problem" results
from the dual personality of whitespace.
Personality 1. The programmer's whitespace ("pretty printing") is used as a
layout tool for visual editing of the markup and content. In addition, many
editing applications won't accept lines longer than 250 characters, so line
breaks have to be inserted somewhere.
Personality 2. The whitespace is part of the content, either because the
author wanted it that way or because he/she could not see any other easy way
to encode the information correctly.
SGML tried to cater for both personalities and succeeded moderately well.
The downside was that SGML documents are not easy to maintain and process.
Now for some personal opinion on what I thought XML was all about. XML is an
attempt either to simplify SGML (removing or changing the bits that make it
hard to understand, use, and process) or to extend HTML to deal with
information content as well as presentation. I lean towards the former view:
"SGML for the Web".
IMHO the current XML "whitespace handling" has not simplified the SGML
situation significantly.
Here are some comments on Sean's suggestion, and some slight variations of it.
I believe that Sean's suggestion has plenty of merit.
What is wrong with having some standard elements (<PCDATA>, <CDATA>,
<NEWLINE>) that are part of every XML DTD?
If you didn't want users to have to author these tags, then "normalisation"
applications could be developed to convert "raw" XML into the "normalised"
version.
Example:
<foo>
I am data 1
I am <emph>data</emph> 2
</foo>
could be normalised to:
<foo>
<pcdata>I am data 1</pcdata><newline/>
<pcdata>I am <emph>data</emph> 2</pcdata>
</foo>
or
<foo><pcdata>I am data 1 I am <emph>data</emph> 2</pcdata>
</foo>
depending on the DTD declarations for the elements or a style sheet (?!!)
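To make the idea more concrete, here is a minimal sketch of what such a normalisation application might do, written in Python with the standard library's xml.etree.ElementTree. The function name normalise and its keep_newlines flag are hypothetical; the flag stands in for whatever the DTD declarations or style sheet would actually say. The sketch assumes that line breaks are the only whitespace that can carry meaning, and it discards blank lines and collapses spacing within a line.

```python
import copy
import re
import xml.etree.ElementTree as ET


def _pieces(elem):
    """Yield elem's mixed content in order: text strings and tail-less child copies."""
    if elem.text:
        yield elem.text
    for child in elem:
        clone = copy.deepcopy(child)
        clone.tail = None  # the tail is yielded separately as a string
        yield clone
        if child.tail:
            yield child.tail


def _add_text(parent, text):
    """Append text to parent's mixed content (its .text, or the last child's .tail)."""
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _is_blank(run):
    """True if a run holds no elements and only whitespace text."""
    return all(isinstance(p, str) and not p.strip() for p in run)


def _pcdata(run):
    """Wrap one run in <pcdata>, collapsing whitespace runs in its text to one space."""
    pc = ET.Element("pcdata")
    for p in run:
        if isinstance(p, str):
            _add_text(pc, re.sub(r"\s+", " ", p))
        else:
            pc.append(p)
    # Trim pretty-printing whitespace at both ends of the run.
    if pc.text:
        pc.text = pc.text.lstrip()
    if len(pc):
        if pc[-1].tail:
            pc[-1].tail = pc[-1].tail.rstrip()
    elif pc.text:
        pc.text = pc.text.rstrip()
    return pc


def normalise(elem, keep_newlines):
    """Hypothetical normaliser: return a copy of elem with its content wrapped in <pcdata>.

    keep_newlines=True:  each non-blank source line becomes its own <pcdata>,
                         separated by <newline/> (line breaks are content).
    keep_newlines=False: the whole content becomes one <pcdata>
                         (line breaks were only pretty printing).
    """
    lines = [[]]
    for p in _pieces(elem):
        if isinstance(p, str) and keep_newlines:
            first, *rest = p.split("\n")
            lines[-1].append(first)
            lines.extend([part] for part in rest)
        else:
            lines[-1].append(p)
    out = ET.Element(elem.tag, dict(elem.attrib))
    for i, run in enumerate(r for r in lines if not _is_blank(r)):
        if i:
            out.append(ET.Element("newline"))
        out.append(_pcdata(run))
    return out


raw = ET.fromstring("<foo>\nI am data 1\nI am <emph>data</emph> 2\n</foo>")
print(ET.tostring(normalise(raw, keep_newlines=True), encoding="unicode"))
# <foo><pcdata>I am data 1</pcdata><newline /><pcdata>I am <emph>data</emph> 2</pcdata></foo>
print(ET.tostring(normalise(raw, keep_newlines=False), encoding="unicode"))
# <foo><pcdata>I am data 1 I am <emph>data</emph> 2</pcdata></foo>
```

Because blank lines are dropped, a document that used several consecutive line breaks to mean something would need a richer rule than this one.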
However, normalisation is not needed if authors can be given tools that
produce the desired markup directly.
Either way, all whitespace in the "normalised" documents could then be
collapsed to a single space: because personality 2 has been removed (the
significant line breaks are now explicit <newline/> elements), only pretty
printing is left.
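Once the significant line breaks are carried by <newline/> elements, collapsing the remaining whitespace becomes a simple tree walk. A minimal Python sketch, assuming the document has already been normalised (the function name collapse_whitespace is hypothetical):

```python
import re
import xml.etree.ElementTree as ET


def collapse_whitespace(elem):
    """Collapse every whitespace run in elem's subtree to a single space (in place).

    Assumes elem is already "normalised": meaningful line breaks are explicit
    <newline/> elements, so any remaining whitespace is pretty printing.
    """
    for node in elem.iter():
        if node.text:
            node.text = re.sub(r"\s+", " ", node.text)
        if node.tail:
            node.tail = re.sub(r"\s+", " ", node.tail)
    return elem


doc = ET.fromstring(
    "<foo>\n  <pcdata>I am\n    data 1</pcdata><newline/>\n  <pcdata>I am data 2</pcdata>\n</foo>"
)
print(ET.tostring(collapse_whitespace(doc), encoding="unicode"))
# <foo> <pcdata>I am data 1</pcdata><newline /> <pcdata>I am data 2</pcdata> </foo>
```

The line break between the two runs survives only because it is recorded as <newline/>; the newline characters from the source have become ordinary spaces.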
I will stop rambling now.
IMHO, the solution lies in removing the dual personality of whitespace at
document authoring time (or, for documents tagged by hand, at the point where
they are handed to XML tools).
Regards,
Bill
Bill Donoghoe bdonoghoe@acslink.net.au
InfoTech (NSW) Pty Ltd mobile: 014 625 397 (in Australia)
SGML/HyTime/DSSSL/XML Consultancy and Development