So what you're saying is that this sort of thing can work, even with all
of the minimization features of SGML, because you knew the general layout
of your data and/or you had a tool that could normalize the weird stuff
for you. You succeeded at your task because you approached it (perhaps
unknowingly) with either:
a) the right data set: more or less already normalized SGML, or
b) the right tool -- a normalizer: AE.
That's exactly what I've been saying, too. If you are going to do regular
expression hacking on XML, it had better already be marked up according to
some corporate standard (which would probably exclude short end-tags,
confusing entities, confusing whitespace, confusing newlines, etc.), or you
should have a tool that can normalize it for you.
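
To make the point concrete, here is a minimal sketch in Python (my choice
of language for illustration). The element name "title", the helper
titles(), and both sample strings are hypothetical; nothing here comes from
the data discussed in this thread. The pattern assumes normalized markup:
explicit end-tags, no attributes on the element, and no nested markup.

```python
import re

# Assumes normalized markup: every <title> has an explicit </title>,
# carries no attributes, and contains no nested elements.
TITLE = re.compile(r"<title>(.*?)</title>", re.DOTALL)

def titles(text):
    """Return the content of each <title> element, whitespace-collapsed."""
    return [" ".join(content.split()) for content in TITLE.findall(text)]

# Normalized input: the regex finds both elements, even across a newline.
normalized = "<doc><title>Regex\nhacking</title><title>SGML</title></doc>"

# Minimized SGML: a short end-tag (</>) and an omitted end-tag.
# Both are legal with minimization turned on, and the regex sees neither.
minimized = "<doc><title>Regex hacking</><title>SGML<p>text</doc>"

print(titles(normalized))
print(titles(minimized))
```

The regular expression is not wrong in the second case; it simply encodes
an assumption about the markup that the unnormalized input does not honor,
which is why a normalizer has to run first.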
Paul Prescod - http://itrc.uwaterloo.ca/~papresco
"A writer is also a citizen, a political animal, whether he likes it or
not. But I do not accept that a writer has a greater obligation
to society than a musician or a mason or a teacher. Everyone has
a citizen's commitment." - Wole Soyinka, Africa's first Nobel Laureate