Come on, this is a crock. I've set that cryptic little variable
(funny that everything in Perl deserves that description) so that
line ends won't block regexp matches. Once that was done, I wrote a few
regexps and parsed HTML just fine (it takes 1 line for a simple tag
pattern match, and 10 for a loop to create a reasonably full parse
into elements, content, and attribute values). I'm sure a "real" Perl
programmer (unlike me) can shrink that down to 2-3 lines of
twisty little characters, all of them different.
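
For illustration, here is a rough sketch of that kind of quick and dirty
parse, written in Python rather than Perl. It is not the code described
above. The names TAG, ATTR and quick_parse are made up for this example,
and it assumes double-quoted attribute values, no comments or CDATA, and
no ">" inside an attribute value. The re.DOTALL flag plays a role similar
to the line-end setting mentioned above: it lets "." match across
newlines.

```python
import re

# Assumption: tags look like <name attr="value" ...> or </name>.
# re.DOTALL lets "." cross line ends, so a tag may span several lines.
TAG = re.compile(r'<(/?)([A-Za-z][\w.-]*)(.*?)>', re.DOTALL)
# Assumption: attribute values are always double-quoted.
ATTR = re.compile(r'([A-Za-z][\w.-]*)\s*=\s*"([^"]*)"')

def quick_parse(text):
    """Return a flat list of events: ("start", name, attrs),
    ("end", name) and ("text", content)."""
    events = []
    pos = 0
    for m in TAG.finditer(text):
        # Anything between the previous tag and this one is content.
        if m.start() > pos:
            events.append(("text", text[pos:m.start()]))
        closing, name, rest = m.groups()
        if closing:
            events.append(("end", name.lower()))
        else:
            events.append(("start", name.lower(), dict(ATTR.findall(rest))))
        pos = m.end()
    # Trailing content after the last tag.
    if pos < len(text):
        events.append(("text", text[pos:]))
    return events
```

Calling quick_parse('<p class="x">Hi <b>there</b></p>') yields a start
event for p with {"class": "x"}, the text "Hi ", a start event for b, the
text "there", and then end events for b and p. It will happily
misparse anything outside the assumptions listed above, which is the
"dirty" part.
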
XML should be no harder. My understanding of the goal for the DPH was
always that XML would be no worse than HTML -- i.e., for quick and dirty
transformations or operations, quick and dirty parsers would work. As
far as I can tell, "dirty" means that you know (or are pretty sure)
they will work with one document or corpus of documents, not
necessarily that they will work with any arbitrary document.
If you never break tags across lines in your documents, your Perl
desperation may work without worrying about this case; if you do, you
have to have smarter desperation. For _reliable_ parsing of
_arbitrary_ documents, you probably do need a full parser of the
instance language (10 productions in the standard, or so, wasn't it?).
There's no reason that level of parsing can't be implemented in
no more than 20 lines of Perl. I can't remember (or abide) the syntax
of Perl well enough to write it, but I'm sure there's a DPH on the
list who would love to volunteer.
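
To make the line-break case concrete, here is a small sketch (again in
Python, not Perl) comparing a matcher that looks at one line at a time
with one that looks at the whole document. The pattern START_TAG, the two
function names and the sample document are all invented for this
illustration, and the pattern is deliberately naive: it only recognizes
start tags and knows nothing about comments or quoting.

```python
import re

# Assumption: a naive start-tag pattern, for illustration only.
# [^<>]* matches newlines too, so it can span lines if given the chance.
START_TAG = re.compile(r'<([A-Za-z][\w.-]*)[^<>]*>')

def tags_line_by_line(text):
    """Match each line separately: misses tags broken across lines."""
    return [m.group(1)
            for line in text.splitlines()
            for m in START_TAG.finditer(line)]

def tags_whole_document(text):
    """Match against the whole text at once: line breaks inside a tag
    are no longer a problem."""
    return [m.group(1) for m in START_TAG.finditer(text)]

# Hypothetical sample document with one start tag split over two lines.
doc = '<doc>\n<item id="1"\n      type="x">text</item>\n</doc>'
```

On this sample, tags_line_by_line(doc) finds only "doc", while
tags_whole_document(doc) finds "doc" and "item". If your corpus never
breaks a tag across lines, the two agree and the simpler approach is
good enough.
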
>>It sounds to me like what we really need is a small paper (about 5
>>paragraphs) explaining whitespace for developers:
>>
>I think this is an excellent idea!
Well, I gave the three sentence version. Feel free to expand it...
Actually I think the three sentences sum it up pretty well.
--
David------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu | david@dynamicDiagrams.com
Boston University Computer Science | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/