> My afterall impression is that most available tools do well with
> toy examples, but any input being in the MB range easily blasts
> them. At least that's true for what came from MS so far.

I don't think that's true in general. Most of the Java-based XML
parsers I've tried seem to be able to handle Jon Bosak's XML Old
Testament (nearly 4MB) just fine, if somewhat slowly -- I used ot.xml
for routine testing and profiling while developing AElfred, and
AElfred barely broke a sweat.

The problem comes when the parser tries to build a tree in memory
rather than simply reporting an event stream. Depending on the
implementation, a document tree can be many times larger than the
source text. With a naive tree implementation, a 10MB document might
use 100MB or more of virtual memory for the document tree -- that'll
bring most current desktop systems to a screeching halt.
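
To make the distinction concrete, here is a minimal sketch of the two styles. It uses Python's standard library (xml.sax and xml.dom.minidom) purely as an illustration, not the Java parsers discussed above. The names ElementCounter, count_streaming, count_tree and the sample document are all hypothetical. Both functions count elements, but the streaming one keeps only a single integer as state, while the tree one has to hold every node in memory before it can answer.

```python
import xml.sax
from xml.dom import minidom


class ElementCounter(xml.sax.ContentHandler):
    """Event-stream style: react to each start tag, retain nothing else."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per element as the parser reaches its start tag.
        self.count += 1


def count_streaming(data: bytes) -> int:
    """Count elements from parser events; state is one integer."""
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler.count


def count_tree(data: bytes) -> int:
    """Count elements by first building the whole document tree."""
    doc = minidom.parseString(data)
    try:
        # "*" matches every element in the document, including the root.
        return len(doc.getElementsByTagName("*"))
    finally:
        doc.unlink()  # break internal references so the tree can be freed


# Hypothetical stand-in for a large document such as ot.xml.
sample = b"<book>" + b"<v>In the beginning</v>" * 1000 + b"</book>"

if __name__ == "__main__":
    print(count_streaming(sample), count_tree(sample))
```

Both calls return the same count; the difference is that the tree version's memory use grows with the size of the document, while the streaming version's does not. How large the gap is in practice depends entirely on the tree implementation.
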
All the best,
David
-- David Megginson david@megginson.com http://www.megginson.com/