Re: XML tools and big documents

Michael Kay (M.H.Kay@eng.icl.co.uk)
Fri, 4 Sep 1998 11:59:51 +0100


>To this end, I have been (in such spare time as I have) tinkering
>about with Mr. Clark's XP API (com.jclark.xml.tok, mostly) to write an
>application that will allow me to attach the logical element structure
>to offsets in the storage entity, so that I can consider the logical
>structure's relationship to points in the text without reparsing the
>document

I think we're all looking for a solution to the same
problem. A document of more than 1Mb is too big to parse
every time we want to look at it, but storing the
fine-grained DOM representation has the opposite problem:
it takes too much space, and it takes too long to
reassemble a reasonable unit such as a page. Indexing the
original serial XML (say at "chapter" level) is one
solution; it's essentially equivalent to my approach, which
has been to split the original XML (say at "chapter" level)
and store the "chapters" as separate linked XML documents.

What I mean by "chapter" is typically 1-10Kb, or
alternatively, a chunk of text such that the user doesn't
mind pressing "Next" when he's got to the end of it.
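
To make the indexing idea concrete, here is a rough sketch of
recording byte offsets at "chapter" level in one pass, so that a
single chunk can later be pulled out without reparsing the whole
document. It is written in Python with the standard expat
binding rather than XP, purely for illustration; the element
name "chapter" and the helper names index_chunks and read_chunk
are assumptions of the sketch, not anything from XP or from an
existing tool.

```python
import xml.parsers.expat


def index_chunks(data: bytes, tag: str = "chapter"):
    """Return (start, end) byte offsets of each outermost <tag> element.

    Assumes `data` is the complete document held in memory as bytes,
    and that start tags of `tag` contain no '>' inside attribute
    values when written as empty elements (<chapter .../>).
    """
    spans = []
    open_starts = []  # stack of start offsets, so nested <tag>s are handled
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        if name == tag:
            # During a callback, CurrentByteIndex is the offset of the
            # first byte of the markup that triggered the event.
            open_starts.append(parser.CurrentByteIndex)

    def on_end(name):
        if name == tag:
            begin = open_starts.pop()
            if not open_starts:  # record only the outermost element
                # The end offset is just past the '>' closing the end tag.
                close = data.index(b">", parser.CurrentByteIndex) + 1
                spans.append((begin, close))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(data, True)
    return spans


def read_chunk(data: bytes, span):
    """Slice one indexed chunk out of the serial XML without reparsing."""
    begin, close = span
    return data[begin:close]


if __name__ == "__main__":
    doc = (b'<book><chapter id="1"><p>One</p></chapter>'
           b'<chapter id="2"><p>Two</p></chapter></book>')
    index = index_chunks(doc)
    print(read_chunk(doc, index[1]).decode("utf-8"))
```

The same offsets could equally be used to write each chunk out
as its own file, which is the splitting approach; the index just
leaves the original document in one piece.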

Mike Kay