Indexing of XML documents

Peter Murray-Rust (Peter@ursus.demon.co.uk)
Fri, 14 Mar 1997 23:19:46 GMT

I hope I can express this problem clearly - I'm sure that you are
familiar with it.

When we need to resolve a TEI pointer like (id a23) we may have to scan
the whole document. In general we will wish to cache (index) the IDs so
that later lookups do not require a rescan. One obvious place to build the
index is when the document is first read in (admittedly it may turn out
that the whole document never needs to be scanned).
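As a rough illustration of indexing at read-in time, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. It assumes the ID attribute is literally named "id"; without a DTD a non-validating parser cannot know which attributes are declared as type ID, so that name, the function build_id_index and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_id_index(root, id_attr="id"):
    """Walk the tree once and map each ID value to the first element carrying it."""
    index = {}
    for elem in root.iter():
        value = elem.get(id_attr)
        # Assumption: keep the first occurrence if an ID value repeats.
        if value is not None and value not in index:
            index[value] = elem
    return index


# Hypothetical sample document; "id" is assumed to be the ID attribute.
doc = ET.fromstring(
    '<text><div id="a22"><p id="a23">target</p></div><p>no id</p></text>'
)
ids = build_id_index(doc)

# Resolving (id a23) is now a dictionary lookup rather than a rescan.
print(ids["a23"].text)
```

The cost is one extra pass over the tree and one table entry per ID, which is only worth paying if at least one pointer is resolved later.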

When validating a document the IDs, GIs and ATTNAMEs all have to be scanned
since they occur in VCs. Presumably as a by-product of validation we can
at least expect a hashtable of IDs (and possibly GIs).

The question is, should we do both of these by default (or even others
that I haven't thought of)? Or should we do none and leave it to the app?
Or should the parser have a switch?
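To make the "switch" option concrete, here is one possible shape for it, sketched in Python on top of xml.etree.ElementTree. The class IndexedDocument, the index_ids flag and the "id" attribute name are all invented for illustration and do not correspond to any existing parser API. With the switch off, the index is built lazily on the first lookup, so an application that never resolves a pointer pays nothing.

```python
import xml.etree.ElementTree as ET


class IndexedDocument:
    """Hypothetical wrapper: index IDs eagerly (switch on) or on first use."""

    def __init__(self, xml_text, index_ids=False, id_attr="id"):
        self.root = ET.fromstring(xml_text)
        self.id_attr = id_attr
        self._ids = None  # None means "no index built yet"
        if index_ids:
            self._build()

    def _build(self):
        self._ids = {}
        for elem in self.root.iter():
            value = elem.get(self.id_attr)
            if value is not None:
                # Assumption: the first element with a given ID wins.
                self._ids.setdefault(value, elem)

    @property
    def indexed(self):
        return self._ids is not None

    def resolve(self, id_value):
        """Return the element with this ID, or None if there is none."""
        if self._ids is None:
            self._build()  # lazy: scan only when a lookup is first needed
        return self._ids.get(id_value)


lazy = IndexedDocument('<a><b id="x"/><c/></a>')
eager = IndexedDocument('<a><b id="x"/><c/></a>', index_ids=True)
print(lazy.indexed, eager.indexed)
print(lazy.resolve("x").tag, lazy.indexed)
```

The lazy default is a third option alongside "always" and "never": the parser owns the index but the application decides, by asking, whether it is ever built.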

[BTW a WF document can have multiple identical IDs, OK? Presumably the
behaviour of an app that has to reference them is 'undefined'?]
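The duplicate-ID case is easy to reproduce with a non-validating parser. The sketch below (Python, xml.etree.ElementTree, which checks well-formedness only) collects every element per ID value so that an application can at least detect the ambiguity. The attribute name "id", the function ids_by_value and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def ids_by_value(root, id_attr="id"):
    """Map each ID value to the list of all elements that carry it."""
    groups = defaultdict(list)
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            groups[value].append(elem)
    return groups


# Well-formed but not valid: two elements share the ID a23.
doc = ET.fromstring(
    '<doc><p id="a23">first</p><p id="a23">second</p><p id="a24">third</p></doc>'
)
groups = ids_by_value(doc)

# Report the IDs that more than one element claims.
duplicates = {
    value: [elem.text for elem in elems]
    for value, elems in groups.items()
    if len(elems) > 1
}
print(duplicates)  # {'a23': ['first', 'second']}
```

Whether the app then takes the first match, the last, or refuses to resolve the pointer is exactly the policy question raised above; the sketch only shows that the parser will not settle it.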

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
