I am guilty of imprecision (sorry :-). I meant an internal index over the
document tree, not an index for locating the document itself.
>
> > In general we will wish to cache (index) IDs since
> > we don't wish to rescan for another search.
> I don't follow this. Under what circumstances is searching a document for
> an ID much more painful than using a cache? Is this for 100 MByte documents?
> (which do exist, by the way, in droves. No, like elephants, in herds)
Yes, I was thinking of exactly that, particularly if the document contains
thousands of elements (e.g. large chunks of HTML-like material).
>
> > When validating a document the IDs, GIs and ATTNAMEs all have to be scanned
> > since they occur in VCs.
> Not sure what a VC is (validatable context??) but yes, they all have to
> be validated.
VC = 'validity constraint' (see XML-draft 1.4); it is abbreviated this way
in later sections. The point is that (say) in production 52 all IDs have to
be scanned for uniqueness. At that stage it could therefore be useful to
hash them, so that they can be extracted rapidly if they form part of a
later search, rather than scanning the whole document again.
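A minimal sketch of the idea, using Python's xml.etree.ElementTree: the single pass that checks IDs for uniqueness keeps the hash table it builds and returns it for later lookups. ElementTree does not read DTD attribute types, so the sketch assumes IDs live in an attribute literally named "id"; the function name check_ids_and_index is hypothetical.

```python
import xml.etree.ElementTree as ET

def check_ids_and_index(root, id_attr="id"):
    """One walk over the tree: enforce ID uniqueness (the validity
    constraint) and keep the resulting hash table for later searches.
    Assumes IDs are in an attribute named `id_attr` (no DTD typing)."""
    index = {}
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in index:
            # Same check the validator needs anyway
            raise ValueError(f"duplicate ID {value!r}")
        index[value] = elem
    return index

doc = ET.fromstring('<doc><sec id="a"><p id="b">text</p></sec><p id="c"/></doc>')
ids = check_ids_and_index(doc)
print(ids["b"].tag)  # p
```

After validation, a later search for an ID is a dictionary lookup rather than another walk over the whole tree.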
It's no big deal, but since I found myself doing it for various
searches, it seemed worth considering in the API.
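One possible API shape, as a hypothetical sketch: a wrapper that builds the ID table lazily on the first lookup and reuses it for every later search. The class and method names (IndexedDocument, get_element_by_id, invalidate) are invented for illustration, and IDs are again assumed to be in an attribute named "id".

```python
import xml.etree.ElementTree as ET

class IndexedDocument:
    """Hypothetical wrapper: scans the tree for IDs once, on first
    lookup, and answers later lookups from the cached table."""

    def __init__(self, root, id_attr="id"):
        self.root = root
        self.id_attr = id_attr
        self._index = None  # built on demand
        self.scans = 0      # number of full-tree scans performed

    def _build_index(self):
        self.scans += 1
        self._index = {}
        for elem in self.root.iter():
            value = elem.get(self.id_attr)
            if value is not None:
                # Keep the first element seen for a given ID
                self._index.setdefault(value, elem)

    def get_element_by_id(self, value):
        if self._index is None:
            self._build_index()
        return self._index.get(value)

    def invalidate(self):
        # Call after modifying the tree so the next lookup rescans
        self._index = None

doc = IndexedDocument(ET.fromstring('<doc><a id="x"/><b id="y"/></doc>'))
doc.get_element_by_id("x")
doc.get_element_by_id("y")
print(doc.scans)  # 1
```

The cost is one scan plus memory for the table; the invalidate hook is needed because the cached table goes stale if the tree is edited.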
P.
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/