This all depends on who "we" is taken to be.
A web indexing robot doesn't need to resolve TEI pointers at all,
except to identify the remote document -- it then indexes the whole thing.
> In general we will wish to cache (index) IDs since
> we don't wish to rescan for another search.
I don't follow this. Under what circumstances is searching a document for
an ID much more painful than using a cache? Is this for 100 MByte documents?
(which do exist, by the way, in droves. No, like elephants, in herds)
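To make the trade-off concrete, here is a minimal sketch contrasting a one-off scan for an ID with an index built once and reused. It assumes ID attributes are literally named `id` and parses with the standard `xml.etree.ElementTree`. Real SGML/TEI documents may declare ID attributes under other names, so treat that as an assumption. `find_by_scan` and `build_id_index` are hypothetical names for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: ID-type attributes are literally named "id".
ID_ATTR = "id"

def find_by_scan(root, target):
    """Walk the whole tree on every lookup: O(n) time, no extra memory."""
    for elem in root.iter():
        if elem.get(ID_ATTR) == target:
            return elem
    return None

def build_id_index(root):
    """One O(n) pass up front; each later lookup is an average O(1) dict hit."""
    index = {}
    for elem in root.iter():
        value = elem.get(ID_ATTR)
        if value is not None:
            index.setdefault(value, elem)  # keep the first occurrence of an ID
    return index

doc = ET.fromstring(
    '<text><div id="d1"><p id="p1">one</p></div>'
    '<div id="d2"><p id="p2">two</p></div></text>'
)
print(find_by_scan(doc, "p2").text)  # two
index = build_id_index(doc)
print(index["d1"].tag)               # div
```

For a document of ordinary size the scan is cheap. The index only pays for itself when the same document is searched repeatedly or is very large.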
> When validating a document the IDs, GIs and ATTNAMEs all have to be scanned
> since they occur in VC's.
Not sure what a VC is (validatable context??), but yes, they all have to
be validated.
> Presumably as a by-product of validation we can
> at least expect a hashtable of IDs (and possibly GIs).
I think that should be application-specific.
You might provide a hash table interface to make it easier, though.
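Here is one way such an optional interface might look. This is a sketch under stated assumptions, not any particular validator's API. The hypothetical `validate_ids` consumes a stream of `(GI, ID)` pairs and checks uniqueness, standing in for a real validator. It reports each ID to a callback only if the application supplies one. `IdTable` is a hypothetical ready-made, hash-table-backed collector for applications that want it.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical validator output: (element GI, ID value) pairs in document order.
IdEvent = Tuple[str, str]

class IdTable:
    """Optional hash-table collector that an application may plug in."""

    def __init__(self) -> None:
        self.by_id: Dict[str, str] = {}   # ID value -> GI of first element with it
        self.duplicates: List[str] = []   # IDs seen more than once

    def __call__(self, gi: str, id_value: str) -> None:
        if id_value in self.by_id:
            self.duplicates.append(id_value)
        else:
            self.by_id[id_value] = gi

    def lookup(self, id_value: str) -> Optional[str]:
        return self.by_id.get(id_value)

def validate_ids(events: Iterable[IdEvent],
                 on_id: Optional[Callable[[str, str], None]] = None) -> bool:
    """Check ID uniqueness; pass each ID to on_id only if the caller asked."""
    seen = set()  # private to validation; discarded on return
    ok = True
    for gi, id_value in events:
        if id_value in seen:
            ok = False
        seen.add(id_value)
        if on_id is not None:
            on_id(gi, id_value)
    return ok

events = [("div", "d1"), ("p", "p1"), ("p", "p1")]
table = IdTable()
print(validate_ids(events, table))  # False (p1 is duplicated)
print(table.lookup("d1"))           # div
```

The validator keeps only the private set it needs for its own check. Whether a persistent table survives validation is left to the application, which matches keeping it application-specific.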
Lee