I've done some work in this area adapting our Conceptual Indexing
framework to index XML documents. I have a paper on this work
almost ready. If you'll be in Seattle (XML conf) we can talk about it.
What makes Conceptual Indexing interesting is the expressive
richness of its index database, which lets you store both the offsets
of meaningful chunks of the indexed documents and the relations
(structural and semantic) between these chunks.
I use a Java XML parser and a set of small Java objects
(one per element type) which know how to render each encountered
element into structures appropriate for the index, consistent
with the designed conceptualization of a given document type.
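
To make the one-object-per-element-type idea concrete, here is a
minimal sketch, not the actual framework code. It assumes a SAX
parser from the standard JDK; IndexSketch, Entry and ElementRenderer
are hypothetical names invented for illustration, and a line number
from the SAX Locator stands in for a real offset. Elements with no
registered renderer are simply skipped, which is also an assumption.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative names throughout; none of this is the Conceptual Indexing API.
class IndexSketch {

    /** One index record: a concept label, its enclosing element, and a position. */
    static final class Entry {
        final String concept;
        final String parent; // enclosing element name, null at the root
        final int line;      // -1 if the parser supplies no Locator

        Entry(String concept, String parent, int line) {
            this.concept = concept;
            this.parent = parent;
            this.line = line;
        }
    }

    /** One renderer per element type: turns an element into an index record. */
    interface ElementRenderer {
        Entry render(Attributes attrs, String parent, int line);
    }

    /** Parses the XML and lets the renderer registered for each element name emit a record. */
    static List<Entry> index(String xml, Map<String, ElementRenderer> renderers) {
        final List<Entry> entries = new ArrayList<>();
        final Deque<String> open = new ArrayDeque<>(); // stack of currently open element names

        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                ElementRenderer renderer = renderers.get(qName);
                if (renderer != null) {
                    int line = (locator == null) ? -1 : locator.getLineNumber();
                    // open.peek() is the enclosing element: the structural relation.
                    entries.add(renderer.render(attrs, open.peek(), line));
                }
                open.push(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return entries;
    }
}
```

Registering a renderer under "title" and indexing
<doc><title>Hi</title></doc> yields a single Entry whose concept is
whatever that renderer chose and whose parent is "doc". Semantic
relations would need richer records than this sketch carries.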
This has been cool work, and as always, more remains to be done :-)
--Jacek