Re: xml parser

Tim Bray (tbray@textuality.com)
Wed, 04 Nov 1998 08:22:37 -0800


At 10:55 AM 11/4/98 -0000, Michael Kay wrote:
>My immediate answer to this is yes, all the information you need for a
>search engine is available via the SAX or DOM interface offered by many
>parsers.

I disagree. Few parsers track byte offsets or other locational info in
the file, and I think you need that to do basic things like proximity
and phrase search.
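
To illustrate why locational info matters, here is a minimal sketch of a positional index in Python. It assumes word ordinals as a stand-in for byte offsets, and the names `build_positional_index`, `phrase_positions`, and `within` are hypothetical, not taken from any parser or search engine. It only shows that phrase and proximity queries need a position recorded for every word occurrence.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_positional_index(xml_text):
    """Map each lowercased word to the list of its word positions (document order)."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    position = 0
    # itertext() yields the character data of every element in document order.
    for text in root.itertext():
        for word in re.findall(r"\w+", text.lower()):
            index[word].append(position)
            position += 1
    return index


def phrase_positions(index, phrase):
    """Return the start positions at which the words of `phrase` occur consecutively."""
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return []
    return [
        start
        for start in index.get(words[0], [])
        if all(start + i in index.get(w, []) for i, w in enumerate(words))
    ]


def within(index, a, b, distance):
    """Proximity test: do `a` and `b` occur at most `distance` words apart?"""
    return any(
        abs(pa - pb) <= distance
        for pa in index.get(a.lower(), [])
        for pb in index.get(b.lower(), [])
    )


doc = (
    "<doc><title>XML search</title>"
    "<p>A search engine needs word positions for phrase search.</p></doc>"
)
index = build_positional_index(doc)
print(phrase_positions(index, "search engine"))
print(within(index, "xml", "engine", 4))
```

Without the stored positions, the index could say only that "search" and "engine" both occur somewhere in the document, not that they are adjacent.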

>Of course you don't need to build your own search engine either, all you
>need to do is write an XML filter for an existing search engine. I'm
>surprised no-one seems to have done this yet.

I think you do need to build your own engine. Reason is, most existing
search engines have an atomic-document view of the world, and break
down completely when asked to model a general recursive hierarchical
structure like XML.

 -Tim
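
One way to picture the difference from an atomic-document index is the rough Python sketch below. It assumes each posting records the path of the element that directly contains the word, so a query can be scoped to any level of the hierarchy. The names `build_structural_index` and `find_in` and the path representation are hypothetical, chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_structural_index(xml_text):
    """Map each lowercased word to the set of element paths that directly contain it.

    A path is a tuple of (tag, nth-sibling-with-that-tag) pairs from the root down.
    """
    root = ET.fromstring(xml_text)
    index = defaultdict(set)

    def visit(elem, path):
        # Text directly inside elem: its leading text plus the tail text
        # that follows each of its children.
        chunks = [elem.text or ""] + [child.tail or "" for child in elem]
        for word in re.findall(r"\w+", " ".join(chunks).lower()):
            index[word].add(path)
        counts = defaultdict(int)
        for child in elem:
            counts[child.tag] += 1
            visit(child, path + ((child.tag, counts[child.tag]),))

    visit(root, ((root.tag, 1),))
    return index


def find_in(index, word, tag):
    """Return the paths of the `tag` elements that contain `word` at any depth."""
    hits = set()
    for path in index.get(word.lower(), ()):
        for depth, (name, _) in enumerate(path):
            if name == tag:
                hits.add(path[: depth + 1])
    return sorted(hits)


book = (
    "<book>"
    "<chapter><title>Parsing</title><p>SAX reports events.</p></chapter>"
    "<chapter><title>Searching</title>"
    "<p>Phrase queries need offsets; SAX alone does not give them.</p></chapter>"
    "</book>"
)
index = build_structural_index(book)
print(find_in(index, "sax", "chapter"))
print(find_in(index, "offsets", "title"))
```

An engine that treats the whole file as one document can answer only "this book mentions SAX"; here the same query can be answered per chapter, per title, or per paragraph.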