Right, but Len's question was about having a "schema of *some kind*". The
closer your schema is to explicitly recognizing the information you want
to discover, the easier it is to discover the information. If you have
no schema, then you are Very Far Away from that goal.
> This is precisely why text retrieval is so hard -- the "schema" that all
> documents are written in is a human written language, and nobody knows how
> to machine-process that. You can chunk it up all you like into logical
> blocks, but you're always going to be missing certain substantive
> information relating to the text.
Certainly, but those who actually do this processing still chunk it up
into logical blocks according to some schema, because that
is the way to get closest to achieving the goal. So in answer to Len's
question I still say that having a schema is better than not having one,
despite the fact that having the schema does not "solve" the problem. It
gets you closer to solving the problem.
Paul Prescod