Re: Data warehousing and XML

len bullard (cbullard@hiwaay.net)
Tue, 02 Dec 1997 18:28:05 -0600


Paul Prescod wrote:

> I don't doubt that there are some people in the world who want to "mine"
> documents, but I think that they are in the minority, and will be for a
> long time. But more important, it makes little sense to me to "mine" XML
> data. Even if you wanted to mine your structured document data it will
> almost always make sense to load that into the mining tool's internal
> data structures.

Umm... that actually was one of the more frequently requested capabilities
when I was still working on SGML systems. The problem was
precisely that a great deal of the *interesting* information
was not in relational databases. Comparative policy analysis,
for example.

> Once again, XML is great as the transfer format, but when you get down
> to doing your queries, your data mining software should not be parsing
> the XML syntax.

OK, but then what were the various proposals over the
years for SGML querying systems meant to do?

> > However, let me ask a technical
> > question that you can probably answer with a deeper
> > technical perspective than mine: how well can one query
> > data (or convert it for that matter) for which one
> > has no rigorous schema (of some kind)?
>
> In some cases you can do sophisticated queries on data without a schema,
> but you would have to jump through AI hoops. It's not a job I would
> apply for, but neural net experts may be able to detect structure in the
> chaos. But building the schema first is definitely cheaper than trying
> to divine the structure later.

That is what I thought to be the case. I remember when we
were doing the GE CASS system we bounced around the idea
of using DTDs as a sort of reversed query, that is, a DTD
would give us a way to figure out what kinds of queries should
be interesting. We never pursued the idea because the
SGML systems of that time were fairly primitive.
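
To make the "reversed query" idea concrete, here is a minimal sketch
in Python. It is not what we built for CASS (we never built it); the
toy DTD and the names content_models and candidate_paths are all
hypothetical, made up for illustration. It reads the element
declarations and lists every element path the DTD permits, and each
path is a candidate for a query that might be worth asking.

```python
import re

# Hypothetical toy DTD, invented for illustration only.
DTD = """
<!ELEMENT policy (title, section+)>
<!ELEMENT section (heading, para*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT para (#PCDATA)>
"""

def content_models(dtd):
    """Map each declared element to the child element names in its content model."""
    models = {}
    for name, model in re.findall(r"<!ELEMENT\s+(\S+)\s+(.*?)>", dtd, re.S):
        tokens = re.findall(r"#?[A-Za-z_][\w.-]*", model)
        # Keep element names only; drop #PCDATA and the EMPTY/ANY keywords.
        models[name] = [t for t in tokens
                        if not t.startswith("#") and t not in ("EMPTY", "ANY")]
    return models

def candidate_paths(models, name, ancestors=()):
    """List every path from `name` downward that the DTD permits."""
    if name in ancestors:  # stop on recursive content models
        return []
    here = ancestors + (name,)
    paths = ["/" + "/".join(here)]
    for child in models.get(name, []):
        paths.extend(candidate_paths(models, child, here))
    return paths

paths = candidate_paths(content_models(DTD), "policy")
for path in paths:
    print(path)
```

The sketch ignores attributes, entities, and occurrence indicators;
a real DTD would need a proper parser rather than regular expressions.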

len