Re: parser for xml-data?

Ron Bourret (rbourret@dvs1.informatik.tu-darmstadt.de)
Fri, 8 May 1998 11:34:04 +0200


Sudarshan Purohit wrote:

> I think this is a good place to talk about the project I'm doing right now.
> Basically, it's a translator between an XML-Data document and any
> ODBC interface. This involves:
> 1. Taking an input XML-Data file and outputting a series of SQL
> statements that result in the same information being stored in an RDBMS.

I'm working on a similar project (moving data from XML to relational databases
and vice versa).

Do you really mean you are moving data from an XML-Data document? XML-Data is
an XML language for describing metadata, including DTDs. As such, the data in
an XML-Data file is metadata, not data. This is suitable if you want to create
tables in the RDBMS that match the XML-Data schema. However, an XML-Data file
won't give you any data to put in those tables.
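To make the distinction concrete: a schema document can drive table creation, but it carries no rows. The sketch below assumes a simplified, namespace-free subset of XML-Data modeled on Sudarshan's elementType/element example; the real XML-Data note uses namespaces and richer declarations, and the function name `schema_to_ddl` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free subset of XML-Data: an elementType whose
# children are <element type="#..."/> references is treated as a table,
# and each referenced element type becomes one TEXT column.
SCHEMA = """
<schema>
  <elementType id="Name"/>
  <elementType id="SerialNo"/>
  <elementType id="Tuple">
    <element type="#Name"/>
    <element type="#SerialNo"/>
  </elementType>
</schema>
"""


def schema_to_ddl(schema_xml):
    """Return one CREATE TABLE statement per non-leaf elementType."""
    root = ET.fromstring(schema_xml.strip())
    statements = []
    for decl in root.iter("elementType"):
        children = [ref.get("type").lstrip("#") for ref in decl.findall("element")]
        if not children:
            continue  # leaf element types become columns, not tables
        columns = ", ".join(f"{name} TEXT" for name in children)
        statements.append(f"CREATE TABLE {decl.get('id')} ({columns})")
    return statements


for stmt in schema_to_ddl(SCHEMA):
    print(stmt)  # CREATE TABLE Tuple (Name TEXT, SerialNo TEXT)
```

Everything this produces is DDL; there is nothing in the schema to turn into an INSERT.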

By the way, there was a question the other day about how one might actually
deploy XML-Data as a substitute for a DTD. Was this ever answered?

> 2. Using an ODBC Recordset interface to get an equivalent XML-Data
> file. (I'm still working out this one.)
> The first part involves parsing the XML file, so I might be able to help
> out somewhere, Yosr.
>
> While we're on the topic, I'd like to ask a few things.
> 1. XML-Data stores data as "entities". This implies that we've got
> a more abstract representation of data than would have been
> possible through relational databases. For example, we've got the
> "ONEORMORE, ZEROORMORE" occurrence attribute for child elements.
> Implementing this in an RDBMS would mean creating another linked
> table for that data item. The same goes for complex data types.
> Am I on the right track?

You are correct. XML describes an object model, so the problem is roughly the
same as moving object data to and from an RDBMS -- you might want to look at
some papers on the subject. There is an excellent white paper introducing the
subject at:

http://www.ontos.com/mapcon.htm

Another good paper is:

Shekar Ramanathan, Julia E. Hodges: Extraction of Object-Oriented Structures
from Existing Relational Databases, SIGMOD Record, Vol. 26, Number 1, March
1997, pp. 59-64.

Postscript: http://bunny.cs.uiuc.edu/sigmod/sigmod_record/9703/rama.ps

The only major difference I have found so far for XML is that the elements in
XML documents are ordered, while data members in OO programming languages are
not.
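As a rough illustration of both points (a ONEORMORE child becoming a linked table, and element order having to be stored explicitly because relational rows are unordered), here is a minimal sketch. It uses Python's standard-library sqlite3 module as a stand-in for an ODBC data source; the Book/Author document, the table names, and the `ordinal` column are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: Author occurs ONEORMORE times inside Book.
DOC = """
<Book id="b1">
  <Title>XML and Databases</Title>
  <Author>Smith</Author>
  <Author>Jones</Author>
</Book>
"""


def store_book(conn, xml_text):
    book = ET.fromstring(xml_text.strip())
    book_id = book.get("id")
    conn.execute("INSERT INTO Book (id, title) VALUES (?, ?)",
                 (book_id, book.findtext("Title")))
    # Repeated child: one row per occurrence in a linked table. The ordinal
    # column records document order, which the table itself would not keep.
    for pos, author in enumerate(book.findall("Author")):
        conn.execute(
            "INSERT INTO BookAuthor (book_id, ordinal, name) VALUES (?, ?, ?)",
            (book_id, pos, author.text))


def load_authors(conn, book_id):
    # ORDER BY ordinal restores the original element order on the way out.
    rows = conn.execute(
        "SELECT name FROM BookAuthor WHERE book_id = ? ORDER BY ordinal",
        (book_id,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (id TEXT PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE BookAuthor (book_id TEXT REFERENCES Book(id), "
             "ordinal INTEGER, name TEXT, PRIMARY KEY (book_id, ordinal))")
store_book(conn, DOC)
print(load_authors(conn, "b1"))  # ['Smith', 'Jones']
```

Without the ordinal column, a query would be free to return the authors in any order, so the round trip back to XML could silently reorder elements.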

>
> 2. If we're creating XML specifically for this purpose, it'll have to
> be in the format
>
> <elementType id="Name">
> ....</elementType>
>
> <elementType id="SerialNo">
> ....</elementType>
>
> <elementType id="Tuple">
> <!--contains Name and SerialNo as children, each occurring once.
> Also all key declarations, constraints, ... of the table as such-->
> .....
> </elementType>
>
> <elementType id="TheRealTable">
> <element type="#Tuple" occurs="ONEORMORE"/>
> </elementType>
>
> This restrictive structure, of course, means that you end up
> not using this kind of document for anything else. Am I right
> in thinking that this has to be done?

Again, this makes sense if you are storing schema information, but I'm not sure
that that is what you want. The problem is one of creating a mapping from the
DTD (expressed as a DTD, XML-Data, or another competing schema language) to
tables/columns and then using that mapping to move data from an XML file (which
uses that DTD) to the RDBMS. You can certainly do this for a large number of
DTDs -- I'm not yet convinced it is possible for all.
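A minimal sketch of that two-step idea, assuming the mapping has already been derived from the DTD (here it is simply written by hand). The `MAPPING` dictionary, the `transfer` function, and the column names are hypothetical; the element names follow Sudarshan's example, and sqlite3 again stands in for ODBC. Table and column names are interpolated into the SQL text, so they must come from the trusted mapping, never from the document being loaded.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping, written by hand here but in practice derived from
# the DTD: element type -> (table, {child element: column}).
MAPPING = {
    "Tuple": ("TheRealTable", {"Name": "name", "SerialNo": "serial_no"}),
}

DOC = """
<TheRealTable>
  <Tuple><Name>Widget</Name><SerialNo>A-100</SerialNo></Tuple>
  <Tuple><Name>Gadget</Name><SerialNo>B-200</SerialNo></Tuple>
</TheRealTable>
"""


def transfer(conn, xml_text, mapping):
    """Insert one row per mapped element; return the number of rows."""
    root = ET.fromstring(xml_text.strip())
    rows = 0
    for elem_type, (table, columns) in mapping.items():
        col_list = ", ".join(columns.values())
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
        for elem in root.iter(elem_type):
            # A missing child element becomes NULL (findtext returns None).
            values = [elem.findtext(child) for child in columns]
            conn.execute(sql, values)
            rows += 1
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TheRealTable (name TEXT, serial_no TEXT)")
print(transfer(conn, DOC, MAPPING))  # 2
```

Element types with no entry in the mapping are simply skipped by this sketch; content that does not reduce to "one element, one column" (mixed content, for example) is where a mapping like this starts to strain.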

>
> 3. The XML-Data specification says that it's open-ended
> and new types of entities may be added. I'm assuming this means
> I'm able to add lines like
>
> <level lev="COLUMN"/>
> or <level lev="TUPLE"/>
> into my element declarations, which will be ignored by parsers
> other than mine. Am I right?
>
> I came rather late into the discussion, so it's possible that
> this stuff may have been figured out already. If so, tell me.
>
> Thanks,
>
> Sudarshan 1030hrs.