RE: XML Java API - An Idea(*)

Chris Lloyd (clloyd@gorge.net)
Sat, 21 Jun 1997 10:57:33 -0400


-----Original Message-----
From: Peter Murray-Rust [SMTP:Peter@ursus.demon.co.uk]
Sent: Saturday, June 21, 1997 11:31 AM
To: xml-dev@ic.ac.uk
Subject: Re: XML Java API Standardization

In message <199706211310.JAA17653@smtp2.erols.com> "Peat" writes:
> If the document is very large, and the parser is required to maintain the
> grove, we would then require the parser to also then include some type of
> defined memory management. Can this be a problem, where different parsers
> implement resource management differently?

Memory management shouldn't be an issue in the API standardization. If you
are using a parser that cannot serialize the tree, then you are certainly
going to be limited by memory. If you are using an object database to
implement the grove, then you don't have size limitations but speed may
become an issue.

This is an important point and one which I've been conscious of but ignored so
far. JUMBO is quite large (with all the MOL classes in, there's about half a
megabyte of classes) and I have had outOfmem failures with large files (ca.
1 Mbyte legacy input and translation into a tree). I don't know whether there
is a generic solution to this. I tried to run the garbage collector (JDK1.02)
occasionally and this helps, but since parser and browser and document all have
to be in memory, large docs are a problem.

Presumably in an application subtrees can be saved to disk (serialized?)
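
The idea of saving subtrees to disk could be sketched in Java roughly as
follows. This is only an illustration under stated assumptions: Node and
SubtreeStore are made-up names, not part of JUMBO or of any parser discussed
in this thread, and the sketch relies on java.io.Serializable (JDK 1.1 or
later) and the much newer java.nio.file API.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node; not taken from any existing parser API.
class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) { this.name = name; }

    Node add(Node child) { children.add(child); return this; }

    // Number of nodes in this subtree, including this node.
    int size() {
        int n = 1;
        for (Node c : children) n += c.size();
        return n;
    }
}

// Hypothetical helper that writes a subtree to a file and reads it back.
class SubtreeStore {
    static void save(Node subtree, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(subtree); // writes the node and everything below it
        }
    }

    static Node load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (Node) in.readObject();
        }
    }
}
```

An application would call SubtreeStore.save on a subtree it does not need at
the moment, drop its references to that subtree, and call SubtreeStore.load
when the subtree is needed again. Default serialization recurses through the
object graph, so very deep trees may need a different encoding.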
>
> I would think if this burden is on the application layer, then knowledge of
> the application can be used to optimize resources.

I would think that if the author uses entities, then knowledge of the entity
structure would help. In the browser the entities could be treated as
'pointers' and resolved only when required.

Yes, this is how other groves have been implemented.
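
A minimal Java sketch of an entity treated as a 'pointer' and resolved only
when required might look like this. EntityRef and its loader are hypothetical
names invented for the illustration; how the replacement text is actually
fetched (file, URL, database) is left to whatever the caller supplies.

```java
import java.util.function.Supplier;

// Hypothetical placeholder node standing in for an unresolved entity.
class EntityRef {
    private final String name;
    private final Supplier<String> loader; // how to fetch the replacement text
    private String content;                // only meaningful once resolved
    private boolean resolved = false;

    EntityRef(String name, Supplier<String> loader) {
        this.name = name;
        this.loader = loader;
    }

    String name() { return name; }

    boolean isResolved() { return resolved; }

    // Fetch the replacement text on first use and cache it afterwards.
    String resolve() {
        if (!resolved) {
            content = loader.get();
            resolved = true;
        }
        return content;
    }

    // Drop the cached text so it can be garbage collected; a later
    // resolve() call will invoke the loader again.
    void release() {
        content = null;
        resolved = false;
    }
}
```

A browser built this way would keep only EntityRef objects for parts of the
document that are not on screen, call resolve() when the user navigates into
one, and call release() when memory gets tight.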

>
> Grove standardization is a good idea. Any ideas on how the grove
> standardization can be implemented up one layer?
^^ ??? ^^^

I'm just entering this thread so I don't know what solutions have been
discussed. There is already an API to draw from in the DSSSL spec and a
definition of the SGML property set which gives us a common language to
work from. The problem is that an XML API to a grove should be simple
with a small interface and should leverage the object-oriented power and
syntax of Java.

Personally, when working with groves I find some abstractions very
useful in an API. I would rather have an API based on iterators than one
based on a set of navigation function calls. I'm talking about
navigating the grove rather than building the grove. An iterator API
would be extremely simple, well abstracted and more in line with patterns
of C++ and Java programming than the SDQL API found in DSSSL. Iterators
could also maintain an adherence to the syntax of the SGML property set.

Here is an example, although my naming syntax probably does not
correspond to the SGML property set here.

// Assuming we have an object provided by the parser that is a grove,
// instantiate an iterator and navigate to the first element that is a
// TITLE tag

// A Factory is an object that defines what SGML/XML constructs the
// iterator knows how to iterate. It provides the grove iterator with a
// different node iterator for each property node that it knows how to
// walk.

ForwardGroveIterator XMLIter(OurGrove, XMLPropertySetFactory(), StartNodePropertyHandle);

// Test before advancing so the start node is visited and end() is never
// dereferenced.
for (; XMLIter != XMLIter.end(); XMLIter++)
{
    // in C++ we would use the dereference operator like this:
    // XMLBaseProperty Prop = *XMLIter;
    XMLBaseProperty Prop = XMLIter.Object();
    if (Prop.GetClass() == Element.Class) // is this an element?
    {
        // let's convert the property from a base class object to its
        // concrete class
        Element aElement = Prop;
        // Now we have an element object and can call all its member functions
        if (aElement.GetIdent() == String("TITLE"))
            break;
    }
}

// OK, let's instantiate a new iterator to walk back up to the root of the
// grove
// use the copy constructor to produce a reverse iterator from our [...]

[...] x and functions of individual properties in the grove. Hence we can use
the SGML property set or another property set with the same code.
6.) Iterators work well in different memory models and garbage
collection schemes.
7.) Iterators, Factories, and Algorithms can be combined in very
powerful and flexible ways.
8.) Finally, Iterators are fun!!
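
For comparison, the iterator idea above could be expressed with the standard
java.util.Iterator interface along these lines. This is a sketch, not a
proposed API: GroveNode, ForwardGroveIterator and GroveSearch are names
assumed for the illustration, and the Factory is reduced to a function that
tells the iterator which sub-nodes of a node to walk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical grove node: a node class (e.g. "element"), an identifier,
// and an ordered list of children.
class GroveNode {
    final String nodeClass;
    final String ident;
    final List<GroveNode> children = new ArrayList<>();

    GroveNode(String nodeClass, String ident) {
        this.nodeClass = nodeClass;
        this.ident = ident;
    }

    GroveNode add(GroveNode child) { children.add(child); return this; }
}

// Walks the grove in document order (parent before children). The
// childrenOf function plays the Factory role: it decides what gets walked,
// so the traversal itself does not hard-code a property set.
class ForwardGroveIterator implements Iterator<GroveNode> {
    private final Deque<GroveNode> pending = new ArrayDeque<>();
    private final Function<GroveNode, List<GroveNode>> childrenOf;

    ForwardGroveIterator(GroveNode start,
                         Function<GroveNode, List<GroveNode>> childrenOf) {
        this.childrenOf = childrenOf;
        pending.push(start);
    }

    @Override
    public boolean hasNext() { return !pending.isEmpty(); }

    @Override
    public GroveNode next() {
        if (pending.isEmpty()) throw new NoSuchElementException();
        GroveNode node = pending.pop();
        List<GroveNode> kids = childrenOf.apply(node);
        // Push in reverse so the first child is the next node returned.
        for (int i = kids.size() - 1; i >= 0; i--) pending.push(kids.get(i));
        return node;
    }
}

// A small algorithm written against the iterator only.
class GroveSearch {
    // Returns the first element with the given identifier, or null.
    static GroveNode firstElement(Iterator<GroveNode> it, String ident) {
        while (it.hasNext()) {
            GroveNode n = it.next();
            if ("element".equals(n.nodeClass) && ident.equals(n.ident)) return n;
        }
        return null;
    }
}
```

Finding the first TITLE element then reduces to
GroveSearch.firstElement(new ForwardGroveIterator(root, n -> n.children), "TITLE"),
and a different property set only needs a different childrenOf function.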

Chris Lloyd
clloyd@gorge.net

Again, I reiterate that I'd like to see something concrete in a few days and
not to lose the momentum again.

P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/