[NEW] AElfred: a small, fast XML Parser

David Megginson (ak117@freenet.carleton.ca)
Tue, 9 Dec 1997 19:18:44 -0500


Microstar Software Ltd. is happy to announce Ælfred (AElfred), a
small, fast, DTD-aware Java-based XML parser, especially suitable for
use in Java applets.

We've designed Ælfred for Java programmers who want to add XML support
to their applets and applications without doubling their size: Ælfred
consists of only two class files, with a total size of approximately
24K, and requires very little memory to run. Ælfred also implements
Java's java.lang.Runnable interface and provides a zero-argument
constructor, so it's easy to start Ælfred as a separate thread or to adapt it for
use as a JavaBean.
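
As a rough illustration of what a Runnable parser with a zero-argument
constructor makes possible, here is a minimal sketch in Java. Note that
StandInParser, setInput, getElementCount, and ThreadedParserSketch are
hypothetical names invented for this example; they are not AElfred's
actual API, and the "parsing" is only a placeholder that counts start
tags.

```java
// Hypothetical stand-in, NOT the real XmlParser class: it only mimics the
// two properties mentioned above (Runnable + zero-argument constructor).
class StandInParser implements Runnable {
    private String input = "";
    private int elementCount = 0;

    StandInParser() {}   // zero-argument constructor

    void setInput(String text) { input = text; }
    int getElementCount() { return elementCount; }

    // Placeholder "parse": counts start tags instead of doing real XML parsing.
    public void run() {
        int count = 0;
        for (int i = 0; i + 1 < input.length(); i++) {
            char next = input.charAt(i + 1);
            if (input.charAt(i) == '<' && next != '/' && next != '!' && next != '?') {
                count++;
            }
        }
        elementCount = count;
    }
}

class ThreadedParserSketch {
    // Runs the stand-in parser on its own thread and waits for it to finish.
    static int countInBackground(String xml) {
        StandInParser parser = new StandInParser();
        parser.setInput(xml);
        Thread worker = new Thread(parser);   // any Runnable can be handed to a Thread
        worker.start();
        try {
            worker.join();                    // after join(), the worker's result is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the parser");
        }
        return parser.getElementCount();
    }

    public static void main(String[] args) {
        System.out.println(countInBackground("<doc><p>Hello</p><p>world</p></doc>"));
    }
}
```

With the real parser, the same pattern should apply: construct the
object, configure it, and hand it to a Thread. The configuration calls
themselves are documented in the distribution, not here.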

Ælfred is free for both commercial and non-commercial use, and COMES
WITH NO WARRANTY. You can download a copy of version 1.0 (with
source code) from the following URL:

http://www.microstar.com/XML/index.htm

There is also an applet to let you try Ælfred online in your own
browser before downloading it.

*****************
DESIGN PRINCIPLES
*****************

1. Ælfred must be as small as possible, so that it doesn't add too
much to your applet's download time.

STATUS: Ælfred is currently about 24K in total, and we're still
looking for ways to shrink it further.

2. Ælfred must use as few class files as possible, to minimize the number
of HTTP connections necessary for applets.

STATUS: Ælfred consists of only two class files, the main parser
class (XmlParser.class) and a small interface for your own program
to implement (XmlProcessor.class). All other classes in the
distribution are just demonstrations.

3. Ælfred must be compatible with most or all Java implementations
and platforms.

STATUS: Ælfred uses only JDK 1.0.2 features, and we have tested it
successfully with the following Java implementations: JDK 1.1.1
(Linux), jview (Windows NT), Netscape 4 (Linux and Windows NT),
Internet Explorer 3 (Windows NT), and Internet Explorer 4 (Windows
NT).

4. Ælfred must use as little memory as possible, so that it does not take
away resources from the rest of your program.

STATUS: On a P75 Linux system, using JDK 1.1.1, running Ælfred
(with a 4MB XML document) takes only 2MB more memory than running
a simple "Hello world" Java application. Because Ælfred does not
build an in-memory parse tree, you can run it on very large input
files using little or no extra memory.

5. Ælfred must run as fast as possible, so that it does not slow down
the rest of your program.

STATUS: On a P75 Linux system, using JDK 1.1.1 (without a JIT
compiler), Ælfred parses XML test files at about 50K/second. On a
P166 NT workstation, using jview, Ælfred parses XML test files at
about 1MB/second.

6. Ælfred must produce correct output for well-formed and valid
documents, but need not reject every document that is not valid or
not well-formed.

STATUS: Ælfred is DTD-aware, and handles all current XML features,
including CDATA and INCLUDE/IGNORE marked sections, internal and
external entities, proper whitespace treatment in element content,
and default attribute values. It will sometimes accept input that
is technically incorrect, however, without reporting an error (see
README), since full error reporting would make the parser much
larger.

7. Ælfred must provide full internationalisation from the first release.

STATUS: Ælfred supports Unicode to the fullest extent possible in
Java. It correctly handles XML documents encoded using UTF-8,
UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4 (as far as surrogates
allow), and ISO-8859-1 (ISO Latin 1/Windows). With these
character sets, Ælfred can handle all of the world's major (and
most of its minor) languages.
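
To make the memory point in principle 4 more concrete, here is a
minimal sketch in Java of event-style processing, where the scanner
reports each tag to a callback as it reads and never stores the
document. Everything named here (TagEvents, DepthTracker,
StreamingSketch, and their methods) is hypothetical and made up for
illustration; the real XmlProcessor interface may look quite different,
and this toy scanner understands only plain <name> and </name> tags.

```java
// Hypothetical callback interface; the real XmlProcessor methods may differ.
interface TagEvents {
    void startTag(String name);
    void endTag(String name);
}

// Example consumer: keeps two integers, however large the input is.
class DepthTracker implements TagEvents {
    int depth = 0;
    int maxDepth = 0;

    public void startTag(String name) {
        depth++;
        if (depth > maxDepth) maxDepth = depth;
    }

    public void endTag(String name) {
        depth--;
    }
}

class StreamingSketch {
    // Toy scanner: reads one character at a time and reports tags as events.
    // It handles no attributes, comments, CDATA sections, or entities.
    static void scan(java.io.Reader in, TagEvents events) throws java.io.IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c != '<') continue;              // character data is skipped, not stored
            StringBuilder name = new StringBuilder();
            boolean closing = false;
            boolean first = true;
            while ((c = in.read()) != -1 && c != '>') {
                if (first && c == '/') closing = true;
                else name.append((char) c);
                first = false;
            }
            if (closing) events.endTag(name.toString());
            else events.startTag(name.toString());
        }
    }

    // Convenience wrapper for in-memory strings.
    static int maxDepthOf(String xml) {
        DepthTracker tracker = new DepthTracker();
        try {
            scan(new java.io.StringReader(xml), tracker);
        } catch (java.io.IOException e) {
            throw new IllegalStateException("unexpected I/O error: " + e.getMessage());
        }
        return tracker.maxDepth;
    }

    public static void main(String[] args) {
        System.out.println(maxDepthOf("<a><b></b><c><d></d></c></a>"));
    }
}
```

The only state that survives between events is whatever the handler
chooses to keep, which is why this style of parser can work through
input much larger than available memory.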

***********************
ABOUT THE NAME "Ælfred"
***********************

Ælfred the Great (AElfred in ASCII) was king of Wessex, and at least
nominally of all England, at the time of his death in 899 AD. Ælfred
introduced a widespread literacy program in the hope that his people
would learn to read English, at least, if Latin was too difficult for
them. This Ælfred hopes to bring another sort of literacy to Java,
using XML, at least, if full SGML is too difficult.

The initial "Æ" (AE ligature) is also a reminder that XML is not
limited to ASCII.

Enjoy!

David

---
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/
