Well-formedness checker available

James Clark (jjc@jclark.com)
Sun, 30 Nov 1997 19:34:37 -0500

Next message: James Clark: "Test cases available"
Previous message: Markus Wodrich: "AW: Entities vs #PCDATA with msxml 1.6 ?"

I've enhanced my XML tokenizer to support multiple encodings and to
provide enough functionality that it can be used as the basis of
high-performance full XML processors. As a proof of this, I've written a
well-formedness checker (xmlwf) on top of the tokenizer.

The main design goal was performance. On my portable (a 133MHz Pentium
running Windows NT), it can check Jon's 3.7MB ot.xml file in about
0.5 sec (this compares to about 8 sec for nsgmlsu and about 2 sec for RXP
on the same system). It seems to be about 15% slower than the original
tokenizer. On the other hand, the size of the source and object code has
increased a lot. The source has also got a lot hairier.

The source code (in ANSI C) and Win32 binaries are available at:

ftp://ftp.jclark.com/pub/test/xmltok.zip

This is an alpha release. The only documentation is what you're reading
now.

To use the well-formedness checker, just give xmlwf one or more
filenames, and it will check that each one is a well-formed XML document
entity. There's a -g option which tells it to check instead that each
file is a well-formed XML external general text entity.
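
To give a feel for one of the things a well-formedness check involves, here
is a toy sketch in ANSI C that only verifies that start-tags and end-tags
nest properly. It is not taken from the xmlwf source and does not use the
tokenizer; the function name tags_nest_properly and the MAX_DEPTH limit are
made up for illustration. It assumes the whole document is already in memory
as a NUL-terminated string, and it ignores encodings, attributes, entity
references, CDATA sections and DOCTYPE internal subsets, all of which a real
checker has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64  /* arbitrary nesting limit for this sketch */

/* Toy check (hypothetical, not xmlwf): returns 1 if every start-tag in doc
   is closed by a matching end-tag in the right order, 0 otherwise.
   It does not check that there is exactly one root element. */
int tags_nest_properly(const char *doc)
{
    const char *open_name[MAX_DEPTH];  /* names of currently open elements */
    size_t open_len[MAX_DEPTH];        /* lengths of those names */
    int depth = 0;
    const char *p;

    assert(doc != NULL);
    p = doc;
    while ((p = strchr(p, '<')) != NULL) {
        const char *name, *end;
        size_t len;
        int closing;

        p++;  /* step past '<' */
        if (strncmp(p, "!--", 3) == 0) {
            /* comment: skip to the closing "-->" */
            end = strstr(p + 3, "-->");
            if (end == NULL)
                return 0;
            p = end + 3;
            continue;
        }
        if (*p == '?' || *p == '!') {
            /* processing instruction or declaration: naively skip to '>' */
            end = strchr(p, '>');
            if (end == NULL)
                return 0;
            p = end + 1;
            continue;
        }
        closing = (*p == '/');
        if (closing)
            p++;
        name = p;
        len = strcspn(p, " \t\r\n/>");  /* the name ends at space, '/' or '>' */
        if (len == 0)
            return 0;
        end = strchr(p + len, '>');
        if (end == NULL)
            return 0;  /* tag is never closed */
        if (closing) {
            /* end-tag: must match the most recently opened element */
            if (depth == 0)
                return 0;
            depth--;
            if (open_len[depth] != len ||
                strncmp(open_name[depth], name, len) != 0)
                return 0;
        } else if (end[-1] != '/') {
            /* start-tag (not an empty-element tag): remember its name */
            if (depth == MAX_DEPTH)
                return 0;
            open_name[depth] = name;
            open_len[depth] = len;
            depth++;
        }
        p = end + 1;
    }
    return depth == 0;  /* nothing may be left open */
}
```

With this sketch, tags_nest_properly("<a><b/></a>") accepts the input, while
tags_nest_properly("<a><b></a></b>") rejects it because the end-tags arrive
in the wrong order.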

James
