I use about 8 event handlers for most of my APIs...
>As much as possible, a good reusable component should not force the
>user's hand when choosing what node to grab onto. As an example,
>YACC is pretty bad about this. You supply it with a lexer (with a
>fixed name) and a set of handlers to be called when productions are
>reduced. The YACC-generated parser insists on being in charge.
Sure. The important thing is that if you want to query into
a document, you have to have parsed at least as far as the nodes you
want to access, and that having a tree representation for such cases
makes it a *lot* easier. For cases where you "want to be in control",
I would have the event handler be a grove constructor, and have the
application work upon the grove. Note that accessing a grove, or
querying a document is *different* to *parsing* a document.
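
To make the split concrete, here is a rough sketch of an event handler that does nothing but build a tree, with the application querying the tree afterwards. It uses Python's standard xml.sax module as a stand-in for the event API; the Node class, TreeBuilder, build_tree and find_all are names invented for this illustration, and the tree is far simpler than a real grove.

```python
import xml.sax


class Node:
    """One element in a very small tree (a stand-in for a grove node)."""

    def __init__(self, name, attrs, parent=None):
        self.name = name
        self.attrs = dict(attrs.items())
        self.parent = parent
        self.children = []
        self.text = ""

    def find_all(self, name):
        """Yield every descendant element with the given name, in document order."""
        for child in self.children:
            if child.name == name:
                yield child
            yield from child.find_all(name)


class TreeBuilder(xml.sax.ContentHandler):
    """Event handler whose only job is to construct the tree."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._current = None

    def startElement(self, name, attrs):
        node = Node(name, attrs, self._current)
        if self._current is None:
            self.root = node
        else:
            self._current.children.append(node)
        self._current = node

    def characters(self, content):
        # Only character data directly inside the current element is kept.
        if self._current is not None:
            self._current.text += content

    def endElement(self, name):
        self._current = self._current.parent


def build_tree(data):
    """Parsing step: run the parser to completion and return the tree root."""
    builder = TreeBuilder()
    xml.sax.parseString(data, builder)
    return builder.root
```

The application never sees the parse events. It calls build_tree() once and then walks or queries the result, for example with root.find_all("p").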
>1. An external entity manager, responsible for obtaining document
> instances (the "start" document and others), DTD's, etc. from
> local storage, the web, some database, etc. This should probably
> be user-customizable.
I'm not sure about this. In some ways, I cannot see the reason for
*exposing* an entity manager, but then again, I cannot imagine an
implementation without one either...
>2. An encoding manager, responsible for mapping one of the possible
> XML document encodings (Latin-n, UTF-7, UTF-8, UCS-2, UTF-16, whatever)
> onto ISO10646 characters.
Streams...
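
By "streams" I mean something along these lines: the decoding sits in a wrapper around the byte stream, so the parser only ever reads characters. This is a minimal sketch using Python's standard codecs module; open_characters and decode_bytes are names made up for the example, and it assumes the encoding is already known rather than detected from the document.

```python
import codecs
import io


def open_characters(byte_stream, encoding):
    """Wrap a byte stream so that reads return decoded Unicode text."""
    return codecs.getreader(encoding)(byte_stream)


def decode_bytes(data, encoding="utf-8"):
    """Convenience wrapper: treat a bytes object as a stream and read it all."""
    chars = open_characters(io.BytesIO(data), encoding)
    return chars.read()
```

A parser written against the object returned by open_characters() does not need to know which encoding the document used.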
>3. The parser itself, responsible for turning characters into XML events,
> and possibly into grove structures.
Push grove building off to later stages.
>[Browser] gives the most complicated parser, since it has to asynchronously
>handle information from several different documents.
>
>[YACC] is the easiest to write, but it's less flexible. Given [Browser],
>it's easy to write [YACC]. (Given [XMLEventStream] you can also derive
>[YACC], but with greater overhead.)
>
>[XMLEventStream] and [Grove] give you the most flexibility with respect to
>the grove plan.
I think these conflate many different processing layers.
>languages, but the only firm conclusion I've come to is that I really wish
>I could use coroutines.
Amen to that sentiment.
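
For what it is worth, here is roughly what that buys you, sketched with Python generators (a restricted form of coroutine). The tokenizer is a toy: it only understands attribute-less start tags, end tags and text, and silently skips anything else. The names events and first_text_of are invented for this example.

```python
import re

# Toy pattern: a start or end tag without attributes, or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")


def events(text):
    """Generator that produces one (kind, value) event each time it is resumed."""
    for match in TOKEN.finditer(text):
        closing, name, chars = match.groups()
        if chars is not None:
            yield ("text", chars)
        elif closing:
            yield ("end", name)
        else:
            yield ("start", name)


def first_text_of(text, wanted):
    """Caller-driven loop: pull events and stop as soon as there is an answer."""
    stream = events(text)
    for kind, value in stream:
        if kind == "start" and value == wanted:
            # Look at the event right after the start tag.
            kind, value = next(stream, (None, None))
            return value if kind == "text" else None
    return None
```

Neither side insists on being in charge: the tokenizer is written as a plain loop, and the application is written as a plain loop, and each is suspended while the other runs.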