Istvan
> ----------
> From: Chris Olds[SMTP:colds@nwlink.com]
> Reply To: Chris Olds
> Sent: Wednesday, September 03, 1997 4:54 PM
> To: xml-dev@ic.ac.uk
> Cc: 'Tim Bray'; Istvan Cseri
> Subject: Re: Character classification
>
> How are people dealing with UTF-8 vs. unicode vs. Latin-1? I have been
> working on a lexer (using Flex) that assumes the input stream is either
> Latin-1 or UTF-8 and returns byte strings to the caller. Since Java
> chars are Unicode, I assume that the Java XML parsers are doing the
> opposite, right? Is there any consensus on what form PCDATA or GI names
> should take when they are returned to the application? On a related
> note, when do character entities get replaced - in the lexer or later
> on? My reading of the draft is that the scanner must do the replacement
> if the examples of rescanning are to work.
>
> /cco
>
> Chris Olds colds@nwlink.com
>
> xml-dev: A list for W3C XML Developers
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
> To unsubscribe, send to majordomo@ic.ac.uk the following message;
> unsubscribe xml-dev
> List coordinator, Henry Rzepa (rzepa@ic.ac.uk)
>