> imagine a plain text file which I want to markup using XML. Now it could be
> that there are characters in this file whose ASCII value is greater than
> 127 (in PCDATA sections).
>
> If I try to use expat on the generated XML file, it tells me that it is
> not wellformed at the position where such a character occurs. Does the
> XML spec say anything about not permitting characters with high ASCII
> values? If so, where?
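Expat, like any conforming XML parser, assumes UTF-8 (or UTF-16) when the
document declares no encoding. The XML spec does not forbid these characters;
a lone byte above 127 is simply not a valid UTF-8 sequence, hence the error.
You need to declare Latin-1 or whatever encoding you are actually using.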
> I guess, to correctly interpret and display those characters I have to
> know the character set which was used to encode the original text file.
> How can I communicate this character set to an XML parser?
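Put <?xml version="1.0" encoding="ISO-8859-1"?> as the very first line,
with nothing before it (the version attribute is required, and the
registered name is ISO-8859-1, not 8859-1).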
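
A quick way to check this is a small sketch using Python's binding to expat
(xml.parsers.expat). The helper names parses() and text_of() and the sample
document are made up for illustration. It feeds the same Latin-1 bytes to
the parser with and without the declaration:

```python
import xml.parsers.expat


def parses(data: bytes) -> bool:
    """Return True if expat accepts `data` as well-formed XML."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
        return True
    except xml.parsers.expat.ExpatError:
        return False


def text_of(data: bytes) -> str:
    """Return the character data expat reports for `data`."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# Hypothetical sample: "café" in Latin-1, where é is the single byte 0xE9.
body = "<doc>caf\u00e9</doc>".encode("iso-8859-1")

undeclared = body  # no declaration, so the parser assumes UTF-8
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n' + body

print(parses(undeclared))  # the 0xE9 byte is not valid UTF-8 here
print(parses(declared))    # the declaration tells expat how to decode it
print(text_of(declared))
```

The alternative is to leave the declaration out and convert the text file
to UTF-8 before wrapping it in markup.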
-- 
John Cowan  http://www.ccil.org/~cowan  cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)