Re: PR.xml

Peter Murray-Rust (peter@ursus.demon.co.uk)
Fri, 16 Jan 1998 16:08:43


At 10:43 16/01/98 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
>
>The XML source for the PR is encoded in ISO-8859-1 but has no encoding
>declaration (so AElfred assumes UTF-8, and reports an encoding error,
>though not very helpfully, when it finds an invalid UTF-8 sequence).
>The WG is aware of the problem.

Thanks. I am also aware of it now :-). Can I make the assumption that:

- ISO-8859-1 and UTF-8 look identical to not-very-experienced humans.
- in principle I should be able to sort this by adding something like

    <?xml version="1.0" encoding="ISO-8859-1"?>

  to the top of the document

- in practice this fails because, by the time the parser gets to the
encoding declaration, it has already assumed the encoding is UTF-8 and has
crashed :-)
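
A minimal Python sketch of the first point above, for illustration only: text
that is pure ASCII comes out as the same bytes in both encodings, while an
accented character is a single byte in ISO-8859-1 that does not form a valid
UTF-8 sequence. The sample strings and the helper `is_valid_utf8` are
hypothetical, and nothing here reflects how AElfred itself works.

```python
# Illustration only: the sample strings and helper name are assumptions.
ascii_text = '<?xml version="1.0"?>'
accented = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# Pure ASCII text is byte-for-byte identical in the two encodings.
same_for_ascii = ascii_text.encode("iso-8859-1") == ascii_text.encode("utf-8")

# Above 0x7F the encodings diverge: one byte versus a multi-byte sequence.
latin1_bytes = accented.encode("iso-8859-1")
utf8_bytes = accented.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(same_for_ascii)
print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

This is why a document can look fine to a human and still trip a parser that
assumes UTF-8: only the non-ASCII characters give the game away.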

I am not quite clear why we need this problem. Do different tools emit
different encodings? If so, what should I work with? Can I convert this
document?
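
On the conversion question, one possible approach is sketched below in
Python. It assumes the file really is ISO-8859-1 throughout and carries no
encoding declaration (as described above for the PR source); if a declaration
were present it would need updating by hand. The function names
`latin1_to_utf8` and `convert_file` are hypothetical, not part of any XML
tool.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes assumed to be ISO-8859-1 as UTF-8."""
    # Python's ISO-8859-1 codec maps every byte value to a character,
    # so this decode step cannot raise on any input.
    text = data.decode("iso-8859-1")
    return text.encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path as raw bytes and write a UTF-8 copy to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(latin1_to_utf8(data))
```

Usage would be something like `convert_file("PR.xml", "PR-utf8.xml")`, where
the output filename is only an example. Because the decode step never fails,
the sketch cannot detect a file that was not ISO-8859-1 in the first place,
so the result is worth checking by eye.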

I know there have been lots of important discussions about encodings (which
I have not always read very carefully), so an authoritative statement from
a WG member would help at least one human :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences,
domestic net connection
VSMS http://www.nottingham.ac.uk/vsms
Virtual Hyperglossary http://www.venus.co.uk/vhg