I would use numeric character references wherever XML allows them. If
there are non-ASCII characters in places where numeric character
references aren't allowed (element and attribute names, for example), I
would write them as UTF-8 and give the user a warning. The ASCII
characters will still be there as ASCII, and the non-ASCII characters
won't get lost, although they will look a bit funny in an 8-bit editor.
An interesting case is non-ASCII characters in places where numeric
character references are not recognized but do not cause an error (e.g.
PIs and comments); an application convention could recognize numeric
character references in these cases.
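
A rough sketch of the first part, in Python. The function names
ascii_safe_text and needs_utf8_warning are made up for illustration, and
I am assuming the text is ordinary character data or an attribute value,
where numeric character references are allowed:

```python
from xml.sax.saxutils import escape


def ascii_safe_text(text: str) -> str:
    """Return character data as pure ASCII.

    Markup characters (&, <, >) are escaped first, then every non-ASCII
    character is replaced by a decimal numeric character reference.
    """
    escaped = escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def needs_utf8_warning(name: str) -> bool:
    """True if a name (element, attribute) contains non-ASCII characters.

    Character references are not allowed in names, so such a name would
    have to be written out as UTF-8, with a warning to the user.
    """
    return not name.isascii()


print(ascii_safe_text("Straße & co"))
print(needs_utf8_warning("Straße"))
```

This does nothing about PIs and comments; recognizing references there
would need the application convention mentioned above.
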
> 2. Rename all the offending elements and attributes, and use PIs to
> ensure that when they're read back in we can patch things up.
> So, for example, the file could contain:
>
> <?GoodCitizen MangledGI Strae1="Straße"?>
> <Strae1>foo bar</Strae1>
>
> Advantages: It's fully compliant.
If I were going to do this sort of thing, I think I would use a
variation on URL %-encoding: a convention that an underscore (say)
followed by 4 hex digits represents the Unicode character with that hex
code, so that Straße would be written Stra_00DFe.
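
A minimal sketch of such a convention in Python, under some assumptions
of my own: the names mangle_name and unmangle_name are hypothetical, a
literal underscore is itself encoded (as _005F) so that the mapping can
be reversed, and characters that need more than 4 hex digits are
rejected rather than handled:

```python
import re


def mangle_name(name: str) -> str:
    """Encode non-ASCII characters (and '_') as '_' plus 4 hex digits."""
    out = []
    for ch in name:
        code = ord(ch)
        if code > 0xFFFF:
            # Assumption: 4 hex digits only, so nothing above U+FFFF.
            raise ValueError("character U+%X needs more than 4 hex digits" % code)
        if ch == "_" or code > 0x7F:
            out.append("_%04X" % code)
        else:
            out.append(ch)
    return "".join(out)


def unmangle_name(mangled: str) -> str:
    """Reverse mangle_name: turn each '_XXXX' back into its character."""
    return re.sub(
        r"_([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        mangled,
    )


print(mangle_name("Straße"))
print(unmangle_name(mangle_name("Straße_1")))
```

The result is an ordinary ASCII name, and it can be decoded without any
side information in a PI.
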
James