> The most common notation to use is Base64. You can find Base64 specified in
> an RFC.
>
> You can make a more efficient encoding by using all the available
> characters. There are several thousand, so you might want to invent your
> own Base4K encoding, for example, if it was really a big problem.
I propose a compromise: what might be called Base-256 encoding.
To embed a stream of arbitrary octets into an XML document,
they should appear as the #PCDATA content of a suitable element.
Each octet from 0-255 is encoded using the Unicode Private
Zone characters U+F000-U+F0FF respectively. These characters are conveniently
located in the middle of the Private Zone.
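
As a rough illustration of the mapping just described, here is a minimal
Python sketch. The names BASE, encode_octets, and decode_octets are my own
placeholders, not part of any standard, and the sketch does not deal with
XML serialization or escaping at all.

```python
# Sketch of the proposed "Base-256" mapping: octet n <-> U+F000 + n.
# BASE, encode_octets and decode_octets are hypothetical names.
BASE = 0xF000  # first code point of the proposed range U+F000-U+F0FF


def encode_octets(data: bytes) -> str:
    """Map each octet 0-255 to the character U+F000 + octet."""
    return "".join(chr(BASE + b) for b in data)


def decode_octets(text: str) -> bytes:
    """Reverse the mapping; reject characters outside U+F000-U+F0FF."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if not BASE <= cp <= BASE + 0xFF:
            raise ValueError(f"U+{cp:04X} is outside U+F000-U+F0FF")
        out.append(cp - BASE)
    return bytes(out)
```

The decoder is deliberately strict: any character outside the range,
including whitespace a pretty-printer might add, raises an error rather
than being silently dropped.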
Using this convention expands the data by 2:1 in a UCS-2
representation, by 3:1 in a UTF-8 representation, and by 8:1 in a
numeric-character-reference representation (for example, &#xF041; for
octet 0x41). Therefore, it is suitable only for relatively small amounts
of octet data embedded in a basically textual matrix.
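
The expansion factors can be checked by measuring them directly. The
sketch below is only an illustration: expansion_ratios is a hypothetical
helper name, "utf-16-be" stands in for UCS-2 (the two coincide for these
characters), and the numeric character references are written in the
hexadecimal form, such as &#xF041;, counted character by character.

```python
# Measure how much the proposed encoding expands non-empty octet data.
# expansion_ratios is a hypothetical helper, not an existing API.
def expansion_ratios(data: bytes) -> dict:
    """Return output units per input octet for three representations."""
    # Octet n becomes the character U+F000 + n.
    text = "".join(chr(0xF000 + b) for b in data)
    # Hexadecimal numeric character references, e.g. &#xF041;
    ncr = "".join(f"&#x{ord(c):X};" for c in text)
    n = len(data)
    return {
        "UCS-2": len(text.encode("utf-16-be")) / n,  # bytes per octet
        "UTF-8": len(text.encode("utf-8")) / n,      # bytes per octet
        "NCR": len(ncr) / n,                         # characters per octet
    }


if __name__ == "__main__":
    print(expansion_ratios(bytes(range(256))))
```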
-- John Cowan cowan@ccil.org e'osai ko sarji la lojban.