This reminds me - are there good techniques for maintaining a byte
offset in conjunction with character-set translations? Ideally you
want the translation done in big blocks at a low level, but then how
do you access the byte offsets? In RXP/LTXML I keep the byte offset of
the start of the block (which is actually a line). Then, in the case of
UTF-8, I effectively reverse-translate the characters consumed so far
to calculate how many bytes to add. This relies on UTF-8 being
invertible, i.e. each character re-encoding to the same bytes it was
read from. Surely there must be a better way...
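
To make the reverse-translation concrete, here is a rough sketch in
Python. It is not the RXP/LTXML code. The names utf8_len and byte_offset
are made up for illustration, and it assumes the input was valid UTF-8
and that nothing else (line-ending normalisation, BOM stripping) changed
the length of the line during decoding.

```python
def utf8_len(cp: int) -> int:
    """Number of bytes UTF-8 uses to encode one code point."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def byte_offset(line_start: int, decoded_line: str, char_index: int) -> int:
    """Byte offset in the original input of decoded_line[char_index].

    line_start is the byte offset at which this line began. The bytes
    consumed within the line are recovered by re-deriving the UTF-8
    length of every character before char_index.
    """
    consumed = sum(utf8_len(ord(c)) for c in decoded_line[:char_index])
    return line_start + consumed
```

The cost is proportional to the position within the line, so it stays
cheap as long as offsets are only wanted occasionally (error messages,
say) and lines are not enormous.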
-- Richard