ASCII control characters in XML

Steve Harris (sharris@primus.com)
Tue, 28 Apr 1998 09:21:58 -0700


Is it possible to transport UTF-8-encoded text that includes some
characters in the range x0000-x001F (ASCII control characters)?
These codes are valid within UTF-8 (per RFC 2044), but the XML
specification says that, apart from tab, line feed, and carriage
return, they are not valid characters. My application, which wraps
Clark's "expat" parser, dies upon encountering codes in this range,
citing well-formedness violations. I'm looking for the proper method
for transporting text that occasionally includes these codes.
I've been RTFM'ing this for a while now, and I've found plenty of
archived discussion regarding raw binary data as PCDATA content, but
this seems closer to a common text-processing problem. Any advice or
further interpretation would be greatly appreciated.
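
To make the failure concrete, here is a minimal sketch of the behaviour
described above. It assumes Python's standard xml.parsers.expat binding
rather than the C application in question, and the helper names
(is_xml_char, find_illegal, expat_accepts) are made up for illustration.
The ranges in is_xml_char follow the Char production in the XML 1.0
specification; it only shows which characters get rejected, not how to
transport them.

```python
import xml.parsers.expat
from typing import List, Tuple


def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal(text: str) -> List[Tuple[int, int]]:
    """Return (index, code point) for each character XML 1.0 disallows."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if not is_xml_char(ord(ch))]


def expat_accepts(document: bytes) -> bool:
    """Parse with a fresh expat parser; False on a well-formedness error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical sample text: x01 is a control code, x09 is a tab.
    print(find_illegal("ab\x01c\td"))
    print(expat_accepts(b"<doc>ab\x01c</doc>"))
    print(expat_accepts(b"<doc>c\td</doc>"))
```

Running find_illegal over outgoing text before wrapping it in markup
would at least locate the offending characters ahead of the parser.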

Steven E. Harris
Software Engineer
PRIMUS
1601 Fifth Avenue, Suite 1900
Seattle, Washington 98101
(206) 292-1001 x436