I think XML name tokens are better detected by exclusion than by
inclusion: this is a sensible approach when you have to deal with lots
of potential naming characters. In other words, you detect the end of
the name by the presence of a sepchar or a delimiter, rather than by
testing whether each character is a name character. At the reading end,
such simple token detection is all that is needed if your document is
well formed.
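
As a rough illustration of detection by exclusion, here is a minimal
sketch in Python. The terminator set and the name scan_name are my own
assumptions for the example, not anything taken from the XML draft or
an SGML declaration; a real parser would take its sepchars and
delimiters from the declaration in force.

```python
# Hypothetical terminator set: whitespace (sepchars) plus a few common
# markup delimiters. This is an assumption for illustration only.
NAME_TERMINATORS = frozenset(" \t\r\n<>/=\"'&;")


def scan_name(text, start=0):
    """Return (name, end): the run from start up to the first terminator.

    No character is ever tested for being a name character; the name
    simply ends where a sepchar or delimiter begins.
    """
    end = start
    while end < len(text) and text[end] not in NAME_TERMINATORS:
        end += 1
    return text[start:end], end


# A Cyrillic element name is picked up without any table of name characters.
name, end = scan_name("<глава id='1'>", 1)
print(name, end)
```

The scanner needs only the small, fixed set of terminators, however
many scripts contribute naming characters.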
To stop silly tags, the SGML declaration should make the ZWNJ character
(which I think has to do with the cursive joining of Arabic scripts,
and is as much required as accent characters) a NAMECHAR, not a
NAMESTRT. In context, ZWNJ and RTL & LTR have visible effects; they are
not usually undetectable. But it is better to allow silly tags than to
disallow native-language markup: only about 1/4 of the world can make
sense of English/Latin tags.
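
The NAMECHAR-but-not-NAMESTRT idea for ZWNJ can be sketched the same
way. The set CONTINUE_ONLY and the function has_valid_start are
hypothetical names for this example; the real membership would come
from the SGML declaration, and I am only assuming ZWNJ belongs there.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical set of characters allowed inside a name but not as its
# first character (NAMECHAR, not NAMESTRT). Assumed for illustration.
CONTINUE_ONLY = frozenset({ZWNJ})


def has_valid_start(name):
    """True if name is non-empty and does not begin with a continue-only character."""
    return bool(name) and name[0] not in CONTINUE_ONLY


print(has_valid_start("a" + ZWNJ + "b"))  # ZWNJ inside the name
print(has_valid_start(ZWNJ + "ab"))       # ZWNJ as first character
```

This rules out a name that begins with an invisible character, while
still permitting ZWNJ where the script needs it.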
Apparently the WG is waiting until August to finalise the naming
discipline.
Rick Jelliffe