Three small announcements:
1. RFC 9839, a guide to which Unicode characters you should never use: https://www.rfc-editor.org/rfc/rfc9839.html
2. Blog piece with background and context, “RFC 9839 and Bad Unicode”: https://www.tbray.org/ongoing/When/202x/2025/08/14/RFC9839
3. A little Go library that implements 9839’s exclusion subsets: https://github.com/timbray/RFC9839

#Unicode

@timbray would be curious about the rationale for the "problematic" terminology, since that adjective is famously considered vague enough to be a sort of "red flag" when deployed in discussions of online propriety. the precise distinction between "never useful text" and "can lead to misbehavior" seems like a useful one, although i'd argue that private-use characters should be included precisely because they can sometimes be valid, so are more likely to show up.

were any alternatives considered for terminology to designate such invalid text characters? "non-assignable" would seem to be much more specific with respect to the "unicode assignables" subset defined in the rfc document.
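
For anyone following along, here is a rough Go sketch of what membership in the "unicode assignables" subset looks like in practice. It is an illustration only: the ranges are transcribed from my reading of the ABNF in RFC 9839 and should be checked against the document, and `isAssignable` is a hypothetical helper, not the API of the timbray/RFC9839 library. Note that under these ranges, private-use code points such as U+E000 count as assignable.

```go
package main

import "fmt"

// isAssignable is a hypothetical helper (not from the RFC9839 library).
// It reports whether r lies in the "Unicode Assignables" subset, using
// ranges transcribed from RFC 9839 as an assumption to be verified.
func isAssignable(r rune) bool {
	switch {
	case r == 0x9 || r == 0xA || r == 0xD:
		return true // tab, line feed, carriage return
	case r >= 0x20 && r <= 0x7E:
		return true // printable ASCII; excludes other C0 controls and DEL
	case r >= 0xA0 && r <= 0xD7FF:
		return true // excludes C1 controls below, surrogates above
	case r >= 0xE000 && r <= 0xFDCF:
		return true // includes the BMP private-use area
	case r >= 0xFDF0 && r <= 0xFFFD:
		return true // skips noncharacters U+FDD0..U+FDEF, U+FFFE, U+FFFF
	case r >= 0x10000 && r <= 0x10FFFF:
		// The last two code points of every plane are noncharacters.
		return r&0xFFFF < 0xFFFE
	}
	return false
}

func main() {
	// Sample code points: a letter, NUL, DEL, a C1 control, a surrogate,
	// a private-use character, and two noncharacters.
	for _, r := range []rune{'A', 0x0, 0x7F, 0x85, 0xD800, 0xE000, 0xFDD0, 0x1FFFE} {
		fmt.Printf("%U assignable=%v\n", r, isAssignable(r))
	}
}
```

A range check like this only answers "is this code point in the subset"; it says nothing about whether a private-use character is meaningful to the receiver, which is the separate question raised above.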