This #Unicode technical report (tr58) on non-ASCII characters in URLs and email addresses might be relevant for ActivityPub implementations
Internationalise The Fediverse
https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/
We live in the future now. It is OK to use Unicode everywhere.
It seems bizarre to me that modern Internet services sometimes "forget" that there's a world outside the Anglosphere. Some people have the temerity to speak foreign languages! And some of those languages have accents on their letters!! Even worse, some don't use English letters at all!!!
A decade ago, I was miffed that GitHub only supported some ASCII characters in its project names. There's no technical reason why your repo can't be called "ഹലോ വേൾഡ്".
Similarly, I'm frustrated that Mastodon (the largest ActivityPub service) doesn't allow Unicode usernames and has resisted efforts to change.
So I built a small ActivityPub server which publishes content from an Actor called @你好@i18n.viii.fi - it is only a demo account, but it works!
Some ActivityPub clients report that they are able to follow it and receive messages from it. Others - like Mastodon - simply can't see anything from it. Take a look at the replies on Mastodon to see which services work. You can also see some of its posts on the Fediverse.
What Does The Fox Spec Say?
The ActivityPub specification says:
Building an international base of users is important in a federated network. - Internationalization
I can't find anything in the specifications which limits what languages a username can be written in. But there are a few clues scattered about.
The user's @ name is defined by preferredUsername which is:
A short username which may be used to refer to the actor, with no uniqueness guarantees. - 4.1 Actor objects
There's nothing in there about what scripts it can contain. However, later on, the spec says:
Properties containing natural language values, such as name, preferredUsername, or summary, make use of natural language support defined in ActivityStreams. - 4. Actors
So it is expected that a preferred username could be written in multiple scripts. Which implies that the default need not be limited to A-Z0-9.
The ActivityStreams specification talks about language mapping.
Finally, the ActivityPub specification has some examples of non-Latin text in names.
So, I think that it is acceptable for usernames to be written in a variety of non-Latin scripts.
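As a rough illustration of what that means in practice, here is a minimal Python sketch of an actor document with a non-Latin preferredUsername, plus the WebFinger lookup URL another server might request. The /users/ path layout and the helper names build_actor and webfinger_url are assumptions made for this example; they are not taken from the demo server or from any particular implementation. The only real work is percent-encoding the name wherever it appears in a URL, while leaving it as plain Unicode inside the JSON.

```python
import json
from urllib.parse import quote


def build_actor(username: str, domain: str) -> dict:
    """Return a minimal, illustrative ActivityPub actor document."""
    # Hypothetical URL layout: percent-encode the username so the id is a valid URI.
    actor_id = f"https://{domain}/users/{quote(username)}"
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Person",
        "id": actor_id,
        # The username itself stays as plain Unicode in the JSON body.
        "preferredUsername": username,
        "inbox": f"{actor_id}/inbox",
        "outbox": f"{actor_id}/outbox",
    }


def webfinger_url(username: str, domain: str) -> str:
    """Build the WebFinger query URL for acct:username@domain."""
    resource = quote(f"acct:{username}@{domain}", safe="")
    return f"https://{domain}/.well-known/webfinger?resource={resource}"


if __name__ == "__main__":
    actor = build_actor("你好", "i18n.viii.fi")
    # ensure_ascii=False keeps the characters readable instead of \uXXXX escapes.
    print(json.dumps(actor, ensure_ascii=False, indent=2))
    print(webfinger_url("你好", "i18n.viii.fi"))
```

A real actor needs more than this (a public key, for a start), so treat it as the smallest shape that shows where Unicode goes and where percent-encoding goes.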
But What About...?
There are usually a few objections to "Unicode Everywhere" zealots like me. I'd like to forestall any arguments.
What about homograph attacks?
Well, what about them? ASCII has plenty of similar-looking characters. I doubt most people would notice when a capital i is replaced by a lower-case L - and vice-versa. Similarly, the kerning issue of an r and n looking like an m is well known. Are mixed-language homographs more dangerous? I don't think so.
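For services that do want to flag suspicious names, a crude mixed-script check is only a few lines. The Python sketch below guesses each letter's script from the first word of its Unicode character name, because the standard library does not expose the Script property directly; that shortcut, and the function names, are assumptions made for this example rather than an established algorithm.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Approximate the set of scripts in `text` from Unicode character names."""
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, punctuation, emoji and combining marks
        name = unicodedata.name(ch, "")
        if name:
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            scripts.add(name.split()[0])
    return scripts


def looks_mixed_script(text: str) -> bool:
    """True when letters from more than one (approximate) script appear."""
    return len(scripts_used(text)) > 1


if __name__ == "__main__":
    for candidate in ["paypal", "p\u0430ypal", "你好", "Ill"]:
        print(repr(candidate), sorted(scripts_used(candidate)), looks_mixed_script(candidate))
```

Note that the all-ASCII "Ill" passes without complaint, which is rather the point made above. The heuristic also flags perfectly legitimate names, such as Japanese text mixing kana and kanji, so it is a hint for a human to look at, not a reason to reject a name.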
What if people make names that can't be typed?
Well, what if they do? Maybe not being found by people who can't type your language is a feature, not a bug. But, anyway, clients can let users search for other people, or copy and paste their names.
What about weird "Zalgo" text?
It is up to a client to decide how they want to render text input. The "problems" of strange Unicode combinations are well known. This is not a hard computer-science problem.
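One common way a client can tame it is to normalise the text and cap how many combining marks may stack on a single base character. The Python sketch below does that; the limit of two marks and the function name are arbitrary assumptions for illustration.

```python
import unicodedata


def limit_combining_marks(text: str, max_marks: int = 2) -> str:
    """Drop combining marks beyond `max_marks` in any consecutive run."""
    out = []
    run = 0
    # NFC first, so common accented letters become single precomposed characters.
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.category(ch).startswith("M"):  # Mn, Mc or Me
            run += 1
            if run > max_marks:
                continue  # too many stacked marks: skip this one
        else:
            run = 0
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    zalgo = "x" + "\u0301" * 10
    cleaned = limit_combining_marks(zalgo)
    print(len(zalgo), "->", len(cleaned))
```

Some scripts legitimately stack several marks on one letter, so a cap that is too low will damage real names. Applying it only at display time, and keeping the stored text untouched, is the safer choice.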
What about bi-directional text?
The spec makes clear this is allowed.
Do people even want a username in their own script?
I have no evidence for this. But I bet you'd get pretty frustrated if you had to switch keyboard just to type your own name, wouldn't you? In any case, why can't I have a username of @😉?
What's Next?
If you build ActivityPub software, give some thought to the billions of people who don't have names which easily fit into ASCII.
If your software can see @你好@i18n.viii.fi and its posts, please let me know.
Off-label uses of pandoc: conversion between text encodings.
E.g., UTF-8 to UTF-16:
echo 'X' | pandoc lua -e 'io.write(pandoc.text.toencoding(io.read"a", "utf-16"))'
Other direction:
echo 'X' | pandoc lua -e 'io.write(pandoc.text.fromencoding(io.read"a", "utf-16"))'
The set of supported encodings is platform dependent, but always includes UTF-8, UTF-16, UTF-32, and latin1.
No matter how badly you screw up at work, at least your silly mistake won't be absolutely IMMORTALIZED in the #Unicode specification. Unless you work at Unicode, in which case, good luck. We're all counting on you.
A small collection of text-only websites
https://shkspr.mobi/blog/2025/12/a-small-collection-of-text-only-websites/
A couple of years ago, I started serving my blog posts as plain text. Add .txt to the end of any URL and get a deliciously lo-fi, UTF-8, mono[chrome|space] alternative.
Here's this post in plain text - https://shkspr.mobi/blog/2025/12/a-small-collection-of-text-only-websites.txt
Obviously a webpage without links is like a fish without a bicycle, but the joy of the web is that there are no gatekeepers. People can try new concepts and, if enough people join in, it becomes normal. I'm not saying that plain text is the best web experience. But it is an experience. Perfect if you like your browsing fast, simple, and readable. There are no cookie banners, pop-ups, permission prompts, autoplaying videos, or garish colour schemes.
I'm certainly not the first person to do this, so I thought it might be fun to gather a list of websites which you can browse in text-only mode. If you know of any more - including your own site - please drop a comment in the box!
- Terence Eden's blog - add .txt to any URL.
- Daring Fireball - add .text to any URL.
- Zach Flowers - replace .html with .txt.
- Fabien Benetou's PIM - add ?action=source to any URL.
- M0YNG - add .txt to any URL.
- Gwern - add .md to any URL or send an HTTP Accept header for Markdown.
- Dan Q's textplain.blog - the entire blog is plain text!
- Matt Hobbs - there is a feed of plaintext which allows you to read recent posts.
If you'd like to add a site, please get in touch. The rules are simple - content which has the MIME type of text/plain. No HTML, no multimedia, no RTF, no XML, no ANSI colour escape sequences.
Emoji are fine though; emoji are cool.
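Those rules are simple enough to check mechanically. Here is a small Python sketch, assuming you already have a response's Content-Type header and decoded body to hand; the function name and the escape-sequence pattern are illustrative assumptions, and the pattern only covers the common CSI sequences that start with ESC followed by an opening square bracket.

```python
import re

# Matches ANSI CSI sequences such as ESC[31m (colour) or ESC[0m (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def is_plain_text(content_type: str, body: str) -> bool:
    """Check the media type is text/plain and the body has no ANSI colour codes."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        return False
    return ANSI_ESCAPE.search(body) is None


if __name__ == "__main__":
    print(is_plain_text("text/plain; charset=utf-8", "Hello 😉"))
    print(is_plain_text("text/html", "<p>Hello</p>"))
    print(is_plain_text("text/plain", "\x1b[31mred\x1b[0m"))
```

It does not try to detect HTML or XML that has been mislabelled as text/plain; that would still need a human eye.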
#blogging #blogs #text #unicode #utf8
🆕 blog! “A small collection of text-only websites”
A couple of years ago, I started serving my blog posts as plain text. Add .txt to the end of any URL and get a deliciously lo-fi, UTF-8, mono[chrome|space] alternative.
Here's this post in plain text - https://shkspr.mobi/blog/2025/12/a-small-collection-of-text-only-websites.txt
Obviously a webpage…
👀 Read more: https://shkspr.mobi/blog/2025/12/a-small-collection-of-text-only-websites/
⸻
#blogging #blogs #text #unicode #utf-8
One clean, developer-focused page for every Unicode symbol
https://fontgenerator.design/symbols
#HackerNews #Unicode #Symbols #Developer #Tools #Design #FontGenerator
