so recall this classic text of mine https://codeberg.org/cosmicexplorer/corporeal
Consider the following examples:
"THE TRAGEDY OF HAMLET, PRINCE OF DENMARK" = 40 unicode chars, 40 bytes"Que trata de la condición" = 25 unicode chars, 26 bytes"ЧАСТЬ ПЕРВАЯ" = 12 unicode chars, 23 bytes"源氏物語" = 4 unicode chars, 12 bytes
the reason for hamlet was because i fucking love hamlet and because it begins a tragedy. the rest are all also in the repo. i absolutely did just scan for shit that looks wack all in a row like that. the kanji were obviously chosen because they are the least likely to scan like sounds to us english speakers. so we begin with the sound and fury. i leave you with a noise you cannot speak
As we can see from the above examples, the further one gets from the latinate caricature of US English, the more bytes it takes to represent each character.
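a quick sketch for checking those counts yourself. big assumption: the bytes are UTF-8. the text never names the encoding, but the numbers only line up that way. `SAMPLES` and `measure` are names made up for this sketch, not anything from the repo.

```python
# Assumption: byte counts are UTF-8 (the quoted text does not name the encoding).
SAMPLES = [
    "THE TRAGEDY OF HAMLET, PRINCE OF DENMARK",
    # \u00f3 is the precomposed "ó"; a decomposed o + combining accent
    # would count as two code points instead of one.
    "Que trata de la condici\u00f3n",
    "ЧАСТЬ ПЕРВАЯ",
    "源氏物語",
]


def measure(text):
    """Return (number of Unicode code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


for sample in SAMPLES:
    chars, nbytes = measure(sample)
    print(f"{sample!r}: {chars} unicode chars, {nbytes} bytes, "
          f"{nbytes / chars:.2f} bytes per char")
```

the last column is the whole point: ASCII sits at exactly 1 byte per char, the spanish line barely moves, cyrillic lands just under 2, and the kanji cost 3 each.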
i admit "latinate caricature" just sounds silly. but the romans were not silly. i think i hadn't yet decided upon the IETF being richard nixon so it's funny that i noticed ASCII and made some basic assumptions