Update on that #rust #tts #grpc service. TTS is far more complicated than I imagined, even using #ai (machine learning) models. I assumed I'd have to process the text for the model, but it turns out I need more processing than expected:
1. Split it up into sentences
2. Pass it through a phonemizer (phonetic/sound versions of the text)
3. Process the phonemes for the model
4. Run the model to actually generate the speech
I'm gonna have to write a blog post about this when I get done
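The four steps above can be sketched as a small pipeline. This is only a minimal illustration under loud assumptions: the sentence splitter is naive, `phonemize` is a hypothetical stand-in for a real phonemizer, the symbol-to-ID table is made up, and `SilenceModel` is a placeholder that returns silence instead of running an actual model.

```rust
/// Step 1: naive sentence splitter on '.', '!' and '?'.
/// (Assumption: real text needs smarter handling of abbreviations, "...", etc.)
fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if matches!(ch, '.' | '!' | '?') {
            let s = current.trim();
            if !s.is_empty() {
                sentences.push(s.to_string());
            }
            current.clear();
        }
    }
    let rest = current.trim();
    if !rest.is_empty() {
        sentences.push(rest.to_string());
    }
    sentences
}

/// Step 2: hypothetical phonemizer stand-in. It only lowercases and drops
/// punctuation; a real phonemizer would emit phonetic symbols instead.
fn phonemize(sentence: &str) -> String {
    sentence
        .chars()
        .filter(|c| c.is_alphabetic() || c.is_whitespace())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Step 3: map each symbol to an integer ID using a made-up vocabulary
/// (space = 1, 'a'..'z' = 2..27). Unknown symbols are skipped.
fn phonemes_to_ids(phonemes: &str) -> Vec<i64> {
    phonemes
        .chars()
        .filter_map(|c| match c {
            ' ' => Some(1),
            'a'..='z' => Some(c as u32 as i64 - 'a' as u32 as i64 + 2),
            _ => None,
        })
        .collect()
}

/// Step 4: whatever actually runs the model sits behind this trait.
trait TtsModel {
    fn synthesize(&self, ids: &[i64]) -> Vec<f32>;
}

/// Placeholder model: 100 samples of silence per input ID.
struct SilenceModel;

impl TtsModel for SilenceModel {
    fn synthesize(&self, ids: &[i64]) -> Vec<f32> {
        vec![0.0; ids.len() * 100]
    }
}

/// Runs all four steps and returns one audio buffer per sentence.
fn text_to_speech(model: &dyn TtsModel, text: &str) -> Vec<Vec<f32>> {
    split_sentences(text)
        .iter()
        .map(|s| model.synthesize(&phonemes_to_ids(&phonemize(s))))
        .collect()
}
```

Calling `text_to_speech(&SilenceModel, "Hi. Yo!")` yields two buffers, one per sentence. Keeping the model behind a trait means the placeholder can later be swapped for a real inference backend without touching the text processing.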
I've started a new #rust project that may or may not end up seeing use at work. I'm trying to do it on my own as #foss if possible.
Idea is to have a simple containerized service that accepts text and streams back the audio using #tts models like KittenNanoTTS.
Anyone have specific Rust advice on using gRPC to stream, or on consuming models in Rust?
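For the "accept text, stream back audio" shape, here is a rough std-only sketch. It is not gRPC: a plain `std::sync::mpsc` channel stands in for the server-streaming response, and `AudioChunk`, `fake_synthesize`, and `stream_speech` are hypothetical names invented for illustration. The point is only that each sentence can be sent as soon as it is ready rather than after the whole text is done.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical chunk type; a real service would define this in its .proto file.
struct AudioChunk {
    sentence_index: usize,
    samples: Vec<f32>,
}

/// Placeholder for model inference: 100 silent samples per input character.
fn fake_synthesize(sentence: &str) -> Vec<f32> {
    vec![0.0; sentence.chars().count() * 100]
}

/// Spawns a worker that synthesizes sentence by sentence and sends each
/// chunk as soon as it is ready. The receiver plays the role of the stream.
fn stream_speech(text: String) -> Receiver<AudioChunk> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let sentences = text
            .split_inclusive(|c: char| matches!(c, '.' | '!' | '?'))
            .map(str::trim)
            .filter(|s| !s.is_empty());
        for (sentence_index, sentence) in sentences.enumerate() {
            let chunk = AudioChunk {
                sentence_index,
                samples: fake_synthesize(sentence),
            };
            // Stop early if the consumer has gone away (client disconnected).
            if tx.send(chunk).is_err() {
                break;
            }
        }
    });
    rx
}
```

A consumer can iterate over the receiver (`for chunk in stream_speech(text) { ... }`) and start playback on the first chunk while later sentences are still being synthesized.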
Handy, https://handy.computer/.
A free, open source, and extensible speech-to-text application that works completely offline.
> Handy is a cross-platform desktop application built with Tauri (Rust + React/TypeScript) that provides simple, privacy-focused speech transcription. Press a shortcut, speak, and have your words appear in any text field—all without sending your voice to the cloud.
I've been meaning to share this for a while, but for any Android users out there who want to use a text-to-speech engine other than Google's, I recommend Sherpa TTS: https://github.com/woheller69/ttsEngine
It's open source, offline, multilingual, and available on F-Droid.
I use text-to-speech a lot with my RSS readers (Feeder and ReadYou), and it's been a game-changer.
🧵
#texttospeech #tts #opensource #fdroid #rss #Feeder #ReadYou #degoogled #candidefindings
At least once a week, like clockwork, I see a post warning against abusing text characters from outside the standard alphabet (e.g. maths notation, superscript) in posts. Some people use them for style reasons, especially in places that only support unformatted plaintext, like Mastodon.
'Screen readers don't like them' is the basic reason to avoid using these characters. A screen reader encountering a username made of extreme 'zalgo' text might keep a text-to-speech unit busy for an hour as it explicitly describes every modifier and ligature.
But I have to wonder: are screen readers really so basic? Surely this kind of nonsense is so prevalent that a #TTS-focused Mastodon tool knows not to read out usernames character by character when they use the Greek/math italic set? Or, upon encountering a post filled with ASCII art or obscure Unicode characters that will take a long time to read, triggers some kind of heuristic to compress or otherwise skip it? I understand that in the context of a maths article a sigma character represents 'summation', but when I see a capital 'Σ' at the start of a display name I instead understand it is either a Greek user or just someone that wanted a 'fancy looking capital E'.
In an inclusive space we should prioritise accessibility to everyone, and if that means avoiding Fraktur Unicode, fine. But I am wondering if #blind TTS users are really using software that has not adapted to the way other users interact with social media. Are there tools that handle Emokid2008's insistence on posting lyrics in bold italic Unicode characters? Is GothicPrincess99's username a five-minute description job because of the blackletter text she chose?
Hopefully my lack of knowledge in this area isn't too offensive. I just want to understand if the TTS social media experience is really so susceptible to non-standard characters and, more importantly: why is this the case??
(cc @GoemonIshikawa )
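To make the "some kind of heuristic" idea concrete, here is a minimal sketch of what such a pre-processing step could look like. It is an illustration only, with no claim that any screen reader or client actually does this. It assumes two simplifications: styled Latin letters from the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D6A3, where bold, italic, Fraktur and similar alphabets sit in runs of 52 letters) are folded back to plain A-Z/a-z, and runs of combining diacritics (U+0300 to U+036F, the usual 'zalgo' ingredients) are capped at a chosen length. The function names are hypothetical. Unicode compatibility normalization (NFKC) covers the letter folding more completely, including the few styled letters that live outside this block.

```rust
/// Folds a styled Latin letter from the Mathematical Alphanumeric Symbols
/// block back to its plain ASCII letter; other characters pass through.
fn fold_math_letter(c: char) -> char {
    let cp = c as u32;
    if (0x1D400..=0x1D6A3).contains(&cp) {
        // Each styled alphabet is 26 capitals followed by 26 small letters.
        let idx = (cp - 0x1D400) % 52;
        let plain = if idx < 26 {
            b'A' + idx as u8
        } else {
            b'a' + (idx - 26) as u8
        };
        plain as char
    } else {
        c
    }
}

/// True for characters in the Combining Diacritical Marks block.
fn is_combining_mark(c: char) -> bool {
    (0x0300..=0x036F).contains(&(c as u32))
}

/// Folds styled letters and keeps at most `max_marks` consecutive combining
/// marks, so ordinary accents survive but zalgo stacks are trimmed.
fn simplify_for_speech(text: &str, max_marks: usize) -> String {
    let mut out = String::new();
    let mut run = 0;
    for c in text.chars() {
        if is_combining_mark(c) {
            run += 1;
            if run <= max_marks {
                out.push(c);
            }
        } else {
            run = 0;
            out.push(fold_math_letter(c));
        }
    }
    out
}
```

With this sketch, a display name written in mathematical bold letters comes out as plain text, while an ordinary Greek 'Σ' is left alone because it is not in the folded range.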