I had a nightmare the other night. I dreamt that someone told me we don’t use alt tags for screen readers any more, that there was something new … but nobody would tell me what I could use now.
🗣️🎤📝
Speech to Text and Text to Speech on GNU/Linux
📝🔊💻
Why This Matters to Me (and Maybe You Too)
If you’re anything like me, a Linux user who counts on voice typing and TTS because of visual impairment, you know that accessibility is not a luxury, it’s a necessity. Speaking from experience, the quest for a seamless, local, FLOSS speech-to-text (STT) setup on Linux can be frustrating.
Here’s how you can succeed with modern tools using Linux. FLOSS means freedom and privacy; working locally means real control.
Let’s dive in! I’ll tell you what I’ve learned and what I use—and hope you’ll share your favorite tools or tips!
System-Wide Voice Keyboard: Speak Directly in Any App
Want to speak and have your words typed wherever your cursor is—be it a terminal, browser, chat, or IDE? Here’s what actually works and how it feels day-to-day:
- Speak to AI (Offline, Whisper-based, global hotkeys)
This tool is my current go-to. It uses Whisper locally, lets you use global hotkeys (configurable) to type into any focused window, and doesn’t need internet. Runs smoothly on X11 and Wayland; just takes a bit of setup (AppImage available!).
GitHub Repo https://github.com/AshBuk/speak-to-ai | Dev.to Post https://dev.to/ashbuk/i-built-an-offline-voice-typing-app-for-linux-speak-to-ai-3ab5
- DIY: RealtimeSTT + PyAutoGUI
For the true tinkerers, RealtimeSTT plus a Python script lets you simulate keystrokes. You control every step, can lower latency with your tweaks, but you’ll need to be comfortable with scripting.
RealtimeSTT Guide https://github.com/KoljaB/RealtimeSTT#readme
- Handy (Free/Libre, offline, Whisper-based, acts as a keyboard)
I’ve read lots of positive feedback on Handy—even though I haven’t tried it myself. The workflow is simple: press a hotkey, speak, and Handy pastes your text in the active app. It’s fully offline, works on X11 and Wayland, and gets strong accuracy thanks to Whisper.
Heads up: Handy lets you pick your own shortcut key, but it actually overrides the keyboard shortcut for start/stop recording. That means it can clash with other tools that depend on major shortcut combos—including Orca’s custom keybindings if you use a screen reader. If your workflow relies on certain shortcuts, this might need adjustment or careful planning before you commit.
GitHub Repo https://github.com/cjpais/Handy | Demo https://handy.computer
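For the DIY route (RealtimeSTT + PyAutoGUI), here is a minimal sketch of what such a script might look like. It assumes RealtimeSTT's `AudioToTextRecorder` with its blocking `text()` method and PyAutoGUI's `write()`; the helper names `clean_transcript`, `dictate`, and `main` are my own, purely for illustration. To my knowledge PyAutoGUI targets X11, so treat this as an X11-only starting point rather than a finished tool.

```python
import re


def clean_transcript(text):
    """Collapse whitespace and add one trailing space so phrases don't run together."""
    collapsed = re.sub(r"\s+", " ", text).strip()
    return collapsed + " " if collapsed else ""


def dictate(next_phrase, type_out, max_phrases=None):
    """Pull phrases from next_phrase() and pass each non-empty one to type_out().

    Both callables are injected so the loop can be tried without a
    microphone or a display. Returns the number of phrases heard.
    """
    heard = 0
    while max_phrases is None or heard < max_phrases:
        phrase = clean_transcript(next_phrase())
        if phrase:
            type_out(phrase)
        heard += 1
    return heard


def main():
    # Third-party imports live here so the helpers above work without them.
    from RealtimeSTT import AudioToTextRecorder  # pip install RealtimeSTT
    import pyautogui  # pip install pyautogui

    recorder = AudioToTextRecorder()
    # recorder.text() blocks until a phrase has been spoken and transcribed;
    # pyautogui.write() then types it into whichever window has focus.
    dictate(recorder.text, pyautogui.write)
```

Call `main()` from under an `if __name__ == "__main__":` guard (RealtimeSTT's README recommends this because it uses multiprocessing), focus the window you want to type into, and stop the script with Ctrl+C.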
Real-Time Transcription in a Window (Copy/Paste Workflow)
If you’re okay with speaking into a dedicated app, then copying, these options offer great GUIs and power features:
- Speech Note by @mkiol https://mastodon.social/@mkiol
FLOSS, offline, multi-language GUI app—perfect for quick notes and batch transcription. Not a system-wide keyboard, but super easy to use and works on both desktops and Linux phones.
Flathub https://flathub.org/apps/net.mkiol.SpeechNote | LinuxPhoneApps https://linuxphoneapps.org/apps/net.mkiol.speechnote/
- WhisperLive (by Collabora)
Real-time transcription in a terminal or window—great for meetings, lectures, and captions. Manual copy/paste required to get the text to other apps.
GitHub Repo https://github.com/collabora/WhisperLive
More Tools for Tinkerers
If you like building your own or want extra control, check out:
- Vosk: Lightweight, lots of language support. Website https://alphacephei.com/vosk/
- Kaldi: Powerful, best for custom setups. Website https://kaldi-asr.org/
- Simon: Voice control automation. Website https://simon-listens.org/
- voice2json: Phrase-level and command recognition. GitHub https://github.com/synesthesiam/voice2json
Pro Tips
- Desktop Environment: X11 vs. Wayland affects how keyboard hooks and app focus actually operate.
- Ready-Made vs. DIY: If you want plug-and-play, try Speech Note or Handy first. Into automation or customization? RealtimeSTT is perfect.
- Follow the Community: @thorstenvoice offers tons of open-source voice tech insights.
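Not sure which display server you are on? A quick check along these lines can tell you before you pick a tool. It is only a sketch: it relies on the `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables, which most desktop sessions set but which are not guaranteed, and the function name `session_type` is my own.

```python
import os


def session_type(env=None):
    """Return 'wayland', 'x11', or 'unknown' based on common session variables."""
    env = os.environ if env is None else env
    declared = env.get("XDG_SESSION_TYPE", "").lower()
    if declared in ("wayland", "x11"):
        return declared
    # Fall back to the display variables each system normally sets.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


print(session_type())
```

Wayland is checked before X11 in the fallback because Wayland sessions often set `DISPLAY` as well, for XWayland apps.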
Screen Reader Integration
Looking for robust screen reader support? Linux has you covered:
- Orca (GNOME/MATE): The most customizable GUI screen reader out there. The default voice (eSpeak) is robotic, but you can swap it for something better and fine-tune verbosity so it reads only what matters.
- Speakup: Console-based, ideal for terminal.
- Emacspeak: The solution for Emacs fans.
💡 Orca is part of my daily toolkit. It took time to get the settings just right (especially verbosity!) but it’s absolutely worth it. If you use a screen reader—what setup makes it bearable or even enjoyable for you?
Final Thoughts
If you’re starting from scratch, try Handy for direct typing (just watch those shortcuts if you use a screen reader!) or Speech Note for GUI-based transcription. Both are privacy-friendly, local, and accessible—ideal for everyday Linux use.
Is there a FLOSS gem missing here?
Sharing what works (and what doesn’t!) helps the entire community.
Resources:
Speech Note on Flathub https://flathub.org/apps/net.mkiol.SpeechNote
Handy GitHub https://github.com/cjpais/Handy
Speak to AI Guide https://dev.to/ashbuk/i-built-an-offline-voice-typing-app-for-linux-speak-to-ai-3ab5
RealtimeSTT https://github.com/KoljaB/RealtimeSTT
#Linux #SpeechToText #FLOSS #Accessibility #VoiceKeyboard #ScreenReader #Whisper #Handy #SpeechNote #OpenSource #Community #voicetyping #LocalSTT #TTStools #SpeechRecognition #A11y #Linuxtools #Voicekeyboard #Whisper #Handy #speech-to-text #SpeechNote #review #ScreenReaders #ORCA #FOSS
Question for the fediverse tech folk: work has asked me to learn Google’s BigQuery. Does anyone know how #accessible it is with a #screenreader? And where is the best place to start for someone mostly familiar with small-scale PostgreSQL deployment? #bigquery
Question to screen reader users:
If I were to post IPA (International Phonetic Alphabet, not India Pale Ale), how does it handle it? Does it pronounce it, or does it produce something like "right-angle opening bracket, IPA symbol labio-dental fricative vee, IPA symbol velar plosive gee, right-angle closing bracket"?
🥔 Development diary: potato survival game:
What I did today:
• started to implement the inventory
• reworked UI in #Unity multiple times (again) 😭
• continued working with the #screenReader plugin and started to apply it to the inventory
What went well:
• the screen reader plugin can go through the inventory item by item (something that Unity can't do out of the box on its own)
👇
Blind fediversians, which of these pages is more accessible?
- Old version of Pandora's Tale Wiki (Character page):
https://pandorastale.miraheze.org/wiki/Characters
- New version of PTW (Character page):
https://pandorastale.wiki/Main/Characters.gmi
- (If you have a gemini client installed) New PTW Gemini Capsule (Character page):
gemini://pandorastale.wiki/Main/Characters.gmi
And here's another example:
- Old version of Pandora's Tale Wiki (Chapter page):
https://pandorastale.miraheze.org/wiki/Chapters
- New version of PTW (Chapter page):
https://pandorastale.wiki/Main/Chapters.gmi
- (If you have a gemini client installed) New PTW Gemini Capsule (Chapter page):
gemini://pandorastale.wiki/Main/Chapters.gmi
#Accessibility #a11y #ScreenReader #PandorasTaleWiki #Geminispace
This is a perfect example of how nebulous #screenReader #accessibility is, and why it confuses so many laymen. I am looking at a web page right now that has a "Download" button. The button is an a (anchor) tag, with a div inside of it with the CSS class for a download button.
Obviously this is awful HTML, but it works fine, if you can see. There's a big fat button with "DOWNLOAD!" in all caps on the screen. Clicking this button starts the download. Seems good, no?
Well, no. This div has no actual textual content, and the anchor tag has no href or text either. So this huge honking button is entirely invisible to screen readers. How do I even begin to explain this to, say, a customer support rep? :)
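One way to show the problem to someone who can't see it in the markup: a small script that flags links with no destination or no name. This is only a rough sketch using Python's standard `html.parser`; the class and function names are made up for illustration, and the check is far simpler than the real accessible-name rules (it ignores `aria-labelledby`, CSS, and scripts, for instance).

```python
from html.parser import HTMLParser


class NamelessLinkFinder(HTMLParser):
    """Collect <a> elements that lack an href or any accessible name."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (line number, [issues])
        self._stack = []    # one dict per currently open <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._stack.append({
                "line": self.getpos()[0],
                "href": bool(attrs.get("href")),
                "name": bool(attrs.get("aria-label") or attrs.get("title")),
            })
        elif tag == "img" and self._stack and attrs.get("alt"):
            # An image with alt text gives the enclosing link a name.
            self._stack[-1]["name"] = True

    def handle_data(self, data):
        # Any non-blank text inside the link counts as a name.
        if self._stack and data.strip():
            self._stack[-1]["name"] = True

    def handle_endtag(self, tag):
        if tag == "a" and self._stack:
            link = self._stack.pop()
            issues = []
            if not link["href"]:
                issues.append("no href")
            if not link["name"]:
                issues.append("no text or label")
            if issues:
                self.problems.append((link["line"], issues))


def find_nameless_links(html):
    finder = NamelessLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.problems


bad = '<a class="btn"><div class="download-button"></div></a>'
good = '<a href="/files/app.zip"><div class="download-button">Download</div></a>'
print(find_nameless_links(bad))   # [(1, ['no href', 'no text or label'])]
print(find_nameless_links(good))  # []
```

The `bad` and `good` snippets are invented stand-ins for the page described above, not its actual markup. The point for a support rep is that the first one has nothing for a screen reader to announce and nothing for the keyboard to land on.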
Hello Masto-peeps who use screen readers!
I just learned that I need to put alt text on URLs for more accessible PDFs, but -- what should it say?
I am formatting academic citations that include a URL, so all the information about where that link will take you is in the text. I don't want it to read the URL to you and I don't want to just repeat the same information you just heard. What do you find most helpful in this situation?
Pls boost for reach!
I mean, I don't want it to do those things unless you find them USEFUL!
And of course the Section 508 website is not helpful. Sigh.
I had the great honour to write an article for @piccalilli about creating accessible PDFs using free tools:
https://piccalil.li/blog/a-guide-to-creating-accessible-pdfs-using-free-tools/
Give it a read and let me know what you think. Thank you @belldotbz for having me! 🙏
#a11y #accessibility #pdf #LibreOffice #OpenSource #ScreenReader #JAWS #NVDA #VoiceOver #Axes4
Some good news for screenreaders on Wayland with the release of Niri 25.08:
OK #Mastodon. I've seen several toots on #accessibility for #screenreader users, however, I've not seen one from a screenreader user (as far as I know). I've used ZoomText, Outspoken, JAWS (AKA JFW), Supernova, NVDA (Windows), and VoiceOver (both on Macs and iPhone). I don't have experience with Windows Narrator or TalkBack. I would like to rectify and clarify a few small things.
First off, any awareness of accessibility issues, and endeavours to make things more accessible is great. Keep going!
But…
Blind/low-vision people have been using the internet as long as everyone else. We had to become used to the way people share things, and find workarounds or tell developers what we needed; this latter one has been the main drive to get us here and now. Over the past decade, screen readers have improved dramatically, including more tools, languages, and customisability. However, the basics were already firmly in place around 2000. Sadly, screen readers cost a lot of money at that time. Now, many are free; truly the biggest triumph for accessibility IMHO.
So, what you can do to help screen readers help their users is three simple things.
1. Write well: use punctuation, and avoid things like random capitalisation or * halfway through words.
2. Image description: screen readers with image recognition built-in will only provide a very short description, like: a plant, a painting, a person wearing a hat, etc. They can also deal with text included in the image, as long as the text isn't too creatively presented. So, by all means, go absolutely nuts with detail.
3. Hashtags: this is the most commonly boosted topic I've seen here, so #ThisIsWhatAnAccessibleHashtagLooksLike. The capitalisation ensures it's read correctly, and for some long hashtags without caps, I've known screen readers to give up and just start spelling the whole damn thing out, which is slow and painful.
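For anyone who wants to check their hashtags before posting, here is a small sketch of point 3. It splits a tag at capital letters and digit runs, which is roughly where a speech synthesiser gets a chance to see word boundaries; how any particular screen reader actually handles a tag will vary. The function names are my own, and the pattern only covers ASCII letters.

```python
import re


def split_hashtag(tag):
    """Split a hashtag into words at capital letters and digit runs."""
    body = tag.lstrip("#")
    # Acronyms (NVDA), capitalised or lowercase words (Reader, reader), digits (11).
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", body)


def has_word_breaks(tag):
    """True when capitalisation or digits expose more than one chunk."""
    return len(split_hashtag(tag)) > 1


print(split_hashtag("#ThisIsWhatAnAccessibleHashtagLooksLike"))
print(split_hashtag("#thisiswhatanaccessiblehashtaglookslike"))
```

The first call yields eight separate words; the second yields one long lump, which is the kind of tag that may end up being spelled out letter by letter.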
That's really all. Thanks for reading! 😘
Here's another one for the #blind #hiveMind primarily, but feel free to chime in if you have a good idea.
For an upcoming video/stream, I am looking into video editing as a #screenReader user. I'd like to cover both #mac and #windows and am curious what solutions have worked well for people in the past.
Conditions are no mouse usage at all, app needs to be screenreader-compatible, ideally somewhat full-featured both for #audio and #video.
Be it #iMovie, #Clipchamp, #quicktime, #reaper, a dreaded #AI tool, tell me thy success stories, thy struggles and thy findings! 😊 #accessibility