@neil Yes and no. I use transcription models reasonably often - especially for generating meeting, podcast & video transcripts.
I struggle to follow complex information from speech and find it substantially easier as text, but a lot of content is only posted as YouTube videos or similar these days. So I end up using yt-dlp to grab the video and whisper to transcribe it. I’m lucky enough to have the resources and know-how to do this locally, but not everyone does.
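
For anyone curious, here is a rough sketch of that yt-dlp plus whisper workflow as a small Python wrapper. It assumes the `yt-dlp` and `whisper` (openai-whisper) command-line tools and `ffmpeg` are installed and on your PATH. The function names, the `transcripts` folder, and the `base` model choice are just placeholders for illustration, not my exact setup.

```python
import subprocess
import sys
from pathlib import Path


def download_cmd(url: str, workdir: Path) -> list[str]:
    # -x extracts the audio track; --audio-format converts it (needs ffmpeg).
    # The output template makes the result land at <workdir>/audio.mp3.
    return ["yt-dlp", "-x", "--audio-format", "mp3",
            "-o", str(workdir / "audio.%(ext)s"), url]


def transcribe_cmd(audio: Path, workdir: Path, model: str = "base") -> list[str]:
    # Ask the whisper CLI for a plain-text transcript written into workdir.
    return ["whisper", str(audio), "--model", model,
            "--output_format", "txt", "--output_dir", str(workdir)]


def transcribe_url(url: str, workdir: Path = Path("transcripts"),
                   model: str = "base") -> Path:
    # Hypothetical helper: download the audio, transcribe it, and return
    # the path of the transcript file.
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(download_cmd(url, workdir), check=True)
    audio = workdir / "audio.mp3"
    subprocess.run(transcribe_cmd(audio, workdir, model), check=True)
    # whisper names its output after the input file's base name.
    return workdir / "audio.txt"


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python transcribe.py <video-url>")
    print(transcribe_url(sys.argv[1]).read_text())
```

Larger whisper models are generally more accurate but slower and need more memory, which is where the "resources" part comes in.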