Post by @wlaatje@social.edu.nl

Is my assumption correct that, for transcription, it is generally better to use a dedicated ASR (speech-to-text) model than an LLM that tries to “reason” about the words or sentences and may take unwanted liberties with them?

I would rather have a word be poorly recognized than have the model silently reinterpret, rewrite, or infer something that was not actually said. Or is that view too simplistic?

#yapsnap (#transcription by url, on the CPU)
https://github.com/kouhxp/yapsnap

GitHub - kouhxp/yapsnap: Snap any video URL or audio file into plaintext. No GPU. No cloud. One command.

Joep Bos-Coenraad

@joepbc@mastodon.social · 1 hour ago

@wlaatje hear hear! Specialized technology (including AI) is almost always preferable to general-purpose AI.