Discussion
Simon Willison
@simon@fedi.simonwillison.net  ·  activity timestamp 9 hours ago

Two new speech-to-text models (similar to Whisper) from Mistral today - one of them is API-only, the other is an 8.9GB Apache-2.0 licensed open weights model for "realtime" transcription. They're both very good! https://simonwillison.net/2026/Feb/4/voxtral-2/

Simon Willison’s Weblog

Voxtral transcribes at the speed of sound

Mistral just released Voxtral Transcribe 2 - a family of two new models, one open weights, for transcribing audio to text. This is the latest in their Whisper-like model family, …
Simon Dückert
@simondueckert@colearn.social replied  ·  activity timestamp 2 hours ago

@simon Did you compare it to #parakeet v3 in terms of speed and accuracy?

Viraptor
@viraptor@cyberplace.social replied  ·  activity timestamp 7 hours ago

@simon
It's the first one you can run yourself that does diarization, isn't it? I've seen hacks to implement it that were painful to use before, but nothing truly integrated.

Mat]3
@mathis@metalhead.club replied  ·  activity timestamp 9 hours ago

@simon Whisper is very good, but when the audio is noisy and not very clear, it starts hallucinating. I wonder how Voxtral fares in this case. The first Voxtral was just a bit worse than Whisper.

Andreas Wagner
@anwagnerdreas@hcommons.social replied  ·  activity timestamp 8 hours ago

@mathis @simon Whisper seems to have significant hallucination problems with speakers with speech disabilities - even worse than with accents. In a stroke of genius the researchers who investigated this labelled their study "Careless Whisper". I wonder how Voxtral would fare in such situations.

https://doi.org/10.1145/3630106.3658996

Careless Whisper: Speech-to-Text Hallucination Harms


bonfire.cafe

A space for Bonfire maintainers and contributors to communicate