@endrift Not that previous approaches did any better, though. Google Translate has been doing stupid stuff since its inception and it has never had a "this doesn't make any sense" flag.
The main difference is that, given clearly broken input, LLMs are more likely to produce grammatically correct, plausible-sounding text than something obviously broken.
@endrift I honestly don't know why they didn't have, like, a confidence flag in old models that could actually refuse to translate or warn.
"Standard" LLMs (the text completion kind), if that's what they're using now, probably can't implement that reliably, but you could probably architect a translation model that can (I forget what it's called, but I think there's a model architecture better suited to translation).
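For what it's worth, here's roughly what a naive confidence flag could look like. This is a hypothetical sketch, not how any real translator works: it assumes the model hands back per-token log-probabilities for its output, and the `-1.5` threshold is made up. It also shows the catch from above: fluent-but-wrong output can still score as "confident", so this only catches the cases where the model itself is unsure.

```python
from dataclasses import dataclass


@dataclass
class TranslationResult:
    text: str
    mean_logprob: float  # average log-probability per output token
    flagged: bool        # True means "warn or refuse instead of answering"


def confidence_check(text, token_logprobs, threshold=-1.5):
    """Hypothetical confidence flag based on mean per-token log-probability.

    token_logprobs: log-probabilities the (assumed) model assigned to each
    token it emitted. threshold: arbitrary cutoff chosen for illustration.
    """
    if not token_logprobs:
        # No tokens means nothing to trust, so flag it.
        return TranslationResult(text, float("-inf"), True)
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    return TranslationResult(text, mean_lp, mean_lp < threshold)


# Usage: a confident output passes, an unsure one gets flagged.
ok = confidence_check("hola mundo", [-0.1, -0.2])
shaky = confidence_check("???", [-3.0, -2.0])
print(ok.flagged, shaky.flagged)
```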

