@endrift I find they can generally be much better than previous approaches at translating "human" errors and nonstandard language, but yeah they have the standard LLM issue where if you feed in garbage they don't know how to say "no" and they just hallucinate whatever is statistically most likely.
@endrift Not that previous approaches did any better though. Google Translate has been doing stupid stuff since its inception and it has never had a "this doesn't make any sense" flag.
The main difference is that LLMs, when given clearly broken input, are more likely to come up with fluent, plausible-sounding text instead of something clearly broken.
@endrift I honestly don't know why they didn't have, like, a confidence flag in old models that could actually refuse to translate or warn.
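For what it's worth, here's roughly what I mean by a confidence flag, as a minimal sketch. It assumes the translation model hands back per-token log-probabilities for its output, which is an assumption about the model and not something I know any particular service exposes. `gate_translation`, `TranslationResult`, and both thresholds are made up for illustration, and whether those probabilities actually track translation quality is a separate question.

```python
import math
from dataclasses import dataclass


@dataclass
class TranslationResult:
    text: str          # translated text, empty if refused
    confidence: float  # geometric-mean token probability, in [0, 1]
    status: str        # "ok", "warn", or "refused"


def gate_translation(text, token_logprobs, warn_below=0.5, refuse_below=0.2):
    """Attach a confidence flag to a translation (hypothetical helper).

    token_logprobs: natural-log probabilities the model assigned to each
    output token. The thresholds are arbitrary illustrative values.
    """
    if not token_logprobs:
        # No scores at all: treat as "doesn't make any sense".
        return TranslationResult("", 0.0, "refused")

    # exp(mean log-prob) is the geometric mean of the token probabilities,
    # so long outputs are not penalized just for being long.
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    confidence = math.exp(mean_logprob)

    if confidence < refuse_below:
        return TranslationResult("", confidence, "refused")
    status = "warn" if confidence < warn_below else "ok"
    return TranslationResult(text, confidence, status)
```

So `gate_translation("hello", [-0.05, -0.1])` passes the text through as "ok", while something scored `[-3.0, -3.0]` comes back "refused" with no text at all.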
"Standard" LLMs (the text completion kind), if that's what they're using now, probably can't implement that reliably, but you can probably architect a translation model that can (I forget what it's called but I think there's a model architecture better suited to translation).