Discussion
Daniele de Rigo
@dderigo@hostux.social  ·  last year

1/

Recent commentary [1] notes escalating concern over the use of the more powerful #chatbots to go beyond the #knowledge of the human expert using them, rather than simply for controlled pre-processing within the domain of human-expert knowledge.

1. What is often called "hallucination/confabulation" (i.e. severe #extrapolation #uncertainty and #overfitting by the chatbot model) is apparently becoming increasingly realistic, with a corresponding decline in the human ability to detect it.

Daniele de Rigo
@dderigo@hostux.social replied  ·  last year

2/

Another potential key point:

2. apparently, an emerging cognitive bias in humans, who may tend to over-trust #chatbots, especially the more advanced ones.

In particular [1]:
"bigger, more-refined versions of #LLMs are, as expected, more accurate [...] But they are less reliable: among all the non-accurate responses, the fraction of wrong answers has increased [...] because the models are less likely to avoid answering a question — for example, by saying they don’t know"

Daniele de Rigo
@dderigo@hostux.social replied  ·  last year

3/

"As expected, the accuracy of the answers increased as the refined models became larger and decreased as the questions got harder [...] The fraction of wrong answers among those that were either incorrect or avoided rose as the models got bigger, and reached more than 60% for several refined models" [1]

The study "found that all the models would occasionally get even easy questions wrong, meaning there is no '#SafeOperatingRegion' in which a user can have high confidence in the answers"
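To make the quoted metric concrete, here is a small illustrative Python sketch (the labels and counts are invented, not data from the study) of the fraction of wrong answers among the responses that were either incorrect or avoided:

```python
from collections import Counter

def wrongness_among_noncorrect(labels):
    """Fraction of 'incorrect' among non-correct responses
    (incorrect + avoided): the metric quoted above."""
    counts = Counter(labels)
    non_correct = counts["incorrect"] + counts["avoided"]
    return counts["incorrect"] / non_correct if non_correct else 0.0

# Hypothetical response labels for a smaller raw model vs a larger
# "shaped-up" model: the larger one is more accurate overall, but
# avoids far less, so a non-correct answer is more often plainly wrong.
small = ["correct"] * 50 + ["avoided"] * 30 + ["incorrect"] * 20
large = ["correct"] * 70 + ["avoided"] * 5 + ["incorrect"] * 25

print(wrongness_among_noncorrect(small))  # 0.4
print(wrongness_among_noncorrect(large))  # ~0.83
```

Note how the larger model in this toy setup is both more accurate (70% vs 50% correct) and less reliable by this metric, matching the pattern the quotes describe.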

Daniele de Rigo
@dderigo@hostux.social replied  ·  last year

4/

The research [2] noted how "the percentage of incorrect results increases markedly from the raw to the shaped-up models, as a consequence of substantially reducing avoidance [...]
Where the raw models tend to give non-conforming outputs that cannot be interpreted as an answer [...], shaped-up models instead give seemingly #PlausibleButWrong answers [...]
This does not match the expectation that more recent #LLMs would more successfully avoid answering outside their operating range"

Daniele de Rigo
@dderigo@hostux.social replied  ·  last year

5/

A subtle tradeoff is "whether avoidance increases for more difficult instances, as would be appropriate for the corresponding lower level of correctness" [2]

Alas, the "percentage of avoidant answers rarely rises quicker than the percentage of incorrect ones": "an involution in #reliability: there is no difficulty range for which #errors are improbable, either because the questions are so easy that the model never fails or because they are so difficult that the model always avoids giving an answer"

Daniele de Rigo
@dderigo@hostux.social replied  ·  last year

6/

#References

[1] Jones, N., 2024. Bigger AI chatbots more inclined to spew nonsense — and people don’t always realize. Nature.
https://doi.org/10.1038/d41586-024-03137-3

[2] Zhou, L., Schellaert, W., Martínez-Plumed, F., Moros-Daval, Y., Ferri, C., Hernández-Orallo, J., 2024. Larger and more instructable language models become less reliable. Nature. https://doi.org/10.1038/s41586-024-07930-y

#DOI #LargeLanguageModels #chatbots #CognitiveBias
