Discussion
Daniele de Rigo
@dderigo@hostux.social  ·  last year

1/

Recent commentary [1] notes escalating concern over the use of the more powerful #chatbots to go beyond the #knowledge of the human expert using them, rather than simply for controlled pre-processing within the domain of human-expert knowledge.

1. What is often called "hallucination/confabulation" (i.e. severe #extrapolation #uncertainty and #overfitting by the chatbot model) is apparently becoming increasingly realistic, with a corresponding decline in the human ability to detect it.

Daniele de Rigo
@dderigo@hostux.social replied  ·  last year

2/

Another potential key point:

2. apparently, an emerging cognitive bias in humans, who may tend to over-trust #chatbots, especially the more advanced ones.

In particular [1]:
"bigger, more-refined versions of #LLMs are, as expected, more accurate [...] But they are less reliable: among all the non-accurate responses, the fraction of wrong answers has increased [...] because the models are less likely to avoid answering a question — for example, by saying they don’t know"

Daniele de Rigo
@dderigo@hostux.social replied  ·  last year

3/

"As expected, the accuracy of the answers increased as the refined models became larger and decreased as the questions got harder [...] The fraction of wrong answers among those that were either incorrect or avoided rose as the models got bigger, and reached more than 60% for several refined models" [1]

The study "found that all the models would occasionally get even easy questions wrong, meaning there is no '#SafeOperatingRegion' in which a user can have high confidence in the answers"
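To make the quoted metric concrete, here is a small illustrative Python sketch (the labels and counts are invented, not data from the study) of the fraction of wrong answers among the responses that were either incorrect or avoided:

```python
from collections import Counter

def wrongness_among_noncorrect(labels):
    """Fraction of 'incorrect' among non-correct responses
    (incorrect + avoided): the metric quoted above."""
    counts = Counter(labels)
    non_correct = counts["incorrect"] + counts["avoided"]
    return counts["incorrect"] / non_correct if non_correct else 0.0

# Hypothetical response labels for a smaller raw model vs a larger
# "shaped-up" model: the larger one is more accurate overall, but
# avoids far less, so a non-correct answer is more often plainly wrong.
small = ["correct"] * 50 + ["avoided"] * 30 + ["incorrect"] * 20
large = ["correct"] * 70 + ["avoided"] * 5 + ["incorrect"] * 25

print(wrongness_among_noncorrect(small))  # 0.4
print(wrongness_among_noncorrect(large))  # ~0.83
```

Note how the larger model in this toy setup is both more accurate (70% vs 50% correct) and less reliable by this metric, matching the pattern the quotes describe.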

Daniele de Rigo
@dderigo@hostux.social replied  ·  last year

4/

The research [2] noted how "the percentage of incorrect results increases markedly from the raw to the shaped-up models, as a consequence of substantially reducing avoidance [...]
Where the raw models tend to give non-conforming outputs that cannot be interpreted as an answer [...], shaped-up models instead give seemingly #PlausibleButWrong answers [...]
This does not match the expectation that more recent #LLMs would more successfully avoid answering outside their operating range"

Daniele de Rigo
@dderigo@hostux.social replied  ·  last year

5/

A subtle tradeoff is "whether avoidance increases for more difficult instances, as would be appropriate for the corresponding lower level of correctness" [2]

Alas, the "percentage of avoidant answers rarely rises quicker than the percentage of incorrect ones": "an involution in #reliability: there is no difficulty range for which #errors are improbable, either because the questions are so easy that the model never fails or because they are so difficult that the model always avoids giving an answer"

Daniele de Rigo
@dderigo@hostux.social replied  ·  last year

6/

#References

[1] Jones, N., 2024. Bigger AI chatbots more inclined to spew nonsense — and people don’t always realize. Nature.
https://doi.org/10.1038/d41586-024-03137-3

[2] Zhou, L., Schellaert, W., Martínez-Plumed, F., Moros-Daval, Y., Ferri, C., Hernández-Orallo, J., 2024. Larger and more instructable language models become less reliable. Nature. https://doi.org/10.1038/s41586-024-07930-y

#DOI #LargeLanguageModels #chatbots #CognitiveBias
