Discussion
Paul Cantrell
@inthehands@hachyderm.io · yesterday

LLMs have no model of correctness, only typicality. So:

“How much does it matter if it’s wrong?”

It’s astonishing how frequently both providers and users of LLM-based services fail to ask this basic question — which I think has a fairly obvious answer in this case, one that the research bears out.

(Repliers, NB: Research that confirms the seemingly obvious is useful and important, and “I already knew that” is not information that anyone is interested in except you.)

1/ https://www.404media.co/chatbots-health-medical-advice-study/

404 Media

Chatbots Make Terrible Doctors, New Study Finds

Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn't ready to take on the role of the physician.”
Paul Cantrell
@inthehands@hachyderm.io replied · yesterday

Despite the obviousness of the larger conclusion (“LLMs don’t give accurate medical advice”), this passage is…if not surprising, exactly, at least really really interesting.

2/

When the researchers tested the LLMs without involving users by providing the models with the full text of each clinical scenario, the models correctly identified conditions in 94.9 percent of cases. But when talking to the participants about those same conditions, the LLMs identified relevant conditions in fewer than 34.5 percent of cases. People didn’t know what information the chatbots needed, and in some scenarios, the chatbots provided multiple diagnoses and courses of action. Knowing what questions to ask a patient and what information might be withheld or missing during an examination are nuanced skills that make great human physicians; based on this study, chatbots can’t reliably replicate that kind of care.
Troed Sångberg
@troed@masto.sangberg.se replied · 24 hours ago

@inthehands This is why experienced developers can make use of LLMs, and why LLMs won't replace them.

Greg Lloyd
@Roundtrip@federate.social replied · 23 hours ago

@troed @inthehands

I see the high-end #LLM experience as being like riding a good horse (exceptionally skilled at horsey things, fast-moving, etc.): an augmentation tool that's exceptionally easy to use to extend your own abilities, not an #AI.

Ref 🧵 https://federate.social/@Roundtrip/115549029949917075

Paul Cantrell
@inthehands@hachyderm.io replied · yesterday

There’s a lesson here, perhaps, about the tangled relationship between what is •typical• and what is •correct•, and what it is that LLMs actually do:

When medical professionals ask medical questions in technical medical language, the answers they get are typically correct.

When non-professionals ask medical questions in a perhaps medically ill-formed vernacular mode, the answers they get are typically wrong.

The LLM readily models both of these situations. It has no notion of correctness in either case; correctness simply happens to be more statistically typical in one than in the other.

3/
