Discussion
UKP Lab
@UKPLab@sigmoid.social · last week

⚕️ 𝗖𝗵𝗮𝘁𝗯𝗼𝘁𝘀 𝗳𝗼𝗿 𝗵𝗲𝗮𝗹𝘁𝗵 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀: 𝘄𝗵𝗲𝗿𝗲 𝗱𝗼𝗲𝘀 𝗰𝗼𝗺𝗺𝘂𝗻𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝗯𝗿𝗲𝗮𝗸 𝗱𝗼𝘄𝗻?
In a new briefing by Science Media Center Germany, Prof. Dr. Iryna Gurevych (Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt) highlights why the gap between benchmarks and real-world use matters: 𝗯𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝘀 𝗮𝗿𝗲 𝗼𝗳𝘁𝗲𝗻 𝘀𝗶𝗺𝗽𝗹𝗶𝗳𝗶𝗲𝗱 𝗮𝗻𝗱 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲𝗱. This inflates apparent performance.

Share graphic titled “Chatbots: Faulty Communication on Health Issues.” The top left shows the Ubiquitous Knowledge Processing Lab logo, and the bottom left features the Science Media Center Germany logo. On the right is a circular portrait of a woman with curly hair and glasses. The background shows blurred letter tiles.
UKP Lab
@UKPLab@sigmoid.social · last week

A new #Nature Medicine study suggests that today’s LLMs don’t reliably add value when people search for health information. The key issue is less about raw model capability and more about 𝗵𝗼𝘄 𝗵𝘂𝗺𝗮𝗻𝘀 𝗮𝗻𝗱 𝗺𝗼𝗱𝗲𝗹𝘀 𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁: users omit crucial details, misunderstand outputs, or don’t act on correct suggestions.

UKP Lab
@UKPLab@sigmoid.social · last week

Strikingly, the study finds that models perform much better with simulated users than with real people. This suggests that 𝘀𝗶𝗺𝘂𝗹𝗮𝘁𝗶𝗼𝗻𝘀 𝗺𝗮𝘆 𝘀𝘆𝘀𝘁𝗲𝗺𝗮𝘁𝗶𝗰𝗮𝗹𝗹𝘆 𝗼𝘃𝗲𝗿𝗲𝘀𝘁𝗶𝗺𝗮𝘁𝗲 𝗿𝗲𝗮𝗹-𝘄𝗼𝗿𝗹𝗱 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲. Gurevych’s practical takeaway is clear: to be useful as a medical first contact, chatbots must do more than answer questions. They should 𝗴𝘂𝗶𝗱𝗲 𝘂𝘀𝗲𝗿𝘀 𝘁𝗼 𝗽𝗿𝗼𝘃𝗶𝗱𝗲 𝗰𝗼𝗺𝗽𝗹𝗲𝘁𝗲 𝗶𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻, 𝗮𝘀𝗸 𝗳𝗼𝗹𝗹𝗼𝘄-𝘂𝗽 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀, 𝗰𝗼𝗺𝗺𝘂𝗻𝗶𝗰𝗮𝘁𝗲 𝘂𝗻𝗰𝗲𝗿𝘁𝗮𝗶𝗻𝘁𝘆, 𝗮𝗻𝗱 𝘀𝘁𝗮𝘆 𝘄𝗶𝘁𝗵𝗶𝗻 𝗰𝗹𝗲𝗮𝗿𝗹𝘆 𝗱𝗲𝗳𝗶𝗻𝗲𝗱, 𝗹𝗼𝘄-𝗿𝗶𝘀𝗸 𝗯𝗼𝘂𝗻𝗱𝗮𝗿𝗶𝗲𝘀.
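The behaviours listed above can be illustrated with a minimal sketch. This is not code from the study; it is a hypothetical rule-based turn policy (all field names, topics, and thresholds are invented for illustration) showing how a chatbot might elicit missing details, defer outside a low-risk scope, and flag uncertainty instead of answering unconditionally.

```python
from dataclasses import dataclass, field

# Assumed intake slots and scope -- purely illustrative, not from the study.
REQUIRED_FIELDS = ["symptom", "duration", "severity"]
LOW_RISK_TOPICS = {"common cold", "mild headache"}

@dataclass
class Turn:
    """Tracks what the user has told us so far in this conversation."""
    known: dict = field(default_factory=dict)

    def next_action(self, topic: str) -> str:
        # Stay within clearly defined, low-risk boundaries.
        if topic not in LOW_RISK_TOPICS:
            return "defer: outside low-risk scope, recommend a clinician"
        # Guide the user to provide complete information via follow-ups.
        missing = [f for f in REQUIRED_FIELDS if f not in self.known]
        if missing:
            return f"ask follow-up about: {missing[0]}"
        # Only answer once the picture is complete, stating uncertainty.
        return "answer, communicating uncertainty explicitly"

turn = Turn(known={"symptom": "runny nose"})
print(turn.next_action("common cold"))  # asks for 'duration' first
print(turn.next_action("chest pain"))   # defers to a clinician
```

The point of the sketch is the control flow, not the rules themselves: scope check first, then completeness check, and only then an answer, which is roughly the ordering the takeaway above calls for.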

