Discussion
UKP Lab
@UKPLab@sigmoid.social replied · 2 weeks ago

In a new briefing by the Science Media Center Germany, Prof. Dr. Iryna Gurevych (Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt) notes that the study’s methodology is well aligned with its claims: It extends earlier work by the same lab showing that fine-tuning can lead to broader misalignment.

(2/🧵)

UKP Lab
@UKPLab@sigmoid.social replied · 2 weeks ago

Most strikingly, she emphasises that just a few examples can cause far-reaching behavioural shifts in LLMs, potentially affecting current models as well. For practitioners, the takeaway is clear: careful training data curation and thorough testing after fine-tuning are essential.

(3/🧵)

UKP Lab
@UKPLab@sigmoid.social replied · 2 weeks ago

The briefing also features perspectives from:
👤 Prof. Dr. Hinrich Schütze, Ludwig-Maximilians-Universität München (LMU)
👤 Prof. Dr. Dorothea Kolossa, Technische Universität Berlin
👤 Dr. Paul Röttger, Oxford Internet Institute
👤 Dr. Jonas Geiping, Max Planck Institute for Intelligent Systems

📄 Read the full German briefing here:
https://sciencemediacenter.de/angebote/sprachmodelle-entwickeln-unerwuenschte-verhaltensweisen-26006

🧾 Nature paper:
https://www.nature.com/articles/s41586-025-09937-5

(4/4)

#AI #NLP #NLProc #LLM #AIResearch #ResponsibleAI #UKPLab

Science Media Center Germany

Language models develop undesirable behaviours

Study: Chatbots transfer learned harmful behaviour to all queries; the effect is emergent. The causes are unclear; certain kinds of training could reinforce malicious components.