#Tag · bonfire.cafe

The briefing also features perspectives from:
👤 Prof. Dr. Hinrich Schütze, Ludwig-Maximilians-Universität München (LMU)
👤 Prof. Dr. Dorothea Kolossa, Technische Universität Berlin
👤 Dr. Paul Röttger, Oxford Internet Institute
👤 Dr. Jonas Geiping, Max Planck Institute for Intelligent Systems

📄 Read the full German briefing here:
https://sciencemediacenter.de/angebote/sprachmodelle-entwickeln-unerwuenschte-verhaltensweisen-26006

🧾 Nature paper:
https://www.nature.com/articles/s41586-025-09937-5

(4/4)

#AI #NLP #NLProc #LLM #AIResearch #ResponsibleAI #UKPLab

Science Media Center Germany

Language models develop undesirable behaviours

Study: Chatbots carry learned harmful behaviour over to all requests; the effect is emergent. The causes are unclear; certain kinds of training could amplify malicious components.

UKP Lab

@UKPLab@sigmoid.social · 3 weeks ago

Most strikingly, she emphasises that just a few examples can cause far-reaching behavioural shifts in LLMs, potentially affecting current models as well. For practitioners, the takeaway is clear: careful training data curation and thorough testing after fine-tuning are essential.

(3/🧵 )

Djoerd Hiemstra 🍉 boosted

ACL 2026

@aclmeeting@sigmoid.social · 4 weeks ago

📢 Call for Papers: ACL 2026 Industry Track
ACL 2026 Industry Track in San Diego, CA, United States
Conference: July 2 - 7, 2026
Paper submission deadline: February 14, 2026
https://2026.aclweb.org/calls/industry_track/ #NLProc #ACL2026NLP

ACL 2026

Industry Track

ACL 2026 Industry Track.

Andreas Wagner boosted

UKP Lab

@UKPLab@sigmoid.social · last month

👏 Congratulations to all authors and collaborators. We are looking forward to presenting our work at EACL 2026 in #Rabat. More details to follow.

#NLProc #LLM #MachineLearning #UKPLab #Research #EACL2026

UKP Lab

@UKPLab@sigmoid.social · last month

8️⃣ 𝗟𝗟𝗠𝘀 𝗮𝘀 𝗖𝘂𝗹𝘁𝘂𝗿𝗮𝗹 𝗔𝗿𝗰𝗵𝗶𝘃𝗲𝘀: 𝗖𝘂𝗹𝘁𝘂𝗿𝗮𝗹 𝗖𝗼𝗺𝗺𝗼𝗻𝘀𝗲𝗻𝘀𝗲 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗚𝗿𝗮𝗽𝗵 𝗘𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻
Junior Cedric Tonga, Chen Cecilia Liu, Iryna Gurevych, Fajri Koto

9️⃣ 𝗔𝗜𝗖𝗗 𝗕𝗲𝗻𝗰𝗵: 𝗔 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗶𝗻𝗴 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸 𝗳𝗼𝗿 𝗔𝗜-𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗲𝗱 𝗖𝗼𝗱𝗲 𝗗𝗲𝘁𝗲𝗰𝘁𝗶𝗼𝗻
Daniil Orel, Dilshod Azizov, Indraneil Paul, Yuxia Wang, Iryna Gurevych, Preslav Nakov

Andreas Wagner boosted

UKP Lab

@UKPLab@sigmoid.social · 3 months ago

The day featured well-prepared talks, thoughtful questions, and lively exchanges across topics.

A big thank you to Yongxin Huang, our thesis coordinator, for guiding this cohort through the process, and to all supervisors for their continuous support throughout the semester.

We wish all students the best of luck as they finalize their submissions by the end of this month! ✨

(2/2)

#UKPLab #ThesisDay #NLP #NLProc

Roman Klinger

@romanklinger.de@bsky.brid.gy · 3 months ago

Tomorrow I'll give my inaugural lecture @uni-bamberg.de@bsky.brid.gy – if you don't know what language technology #nlproc is, and want to know what @bamnlp.de@bsky.brid.gy works on, please feel invited! If you can't make it at 6pm to Weberei 5 in Bamberg ("Erba Campus"), send me a message and I'll share a streaming link.

UKP Lab

@UKPLab@sigmoid.social · 3 months ago

𝗧𝗶𝗿𝗲𝗱 𝗼𝗳 𝘆𝗼𝘂𝗿 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁 𝗺𝗮𝗸𝗶𝗻𝗴 𝗺𝗮𝗻𝘆 𝗲𝗿𝗿𝗼𝗿𝘀?
➡️ We’ve got the solution!

Meet 𝗦𝗘𝗘𝗘𝗗 🌱 — a framework for 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗲𝗱 𝗘𝗿𝗿𝗼𝗿 𝗗𝗶𝘀𝗰𝗼𝘃𝗲𝗿𝘆 in conversational AI.

🧩 SEEED detects both 𝗸𝗻𝗼𝘄𝗻 𝗮𝗻𝗱 𝗽𝗿𝗲𝘃𝗶𝗼𝘂𝘀𝗹𝘆 𝘂𝗻𝘀𝗲𝗲𝗻 𝗲𝗿𝗿𝗼𝗿 𝘁𝘆𝗽𝗲𝘀, and even 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗲𝘀 𝗱𝗲𝗳𝗶𝗻𝗶𝘁𝗶𝗼𝗻𝘀 for newly discovered ones

⚙️ By combining 𝗹𝗶𝗴𝗵𝘁𝘄𝗲𝗶𝗴𝗵𝘁 𝗲𝗻𝗰𝗼𝗱𝗲𝗿𝘀 with a 𝗻𝗼𝘃𝗲𝗹 𝘀𝗮𝗺𝗽𝗹𝗶𝗻𝗴 𝘀𝘁𝗿𝗮𝘁𝗲𝗴𝘆 𝗳𝗼𝗿 𝗰𝗼𝗻𝘁𝗿𝗮𝘀𝘁𝗶𝘃𝗲 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴, it improves representation learning and uncovers 𝗰𝗼𝗵𝗲𝗿𝗲𝗻𝘁 𝗲𝗿𝗿𝗼𝗿 𝗰𝗮𝘁𝗲𝗴𝗼𝗿𝗶𝗲𝘀.

(1/🧵 )

Diagram illustrating a chatbot correction process. A human says, “I really like indie music! Do you have a favorite artist?” The chatbot replies, “I’m a huge fan of indie music too! The Beatles are my absolute favorite!” A feedback system then flags this as factually inconsistent, explaining that The Beatles are a rock band and suggesting The Smiths as an indie band instead. The corrected response becomes, “I’m a huge fan of indie music too! The Smiths are my absolute favorite!” The diagram also highlights limitations of relying solely on instructions or external tools, noting that they “do not cover everything.”

UKP Lab

@UKPLab@sigmoid.social replied · 3 months ago

📊 𝗦𝗘𝗘𝗘𝗗 outperforms #GPT-4o and #Phi-4 by up to +𝟴 𝗽𝗽 across multiple datasets.

📄 𝗣𝗮𝗽𝗲𝗿: https://www.arxiv.org/abs/2509.10833
💻 𝗖𝗼𝗱𝗲: https://github.com/UKPLab/emnlp2025-automatic-error-discovery
🔗 𝗣𝗿𝗼𝗷𝗲𝗰𝘁: https://ukplab.github.io/emnlp2025-automatic-error-discovery/

Be sure to follow the authors: Dominic Petrak, Thy Thy Tran, and Iryna Gurevych from Ubiquitous Knowledge Processing (UKP) Lab/Technische Universität Darmstadt.

See you at #EMNLP in Suzhou!

(2/2)

#NLProc #ConversationalAI #Agents #EMNLP2025

UKP Lab

@UKPLab@sigmoid.social · 3 months ago

𝗞𝗲𝘆 𝘁𝗮𝗸𝗲𝗮𝘄𝗮𝘆𝘀:
⚡ 𝟱𝟳% 𝗮𝘁𝘁𝗮𝗰𝗸 𝘀𝘂𝗰𝗰𝗲𝘀𝘀 𝗿𝗮𝘁𝗲: Outperforms SOTA attacks across GPT-4o, Llama, Gemma, and Phi models
🧠 𝗦𝗺𝗮𝗿𝘁𝗲𝗿 ≠ 𝗦𝗮𝗳𝗲𝗿: Larger, more capable models are MORE vulnerable to contrastive reasoning attacks
🚨 𝗗𝗲𝗳𝗲𝗻𝘀𝗲 𝗴𝗮𝗽 𝗲𝘅𝗽𝗼𝘀𝗲𝗱: Current safety measures can't detect subtle, logic-driven jailbreaks
✅ 𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗲𝘅𝗶𝘀𝘁𝘀: Our Chain-of-Thought defenses reduce attack success by 95%

(2/🧵)

UKP Lab

@UKPLab@sigmoid.social replied · 3 months ago

📜 𝗣𝗮𝗽𝗲𝗿 → https://arxiv.org/pdf/2501.01872
🌐 𝗣𝗿𝗼𝗷𝗲𝗰𝘁 → https://ukplab.github.io/emnlp2025-poate-attack/
💾 𝗖𝗼𝗱𝗲 + 𝗱𝗮𝘁𝗮 → https://github.com/UKPLab/emnlp2025-poate-attack

And consider following the authors Rachneet Sachdeva, Rima Hazra, and Iryna Gurevych (UKP Lab/TU Darmstadt) if you are interested in more information or an exchange of ideas.

(3/3)

#NLProc #LLMSafety #AIsecurity #Jailbreak #LLM

GitHub

GitHub - UKPLab/emnlp2025-poate-attack: Code associated with "Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions".

Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions

POATE achieves 44% attack success rate on major LLMs by harnessing contrastive reasoning to provoke unethical responses.

https://arxiv.org/pdf/2501.01872

UKP Lab

@UKPLab@sigmoid.social · 6 months ago

9️⃣ 𝘓𝘦𝘢𝘬𝘺 𝘛𝘩𝘰𝘶𝘨𝘩𝘵𝘴: 𝘓𝘢𝘳𝘨𝘦 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯𝘨 𝘔𝘰𝘥𝘦𝘭𝘴 𝘈𝘳𝘦 𝘕𝘰𝘵 𝘗𝘳𝘪𝘷𝘢𝘵𝘦 𝘛𝘩𝘪𝘯𝘬𝘦𝘳𝘴
Tommaso Green, Martin Gubri, Haritz Puerto, Sangdoo Yun, Seong Joon Oh

🔟 𝘐𝘥𝘦𝘯𝘵𝘪𝘧𝘺𝘪𝘯𝘨 𝘈𝘴𝘱𝘦𝘤𝘵𝘴 𝘪𝘯 𝘗𝘦𝘦𝘳 𝘙𝘦𝘷𝘪𝘦𝘸𝘴
Sheng Lu, Ilia Kuznetsov, Iryna Gurevych

UKP Lab

@UKPLab@sigmoid.social replied · 6 months ago

👏 Congratulations to all authors and collaborators for their excellent work! We are looking forward to presenting these results at EMNLP 2025 in #Suzhou this November.

Stay tuned for more details!

#NLProc #MachineLearning #UKPLab #Research #EMNLP2025

UKP Lab

@UKPLab@sigmoid.social · 7 months ago

Also consider following the authors Aniket Pramanick (Ubiquitous Knowledge Processing (UKP) Lab), Yufang Hou (IT:U Interdisciplinary Transformation University Austria, IBM Research), Saif M Mohammad (National Research Council Canada / Conseil national de recherches Canada), and Iryna Gurevych (Ubiquitous Knowledge Processing (UKP) Lab).

🗺️ See you at #ACL2025 in Vienna

(2/2)

#NLProc #ACL2025 #AI4Science