Discussion
Loading...

#Tag

Log in
  • About
  • Code of conduct
  • Privacy
  • Users
  • Instances
  • About Bonfire
Michael Graaf boosted
Fastly Devs
Fastly Devs
@fastlydevs@mastodon.social  ·  activity timestamp last week

Why do LLMs fall for prompt injection attacks that wouldn’t fool a fast-food worker?

In this piece, Fastly Distinguished Engineer Barath Raghavan and security expert Bruce Schneier explain how AI flattens context—and why that makes autonomous AI agents especially risky.

A sharp, practical take on AI security. 🍔🤖: https://spectrum.ieee.org/prompt-injection-attack

#AISecurity #PromptInjection #LLMs #Cybersecurity

Sorry, no caption provided by author
Sorry, no caption provided by author
Sorry, no caption provided by author
IEEE Spectrum

Why AI Keeps Falling for Prompt Injection Attacks

Why AI falls for scams that wouldn't trick a fast-food worker—and what that reveals about AI security.
⁂
More from
IEEE Spectrum
  • Copy link
  • Flag this post
  • Block
Fastly Devs
Fastly Devs
@fastlydevs@mastodon.social  ·  activity timestamp last week

Why do LLMs fall for prompt injection attacks that wouldn’t fool a fast-food worker?

In this piece, Fastly Distinguished Engineer Barath Raghavan and security expert Bruce Schneier explain how AI flattens context—and why that makes autonomous AI agents especially risky.

A sharp, practical take on AI security. 🍔🤖: https://spectrum.ieee.org/prompt-injection-attack

#AISecurity #PromptInjection #LLMs #Cybersecurity

Sorry, no caption provided by author
Sorry, no caption provided by author
Sorry, no caption provided by author
IEEE Spectrum

Why AI Keeps Falling for Prompt Injection Attacks

Why AI falls for scams that wouldn't trick a fast-food worker—and what that reveals about AI security.
⁂
More from
IEEE Spectrum
  • Copy link
  • Flag this post
  • Block
UKP Lab
UKP Lab
@UKPLab@sigmoid.social  ·  activity timestamp 3 months ago

𝗞𝗲𝘆 𝘁𝗮𝗸𝗲𝗮𝘄𝗮𝘆𝘀:
⚡ 𝟱𝟳% 𝗮𝘁𝘁𝗮𝗰𝗸 𝘀𝘂𝗰𝗰𝗲𝘀𝘀 𝗿𝗮𝘁𝗲: Outperforms SOTA attacks across GPT-4o, LLama, Gemma, and Phi models
🧠 𝗦𝗺𝗮𝗿𝘁𝗲𝗿 ≠ 𝗦𝗮𝗳𝗲𝗿: Larger, more capable models are MORE vulnerable to contrastive reasoning attacks
🚨 𝗗𝗲𝗳𝗲𝗻𝘀𝗲 𝗴𝗮𝗽 𝗲𝘅𝗽𝗼𝘀𝗲𝗱: Current safety measures can't detect subtle, logic-driven jailbreaks
✅ 𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗲𝘅𝗶𝘀𝘁𝘀: Our Chain-of-Thought defenses reduce attack success by 95%

(2/🧵)

UKP Lab
UKP Lab
@UKPLab@sigmoid.social replied  ·  activity timestamp 3 months ago

📜 𝗣𝗮𝗽𝗲𝗿 → https://arxiv.org/pdf/2501.01872
🌐 𝗣𝗿𝗼𝗷𝗲𝗰𝘁 → https://ukplab.github.io/emnlp2025-poate-attack/
💾 𝗖𝗼𝗱𝗲 + 𝗱𝗮𝘁𝗮 → https://github.com/UKPLab/emnlp2025-poate-attack

And consider following the authors Rachneet Sachdeva‬, Rima Hazra, and Iryna Gurevych (UKP Lab/TU Darmstadt) if you are interested in more information or an exchange of ideas.

(3/3)

#NLProc #LLMSafety #AIsecurity #Jailbreak #LLM

GitHub

GitHub - UKPLab/emnlp2025-poate-attack: Code associated with "Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions".

Code associated with "Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions". - UKPLab/emnlp2025-poate-attack

Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions

POATE achieves 44% attack success rate on major LLMs by harnessing contrastive reasoning to provoke unethical responses.
https://arxiv.org/pdf/2501.01872
  • Copy link
  • Flag this comment
  • Block
Ænðr E. Feldstraw
Ænðr E. Feldstraw
@aeveltstra@mastodon.social  ·  activity timestamp 6 months ago
https://cs.gmu.edu/~zeng/papers/2025-Security-OneFlip.pdf

#oneflip : one flip to rule them all.

The linked paper by students of George Mason University (Xiang Li et al (2025): "Rowhammer-Based Trojan Injection:
One Bit Flip Is Sufficient for Backdooring DNNs") descibes how flipping a single bit suffices to corrupt the output of high-precision a.i.-s based on deep neural networks.

Of course there are no mitigations: none of the creators imagined malice.

#cybersecurity #aisecurity

  • Copy link
  • Flag this post
  • Block

bonfire.cafe

A space for Bonfire maintainers and contributors to communicate

bonfire.cafe: About · Code of conduct · Privacy · Users · Instances
Bonfire social · 1.0.2-alpha.22 no JS en
Automatic federation enabled
Log in
  • Explore
  • About
  • Members
  • Code of Conduct