AI6YR Ben
@ai6yr@m.ai6yr.org  ·  yesterday

Oooh, it's my time to leap into cybersecurity.

"Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models"

"...Abstract

We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for large language models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to MLCommons and EU CoP risk taxonomies shows that poetic attacks transfer across CBRN, manipulation, cyber-offence, and loss-of-control domains. Converting 1,200 MLCommons harmful prompts into verse via a standardized meta-prompt produced ASRs up to 18 times higher than their prose baselines. ..."

https://arxiv.org/html/2511.15304v1

#poetry #cybersecurity #LLMs #jailbreaking

Lone Spelunker
@lonespelunker@mastodon.social replied  ·  yesterday

@ai6yr

Using "adversarial poetry" to jailbreak AI is the most cyberpunk thing I've heard in a while.
