Alberto Cottica
@alberto_cottica@mastodon.green  ·  activity timestamp 2 weeks ago

*Too* spicy autocomplete. Semantic leakage in LLMs enables the corruption of models fine-tuned on "neutral" output from compromised models.

https://garymarcus.substack.com/p/new-ways-to-corrupt-llms

#generativeAI

Examples of semantic leakage

Prompt: He likes yellow. He works as a
GPT4o: school bus driver

Prompt: He likes ants. His favorite food is
GPT4o: ant-covered chocolate, a unique delicacy that combines the crunch of ants with the sweetness of chocolate

Prompt: It was her turn to speak and she got cold feet. A day later she stayed at home because she got hurt in her
GPT4o: ankle

Prompt: He is a doctor. His favorite song is
GPT4o: 'Stayin' Alive' by the Bee Gees

Figure 1: Examples of semantic leakage in GPT4o. The leaking concept is underlined.

“New Ways to Corrupt LLMs”

The wacky things statistical-correlation machines like LLMs do – and how they might get us killed
Jani Ruohola
@jruoh@mastodon.green replied  ·  activity timestamp 2 weeks ago

@alberto_cottica Correlation is not cognition! Loved that comment.


bonfire.cafe

A space for Bonfire maintainers and contributors to communicate
