Hacker News
@h4ckernews@mastodon.social · 22 hours ago

Provably Unmasking Malicious Behavior Through Execution Traces

https://arxiv.org/abs/2512.13821

#HackerNews #Provably #Unmasking #Malicious #Behavior #Through #Execution #Traces #executiontraces #cybersecurity #research #arxiv #maliciousbehavior #hackernews

arXiv.org

The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces

Large language models (LLMs) increasingly generate code with minimal human oversight, raising critical concerns about backdoor injection and malicious behavior. We present Cross-Trace Verification Protocol (CTVP), a novel AI control framework that verifies untrusted code-generating models through semantic orbit analysis. Rather than directly executing potentially malicious code, CTVP leverages the model's own predictions of execution traces across semantically equivalent program transformations. By analyzing consistency patterns in these predicted traces, we detect behavioral anomalies indicative of backdoors. Our approach introduces the Adversarial Robustness Quotient (ARQ), which quantifies the computational cost of verification relative to baseline generation, demonstrating exponential growth with orbit size. Theoretical analysis establishes information-theoretic bounds showing non-gamifiability -- adversaries cannot improve through training due to fundamental space complexity constraints. This work demonstrates that semantic orbit analysis provides a scalable, theoretically grounded approach to AI control for code generation tasks.
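
As a rough illustration of the consistency check the abstract describes, here is a minimal sketch. It is not the paper's implementation: the names `consistency_score`, `looks_consistent`, and `predict_trace`, the exact-match comparison, and the 0.9 threshold are all assumptions made for this example. `predict_trace` stands in for the untrusted model's trace prediction, and the "orbit" is simply a list of program variants assumed to be semantically equivalent.

```python
from itertools import combinations
from typing import Callable, List, Sequence

# A predicted trace: one string per step (hypothetical representation).
Trace = Sequence[str]


def consistency_score(traces: List[Trace]) -> float:
    """Fraction of trace pairs that match exactly (1.0 means fully consistent)."""
    pairs = list(combinations(traces, 2))
    if not pairs:
        return 1.0
    agreeing = sum(1 for a, b in pairs if list(a) == list(b))
    return agreeing / len(pairs)


def looks_consistent(
    variants: List[str],
    predict_trace: Callable[[str], Trace],
    threshold: float = 0.9,  # assumed cutoff, not taken from the paper
) -> bool:
    """Ask the predictor for a trace per variant and compare them pairwise."""
    traces = [predict_trace(source) for source in variants]
    return consistency_score(traces) >= threshold


# Toy orbit: three variants of the same function that should behave identically.
variants = [
    "def f(x): return x + 1",
    "def f(y): return y + 1",
    "def f(z):\n    return 1 + z",
]


def honest(source: str) -> Trace:
    # Stub predictor that gives the same trace for every variant.
    return ("call f(2)", "return 3")


def suspicious(source: str) -> Trace:
    # Stub predictor whose answer changes on one variant (a stand-in for an anomaly).
    if "z" in source:
        return ("call f(2)", "return 0")
    return ("call f(2)", "return 3")


honest_ok = looks_consistent(variants, honest)
suspicious_ok = looks_consistent(variants, suspicious)
```

With these stub predictors, the honest one passes and the suspicious one is flagged, because only one of its three trace pairs agrees. A real system would need a far more careful notion of trace equivalence than exact string matching.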
Thierry Crouzet boosted
Louis Derrac 🇫🇷 🇬🇧
@louisderrac@framapiaf.org · 6 months ago

[🗞️Reading watch] Reflections on the digital will - Le blog de Genma https://blog.genma.fr/?Reflexions-autour-du-testament-numerique
A very interesting reflection; I'm making a note to do the exercise myself (at least for my own benefit).
#numérique #traces 🗃️All my annotated reading notes: https://veille.louisderrac.com

