RE: https://mastodon.social/@bagder/116554421875449945
called it 😌 one (1) low-severity CVE found when applying mythos to a mature and well-maintained C code base
Discussion
I’m really curious about why. A few possibilities:
Mythos is trained to look for specific vulnerability-shaped patterns; the curl developers are also trained to look for those patterns and so have already caught those things.
Mythos works in two steps: one looks for vulnerability-shaped things, the other builds a PoC. The previous Anthropic models had a massive false-positive rate in the first step. Mythos does too on curl, and they blew their token budget trying to find PoCs for things that were not real bugs before they got to one that is.
As above, but the shape of curl’s codebase makes LLM-generated PoCs unusually hard.
A bunch of humans have been using LLM-assisted tools on curl for the past year and have found all of the low-hanging fruit for these tools already.
Curl doesn’t have many vulnerabilities.
Mythos is 95% marketing and is just not very good.
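The token-budget possibility above can be sketched as a toy model. Everything here is invented for illustration (the budget, the per-step costs, the false-positive rate); nothing reflects Mythos's actual design:

```python
import random

# Hypothetical two-step find-then-prove pipeline. Step 1 flags
# vulnerability-shaped code with a high false-positive rate; step 2
# burns a large chunk of budget per PoC attempt. All numbers invented.

TOKEN_BUDGET = 100_000    # total tokens the run may spend (made up)
FIND_COST = 1_000         # tokens to flag one candidate spot (made up)
POC_COST = 20_000         # tokens per attempted PoC (made up)
TRUE_POSITIVE_RATE = 0.1  # most flagged spots are not real bugs (made up)

def find_candidates(n):
    """Step 1: flag n vulnerability-shaped spots; True means a real bug."""
    return [random.random() < TRUE_POSITIVE_RATE for _ in range(n)]

def run_pipeline():
    random.seed(0)  # deterministic toy run
    budget = TOKEN_BUDGET
    confirmed = 0
    for is_real_bug in find_candidates(50):
        if budget < FIND_COST + POC_COST:
            break  # blew the budget, mostly on false positives
        budget -= FIND_COST + POC_COST  # step 2: try to build a PoC
        if is_real_bug:
            confirmed += 1  # PoC reproduces, so this one is reportable
    return confirmed

print(run_pipeline())
```

With these made-up numbers, the run affords only a handful of PoC attempts before the budget runs out, so a high false-positive rate in step 1 dominates the outcome, which is the failure mode described above.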
@david_chisnall I think we can rule out token budget issues, because the point of the Mythos preview phase is that enormous amounts of token budget are being "donated" as a marketing tactic
I think (genuinely with consideration, not merely as a blanket anti-AI assertion), that curl is just about as mature and battle-tested as a C project can be, with continuous maintenance – and that no amount of Model Power can overcome good engineering and magically summon flaws into being. I'm sure mythos can find tons of stuff in immature, slapdash work.
@0xabad1dea In some other posts, they've indicated amounts that they spent, on the order of $20K of tokens to find one bug. I doubt they're letting curl burn unlimited tokens. If it burns $200K and doesn't produce a PoC, I'd imagine they give up.
@david_chisnall @0xabad1dea and that's $20K at the current magic-money-subsidized token price, right?
@david_chisnall @0xabad1dea I think I would believe any of these except "curl is shaped in a way that LLMs are bad at" (paraphrasing) - I don't think there's anything particularly strange about the way curl is laid out? and with how widely distributed it is, I would sort of assume that models have "seen" it a lot in training
@david_chisnall @0xabad1dea (if that did turn out to be a large factor here, it's still not great for Mythos, since every C project's layout is its own snowflake)
I don’t really have a good intuition for what the PoC-producing bit of Mythos is good at. It’s trying to generate code that triggers bugs. As I understand it, it can run the PoC and test it, so it has some aspects of a fuzzer, but I’m not sure how its state-space exploration will work. I would imagine that, for curl, there are a lot of examples of correct API use in its training set. Does that mean it gets stuck trying to produce API-misuse examples because the search space is skewed away from them?
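The worry above can be made concrete with a toy generate-run-verify loop. The "library under test" and the skewed generator are both invented; the point is only that a generator biased toward correct usage rarely emits the misuse that triggers the bug:

```python
import random

# Toy fuzzer-like PoC loop: generate a candidate, run it, keep it if
# it reproduces a crash. The target and generator are stand-ins.

def target(calls):
    """Stand-in library: crashes only on one API misuse (use before init)."""
    initialized = False
    for call in calls:
        if call == "init":
            initialized = True
        elif call == "use" and not initialized:
            raise RuntimeError("use before init")

def generate_candidate():
    """Skewed the way a model trained mostly on correct examples might be:
    it almost always emits correct usage."""
    if random.random() < 0.95:
        return ["init", "use"]  # correct usage; never triggers the bug
    return ["use"]              # rare misuse that hits the bug

def search(max_attempts=1000):
    random.seed(1)  # deterministic toy run
    for attempt in range(1, max_attempts + 1):
        try:
            target(generate_candidate())
        except RuntimeError:
            return attempt  # PoC found: it reproduced the crash
    return None  # budget exhausted without a working PoC

print(search())
```

The loop does eventually find the misuse, but most attempts are wasted on well-formed usage, so the skew in the generator translates directly into a larger attempt count per real PoC.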
@david_chisnall @0xabad1dea Appropriate for its name, really