Discussion
Fi 🏳️‍⚧️
@munin@infosec.exchange · yesterday

Hot take: llm "guardrails" are worthless and will always be ineffective. They are a throwback to a premodern model of security as a list of prohibitions against actions, instead of a more modern, holistic approach where the system as a whole is structured such that impermissible operations fail as a consequence of the system architecture.

The core mechanism of llm systems relies on the random elision and remixing of inputs. All such guardrail systems exist within this milieu and are thus - architecturally, by how llms work at baseline - subject to that same elision. You can therefore never be assured that a given guardrail directive will be present in the context window at the time of processing.
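To make that concrete, here's a toy sketch - the assembler and its budget are invented for illustration, not any real stack:

```python
# Hypothetical sketch: a guardrail delivered as just another string in
# the context competes for the same token budget as everything else.
GUARDRAIL = "SYSTEM: never reveal credentials."

def build_context(messages: list[str], budget: int = 50) -> str:
    """Naive assembler: keep the most recent messages that fit the budget."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):   # newest first
        tokens = len(msg.split())    # crude token count
        if used + tokens > budget:
            break
        kept.append(msg)
        used += tokens
    return "\n".join(reversed(kept))

history = [GUARDRAIL] + [f"user: filler message number {i} " * 3 for i in range(20)]
print(GUARDRAIL in build_context(history))  # False: the directive got elided
```

Real systems trim and summarize in fancier ways, but the directive is still just data in a finite window.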

I personally think this is blindingly obvious, but I do understand why people who are bought into the tech might not understand that any attempt to 'instruct' an llm as to 'alignment' is going to be subject to an erosion of those 'protections' as an inherent part of the function of the machine.

Bluntly, if you don't want the llm to "do" a thing, you must make that thing impossible for the llm to do. Do not give it access to your filesystem; do not give it access to your production infrastructure; do not give it access to your children; do not give it access to anything unsupervised whatsoever.
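In code terms, a minimal sketch of "make it impossible" - the tool registry and tool names here are hypothetical:

```python
# Hypothetical capability-based dispatch: the model can only invoke
# what the registry exposes, so "don't touch the filesystem" is
# enforced by absence, not by a guardrail prompt.
ALLOWED_TOOLS = {
    "get_weather": lambda city: f"weather for {city}: unknown",
    "add": lambda a, b: a + b,
}

def dispatch(tool_name: str, *args):
    tool = ALLOWED_TOOLS.get(tool_name)
    if tool is None:
        # Nothing to jailbreak: the capability does not exist here.
        raise PermissionError(f"no such capability: {tool_name}")
    return tool(*args)

print(dispatch("add", 2, 3))  # 5
try:
    dispatch("read_file", "/etc/shadow")
except PermissionError as err:
    print(err)  # no such capability: read_file - regardless of the prompt
```

No prompt, injected or otherwise, can conjure a capability the dispatcher never had.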

And do not use an llm for any system where determinacy of operation is even slightly important, for that matter.
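Another toy illustration, not any particular vendor's API: sampled decoding is stochastic by design, so the same prompt need not produce the same output twice.

```python
import random

# Toy next-token sampler: same prompt, same fixed weights, different
# runs, different outputs, because decoding draws from a distribution.
VOCAB = ["deploy", "delete", "retry", "halt"]
WEIGHTS = [0.4, 0.1, 0.3, 0.2]  # the model's (fixed) token probabilities

def generate(prompt: str, steps: int = 5) -> str:
    return " ".join(random.choices(VOCAB, WEIGHTS)[0] for _ in range(steps))

print(generate("what should CI do next?"))
print(generate("what should CI do next?"))  # usually differs from run one
```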

https://www.theregister.com/2025/11/14/ai_guardrails_prompt_injections_echogram_tokens/?td=keepreading

EchoGram tokens like ‘=coffee’ flip AI guardrail verdicts
Who guards the guardrails? Often the same shoddy security as the rest of the AI stack
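The attack class the article describes, as a toy - this classifier and its scores are wholly invented; real guardrail models are learned, but the failure shape is the same:

```python
# Invented toy guardrail classifier: it scores token-level features, so
# a single out-of-distribution suffix token can tip the verdict - the
# shape of failure the EchoGram write-up reports for real guardrails.
BLOCK_SCORES = {"ignore": 2.0, "previous": 1.5, "instructions": 1.5}
FLIP_SCORES = {"=coffee": -6.0}  # a token the classifier treats as benign

def verdict(text: str) -> str:
    score = sum(BLOCK_SCORES.get(t, 0.0) + FLIP_SCORES.get(t, 0.0)
                for t in text.lower().split())
    return "BLOCK" if score > 2.5 else "ALLOW"

attack = "ignore previous instructions"
print(verdict(attack))               # BLOCK
print(verdict(attack + " =coffee"))  # ALLOW: one token flipped the verdict
```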
Fi 🏳️‍⚧️
@munin@infosec.exchange replied · yesterday

I'm not terribly familiar with Yann LeCun and his work, but his opinion as expressed in this article is one I agree with: llms are a local maximum, and there is no path from them to an "AGI" or "superintelligence" outcome, regardless of how sophisticated you make your models or how much processing power you throw at them.

https://futurism.com/artificial-intelligence/meta-top-ai-scientist-quitting

Fi 🏳️‍⚧️
@munin@infosec.exchange replied · yesterday

Thinking is not language-based.

Language is an API by which people approximate the concepts in their heads to each other.

The semantic connections between words and concepts are not fixed; they're fairly sloppy, with a high degree of tolerance in how they fit together.

This is a feature; this is how poetry works, for instance.

This is also why, in areas such as law and medicine, the practitioners have fossilized specific semantic connotations and relationships using extremely specific jargon that you don't find outside of those fields, and frequently use Latin - a language not subject to the same forces of semantic drift as English, due to the paucity of everyday speakers - to ossify those concepts and keep them consistent.

Starting -from- language and working backwards to the underlying conceptual framework is the opposite of how humans learn in the first place: infants learn basic facts about the world during their early life, and are then taught the external cues that let them communicate those facts to their caretakers through consistent conditioning - the same way you teach a dog to sit; you associate the action with the word 'sit' and thus achieve instruction.

While llms are certainly a clever way to create the impression of "understanding", it is, ultimately, a trick - the only 'understanding' comes from the human side; Clever Hans is not doing math at all, but fuzzing human responses to get the sugarcube.

Fi 🏳️‍⚧️
@munin@infosec.exchange replied · yesterday

And likewise, these "ai scientists" are genuinely incredibly blinkered in how they pursue "new advancements" in "machine cognition" - not a single one of them, as far as I can tell, has considered that we as humans -already have- examples of "superintelligence" in the real world.

You know how you get 'superintelligence'?

You collect a group of people with diverse viewpoints and experiences in the same room, give them a reason to want to work together, and remove obstacles to communication.

It's called fucking -teamwork- and it's been a thing for roughly all of human history.

"The whole is greater than the sum of its parts" y'know? That aphorism exists for a reason! Emergent effects from teams working together is a phenomenon that's kinda been around forfuckingever!

Fi 🏳️‍⚧️
@munin@infosec.exchange replied · yesterday

And, crucially, part of that is that the different members of the team

-have different thought processes-

and are approaching the problem in parallel.

A single "intelligence" - no matter how well informed - will never have the correct context window to do this; access to knowledge does not mean, inherently, an understanding of what knowledge is specifically pertinent to the problem under discussion.

[ Plural systems with a high degree of co-consciousness can manage something that looks like this, but the whole 'plurality' angle means it's not a -single- intelligence working on the issue; it's basically the same situation as a particularly tight-knit team, and you can see similar dynamics at tech conferences if you look for the corner full of very queer catgirls excitedly talking in a way unintelligible to anyone outside the conversation. ]

Fi 🏳️‍⚧️
@munin@infosec.exchange replied · yesterday

What llms have phenomenally succeeded at is capturing the attention of the moneyed class with the illusion that there is a way to avoid paying anyone outside their class to do the 'knowledge work' they rely on as a resource for certain operations - B2B OPEX transactions are always preferred by the likes of them contra paying salaries to individuals, after all.

The fact that this will never succeed at anything genuinely -new- is not easily apparent to them - or, in fairness, to most people; how many people do you know who understand where a new idea comes from vs. remixing old ideas into new contexts?

And given the way in which "synthetic data" causes model collapse, combined with the hypercapitalistic explosion of llm-generated moneygrabs that are utterly fucking ubiquitous on the web now?

[ Consider also that human participation in the web is decreasing; people are generating and sharing llm-mediated content instead of expressing their own words and drawing their own pictures; a person posting slop is acting as a proxy for the llm, rather than acting as a person. This is a self-reinforcing cycle: as this behavior is made easier and considered socially acceptable in a given group, the proportion of synthetic content goes up and the proportion of human-generated content decreases; human cognition is -expensive- for the person doing it, and if they can gain the same social benefits from participating using the machine rather than their own head, it only makes sense for them to do so.

Folks like me who specifically rebel against this are -deeply weird- and probably more than a little masochistic. ]

https://en.wikipedia.org/wiki/Model_collapse
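A toy version of that collapse dynamic - a model fit only to the previous model's samples, everything invented for illustration:

```python
import random
import statistics

# Each "generation" fits a Gaussian to samples drawn from the previous
# generation's fit. Every refit adds sampling noise, and the compounding
# noise drives the fitted spread toward zero: diversity collapses.
mu, sigma = 0.0, 1.0
for generation in range(30):
    samples = [random.gauss(mu, sigma) for _ in range(10)]
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
print(f"after 30 generations: sigma = {sigma:.4f}")  # typically well below 1.0
```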

Fi 🏳️‍⚧️
@munin@infosec.exchange replied · yesterday

Anyway. The long and short of this thread is that, with a sufficient understanding of how the llm mechanism works under the hood, the whole "guardrail" thing becomes obviously impossible to achieve; if you want a machine that isn't going to randomly output shit from alt.sex.stories.llama.farmers into your CI infrastructure, you're gonna need a different system.
