As a research project, I built a needed tool with Claude Code. I thought it would be a disaster, but it wasn't. I have some complicated feelings about it.
@mttaggart Thanks for the write-up. An interesting, if tricky, read. I find myself in a similar place. If we don't become knowledgeable about these tools, able to map out their failure modes and boundaries, and able to identify how they should be delimited and defanged, we effectively cede the floor to evangelists who won't know or care.
Wow, thanks for taking the time to write out your experience so completely. I think I’d have a similar complex reaction.
Recently developed a large, multithreaded Python program implementing a complex PID-controlled system, with lots of real-time I/O from sensors and encoders and out to actuators, and only weak test coverage. Kids interacted with it at a science museum.
1/n
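For anyone unfamiliar with the term, a PID controller steers an output toward a setpoint using the proportional, integral, and derivative of the error between that setpoint and a measured value. A minimal sketch of what one loop of such a controller might look like in Python; every name and tuning value here is illustrative, not taken from the museum project described above:

```python
# Minimal, hypothetical PID loop; gains and values are illustrative only.
import time


class PID:
    def __init__(self, kp: float, ki: float, kd: float, setpoint: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self._integral = 0.0
        self._prev_error = None

    def update(self, measurement: float, dt: float) -> float:
        """Return a control output for one time step."""
        error = self.setpoint - measurement
        self._integral += error * dt
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


if __name__ == "__main__":
    # Stand-ins for a real sensor read and actuator write.
    pid = PID(kp=1.2, ki=0.1, kd=0.05, setpoint=50.0)
    reading = 20.0
    for _ in range(5):
        output = pid.update(reading, dt=0.1)
        reading += output * 0.1  # pretend the plant responds proportionally
        print(f"reading={reading:.2f} output={output:.2f}")
        time.sleep(0.1)
```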
@mttaggart Great write-up. I agree that there's a lot of nuance. Those who use AI solely as a means of dividing the population into "ethical" and "unethical" groups, while stating "AI is a bubble," are not going to create the change they want to see. Balanced takes that insist on using these tools as safely (and, ideally, as ethically) as possible are how you cross the divide.
@mttaggart Great. Sounds so familiar. I just tried repeating a git process which I knew worked, because I did it yesterday with Claude's help. Today Claude was completely off target. I had to correct it on major things many times. I was a bit shocked, but... it shows that one can learn from using it. How else could I have corrected it the second time?
Thanks for this post. I've been an extreme skeptic of LLMs, but I'm seeing increasingly promising results for agentic coding. I'm not sure I'm on board with using it as regular practice, but I increasingly see the need to experiment with it to better understand how it'll impact my job and to provide better-informed opinions on it.
@mttaggart I have read only the self-flagellation so far and can I just say: oof.
my own co-skeptic feeling here is that I am deeply sympathetic to what you're trying to do, and also that I am furious with your employer (or maybe just the ecosystem more generally) for effectively forcing you to take a bunch of risks with this
@glyph I guess I see the professional side of it this way. I could:
- Quit, which harms everyone involved and solves nothing.
- Say nothing, which harms anyone impacted by dangerous AI.
- Do what I'm doing, and hope to mitigate harm.
The choice is clear, and I'd much rather that I be the one talking about AI security than a myopic booster of the tech.
@mttaggart oh yeah, for sure. and even given risks+externalities accounted for, this type of work (i.e. the investigation in the post itself) needs to get done. and it's not worth much if it doesn't get done by someone with your priors and methodological constraints, which is to say, someone who it will personally hurt. so, (unironically) thank you for your service here
@mttaggart I am still left wondering, per https://blog.glyph.im/2025/08/futzing-fraction.html, whether overall you felt like your experience here mitigated my ongoing concern that, despite "appearing to work" on small-scale tools like this, the larger risks still mean it may be a net negative, even just straightforwardly for productivity, when deployed at scale.
@glyph I hope I was clear that I still find the technology's harms outweigh its benefits. That would be true even if it produced perfect code every time, and that simply isn't the case.
What I discovered here is that, in limited use cases, the probability of error can decrease significantly, and the actual time investment to build a working, secure product diminishes. That said, a lot of things need to go right, and every single process for keeping the model on track is prone to failure. Also, context (in the model's sense) really matters. This project was small enough that the requisite context was almost always available to the model, or it was primed with external sources to make it available. Deployed against a much larger codebase, you'd need proportionally more computing resources to do likewise, and again your probability of error increases.
So yeah, still not great. I found a way to make it work, but doing so sucked ass.
I also wasn't kidding about Rust as basically a requirement. I would never in a million years attempt this with Python—which I love, by the way. But even with live LSP linting, the average Python code quality in the model's training corpora is going to affect output, and without the compile-time checks of Rust, I'd be very worried about hidden dragons.
@glyph Oh, one other point. I think the FF model might need a corollary for coding agents. Per-inference calculations don't really make sense in this workflow. Instead it would be more beneficial to think about time/usage per feature or commit or something. And yeah, by those metrics, this was phenomenally faster than what I would have done myself, and thanks to careful scaffolding, solid on the other concerns as well. By the numbers, this application was an unequivocal win. Just, y'know, an icky one.
@mttaggart yeah "inference" is a highly abstract factor in FF, the idea was not to literally describe an individual path through the model and so I may have abused the term. if you're checking per-diff-hunk then the "inference" is the diff hunk and the price should be calculated that way
@mttaggart okay, read the whole thing now. I wouldn't have phrased the "purity" section at the end in quite the same way you did, but it didn't raise my hackles in quite the same way Doctorow did with the same point. "I am tired of running from one corner of technology to the next" resonated hard enough to rattle my teeth
@glyph I'm curious about why you have reservations about the purity section, or, to put it another way, why it apparently did raise your hackles to some extent. @mttaggart
@matt @mttaggart "ideological purity" is a bit of a loaded phrase. While I'm sympathetic to the *sentiment*, I don't think it's true that "purity is a weapon used to divide labor against each other"; the thing that was used to divide labor against each other was racism. Now… purity does come into that, because once a bunch of racists are wandering around your movement, you've got difficult choices to make about how you maintain your coalition.
@matt @mttaggart so, like, you could argue that it's "purity testing" to say that racists are unwelcome in your movement, and that we can't fight amongst "ourselves", except that the opposite of that is to welcome racists into the coalition and now it's just a coalition of racists because the racists are going to chase all the minorities out
@matt @mttaggart there's a very delicate line to walk where you don't "purity test" casual racists by being super aggressive to them, but instead you make it clear that while *they* are welcome, their *racism* isn't welcome, so you can try to rehabilitate the casual rubes while aggressively excluding the heartfelt bigots. it's kind of impossible, which is why I am more sympathetic to this sentiment than to other recent formulations of this problem.
@glyph @matt So, this is probably the most misunderstood part of the piece, and that's on me. I am concerned about ideological purity in this context. Purity as a concept, whether ideological or otherwise (e.g., racial), is what I was calling dangerous. And racism, among many other things, is a derangement that weaponizes purity. This is an instrument capitalists used heavily throughout the late 19th and early 20th centuries to disrupt labor movements and prevent workers of different races from finding common cause. That's not to say racism didn't exist elsewhere, or that it didn't arise from every socioeconomic echelon. Even so, the weaponization and exacerbation are relevant. Purity is a way to pit people against each other.
Ideological purity, though less dangerous than racism, still prevents people from finding common cause. Building movements requires working with those who do not agree with you on everything. There are lines we cannot cross to be sure, but we must be vigilant to prevent those lines from excluding all but exact matches to our own beliefs. This is the challenge, and one we are not meeting.
Am I a fascist for having used Claude Code and paying $20 to test it as others have? Some will say I am, or adjacent, because I have used a fascist tool. I find this deeply unhelpful to anyone. And that's my point. If you demonize anyone who touches this technology, your opposition movement is doomed to failure.
What do we want to accomplish? Stopping or stemming the spread of the disease, or building a commune of the untouched?
@mttaggart @matt This all wasn't in the text, but it was sort of *implied* by the way you were bringing up purity and the references you were gesturing at, which is precisely why I said it *didn't* "raise my hackles" (I probably would have chosen a different phrase if I'd known I'd have to repeat it 30 times).
It's extremely difficult to talk about, not least because there are so many people using "purity testing" *as* a purity test, and as cover for just telling people to shut up and accept odious views.
I also had questions about this section. While I read, I wondered how you thought about maintaining a social unit of any kind without censure and expulsion in some cases. And that's not a trick question, btw; maybe you have some ideas.
Or to pose the question to this toot, what makes you think purity itself is the problem, as opposed to a fixation on it? Compare with money, which isn't itself evil, but an over-veneration of it can ruin a person.
@dogfox @glyph @matt To the first:
There are lines we cannot cross to be sure, but we must be vigilant to prevent those lines from excluding all but exact matches to our own beliefs. This is the challenge, and one we are not meeting.
Once again, I am making no case for a lack of boundaries. I am making the case that the boundaries currently in play are counterproductive.
I cannot and will not give you a maxim for establishing them. Looking for empiricism there is where you get into weird inversions of moral obligation.
As for obsession versus the thing itself, I see a distinction without a difference. To maintain "purity" as a virtue is to seek it, and without clarity that it is unattainable, you end up with some version of obsession. I would prefer a heuristic of growth and estimation of intent. Not perfect metrics, and deeply subjective. It's something best done in human relations, and not conducive to a few hundred characters of pith.
I get what you're saying a lot better now. Thank you.
Strength of agreement isn't the same as purity. Purity also insists on completeness of agreement with predefined doctrine, if I am reading right.
In that case, I think I agree with you that that is always pathological.
@glyph I struggled with that section a lot, but I think it's demonstrably true that we spend more time tearing each other down than building each other up, and in so doing we give the victory to our adversaries.
@mttaggart my criteria for using LLMs for code generation at work:
1. Internal-only tool
2. Doesn't involve new ideas, just implementing well-known design patterns
3. Doesn't directly affect anything critical
4. I could do it, and have a detailed idea of how I would implement it
5. I have a good understanding of the necessary tests and edge cases that would verify the generated code
6. I don't have the time available to set aside for implementing it in the next 6 months
@mttaggart something I have been thinking about recently, and which chimes a bit with your ultimate conclusion, is that I think of AI users a lot like smokers. 1/2
@mttaggart E.g. a) I think it is generally bad for their health (smoking literally, AI in terms of cognitive skills) in the long term, although some will get away with it. b) lots of people using them will be collectively bad for society as these costs compound. c) an individual using it doesn't make them a bad person (although I would encourage them not to). d) pushing it (be that tobacco or AI), on the other hand, does demonstrate some sort of moral failing. 2/2
@smilingdemon That feels mostly correct, and the addictive properties align as well. I am wary of too-simple parallels, but this is close to a line of thinking I'm pursuing.
@mttaggart This fits what I’ve seen at $dayjob recently where talented and experienced people manage to sometimes get good use out of these tools (although with fewer ethical doubts than you describe). I’m mostly worried about problems caused by folks who don’t care or don’t know any better.
Successful use cases will also make it more difficult to argue against LLM use for those of us who don't want to use them for ethical reasons. I'm not looking forward to that.
@zaicurity Exactly. As for carelessness: I don't think a tool absolves one of carelessness, but I do think this tool in particular—at least in the way it is implemented now—makes carelessness not only easy, but highly incentivized. Without a dizzying array of external guardrails, harmful mistakes will occur. A bit more friction in the creation process might go a long way. But alas, that would not be a popular product.
And yeah, people should have a right to opt out of using these things for ethical reasons, but I do think examining those objections closely is worthwhile, if only to strengthen them.
@mttaggart ah. That’s what the vaguetoot was about.
@winterknight1337 Yep. Bracing for the fallout.
@mttaggart good write-up, man.
@winterknight1337 Thanks, friend. Most appreciated.
@mttaggart Nice post. Yeah, the tipping point from these coding assistants creating slop to being usable is a fairly recent thing. I'm not a coder, I'm a security engineer, so I'm used to handing over trust to a tool or SaaS service. Guardrails and layered controls are the key.
I think the skill we're learning right now, how to make coding assistants write good code, is a marketable one. I feel like we're back in the early days of the cloud, learning a new skill that's cutting edge.
Anywho, just wanted to say I enjoyed reading your blog post. I too am struggling with all the complexities and externalities of AI.
@Xavier Thank you for reading, and for struggling!