As a research project, I built a needed tool with Claude Code. I thought it would be a disaster, but it wasn't. I have some complicated feelings about it.
Wow, thanks for taking the time to write out your experience so completely. I think I’d have a similar complex reaction.
I recently developed a large multithreaded Python program implementing a complex PID-controlled system, with lots of realtime I/O from sensors and encoders and out to actuators, along with weak test coverage. Kids interacted with it at a science museum.
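For readers unfamiliar with the term, the control loop described above might look roughly like this toy sketch. Everything here is hypothetical: the gains, the setpoint, and the simulated "plant" stand in for the real sensors and actuators the exhibit would use.

```python
class PID:
    """Minimal PID controller: output = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Toy closed loop: the "plant" is simulated, nudged by the actuator output.
pid = PID(kp=0.6, ki=0.1, kd=0.05, setpoint=100.0)
value = 20.0  # initial sensor reading
for _ in range(50):
    correction = pid.update(value, dt=0.1)
    value += correction * 0.1  # apply actuator effect to the simulated plant
```

In a real exhibit, `measurement` would come from encoder/sensor reads on their own threads, which is where the multithreading (and the testing difficulty) comes in.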
1/n
@mttaggart Great write-up. I agree that there's a lot of nuance. Those using AI solely as a means of dividing the population into "ethical" and "unethical" groups, while declaring "AI is a bubble," are not going to create the change they want to see. Balanced takes that insist on using these tools as safely (and, ideally, as ethically) as possible are how you cross the divide.
@mttaggart Great. Sounds so familiar. I just tried repeating a git process I knew worked, because I did it yesterday with Claude's help. Today Claude was completely off target, and I had to correct it on major things many times. I was a bit shocked, but... it shows that one can learn from using it. How else could I have corrected it the second time?
Thanks for this post. I've been an extreme skeptic of LLMs, but I'm seeing increasingly promising results for agentic coding. I'm not sure I'm on board with using it as regular practice, but I increasingly see the need to experiment with it, both to better understand how it'll impact my job and to offer better-informed opinions on it.
@mttaggart I have read only the self-flagellation so far and can I just say: oof.
my own co-skeptic feeling here is that I am deeply sympathetic to what you're trying to do here, and also I am furious with your employer (or maybe just the ecosystem more generally) for effectively forcing you to take a bunch of risks with this
@glyph I guess I see the professional side of it this way. I could:
- Quit, which harms everyone involved and solves nothing.
- Say nothing, which harms anyone impacted by dangerous AI.
- Do what I'm doing, and hope to mitigate harm.
The choice is clear, and I'd much rather that I be the one talking about AI security than a myopic booster of the tech.
@mttaggart oh yeah, for sure. and even given risks+externalities accounted for, this type of work (i.e. the investigation in the post itself) needs to get done. and it's not worth much if it doesn't get done by someone with your priors and methodological constraints, which is to say, someone who it will personally hurt. so, (unironically) thank you for your service here
@mttaggart I am still left wondering, per https://blog.glyph.im/2025/08/futzing-fraction.html , if overall you felt like your experience here mitigated my ongoing concern that despite "appearing to work" on small-scale tools like this, the larger risks still mean that it may be a net negative, even just straightforwardly to productivity, when deployed at scale
@glyph I hope I was clear that I still find the technology's harms outweigh its benefits. That would be true even if it produced perfect code every time, and that simply isn't the case.
What I discovered here is that, in limited use cases, the probability of error can decrease significantly, and the actual time investment to build a working, secure product shrinks. That said, a lot of things need to go right, and every single process for keeping the model on track is prone to failure. Also, context (in the model's sense) really matters. This project was small enough that the requisite context was almost always available to the model, or it was primed with external sources to make it available. Deployed against a much larger codebase, you'd need proportionally more computing resources to do likewise, and again your probability of error increases.
So yeah, still not great. I found a way to make it work, but doing so sucked ass.
I also wasn't kidding about Rust as basically a requirement. I would never in a million years attempt this with Python—which I love, by the way. But even with live LSP linting, the average Python code quality in the model's training corpora is going to affect output, and without the compile-time checks of Rust, I'd be very worried about hidden dragons.
@glyph Oh, one other point. I think the FF model might need a corollary for coding agents. Per-inference calculations don't really make sense in this workflow. Instead it would be more beneficial to think about time/usage per feature or commit or something. And yeah, by those metrics, this was phenomenally faster than what I would have done myself, and thanks to careful scaffolding, solid on the other concerns as well. By the numbers, this application was an unequivocal win. Just, y'know, an icky one.
@mttaggart okay, read the whole thing now. I wouldn't have phrased the "purity" section at the end in quite the same way you did, but it didn't raise my hackles in quite the same way Doctorow did with the same point. "I am tired of running from one corner of technology to the next" resonated hard enough to rattle my teeth
@glyph I struggled with that section a lot, but I think it's demonstrably true that we spend more time tearing each other down than building each other up, and in so doing we give the victory to our adversaries.
@mttaggart my criteria for using llms for code generation at work:
1. Internal only tool
2. Doesn't involve new ideas, just involves implementing well known design patterns
3. Doesn't directly affect anything critical
4. I could do it, and have a detailed idea of how I would implement it
5. I have a good understanding of the necessary tests and edge cases that would verify the generated code
6. I don't have the time available to set aside for implementing it in the next 6 months
@mttaggart something I have been thinking recently, and which chimes a bit with your ultimate conclusion, is that I think of AI users a lot like smokers. 1/2
@mttaggart E.g. a) I think it is generally bad for their health (smoking literally, AI in terms of cognitive skills) in the long term, although some will get away with it. b) Lots of people using them will be collectively bad for society as these costs compound. c) An individual using it doesn't make them a bad person (although I would encourage them not to). d) Pushing it (be that tobacco or AI), on the other hand, does demonstrate some sort of moral failing. 2/2
@smilingdemon That feels mostly correct, and the addictive properties align as well. I am wary of too-simple parallels, but this is close to a line of thinking I'm pursuing.
@mttaggart This fits what I’ve seen at $dayjob recently where talented and experienced people manage to sometimes get good use out of these tools (although with fewer ethical doubts than you describe). I’m mostly worried about problems caused by folks who don’t care or don’t know any better.
Successful use cases will also make it more difficult to argue against LLM use for those of us who don't want to use these tools for ethical reasons. I'm not looking forward to that.
@zaicurity Exactly. On carelessness: I don't think a tool absolves anyone of it, but I do think this tool in particular—at least as it is implemented now—makes carelessness not only easy, but highly incentivized. Without a dizzying array of external guardrails, harmful mistakes will occur. A bit more friction in the creation process might go a long way. But alas, that would not be a popular product.
And yeah, people should have a right to opt out of using these things for ethical reasons, but I do think examining those objections closely is worthwhile, if only to strengthen them.
@mttaggart ah. That’s what the vaguetoot was about.
@winterknight1337 Yep. Bracing for the fallout.
@mttaggart good write up man.
@winterknight1337 Thanks, friend. Most appreciated.
@mttaggart Nice post. Yeah, the tipping point from these coding assistants creating slop to being usable is a fairly recent thing. I'm not a coder, I'm a security engineer, so I'm used to handing over trust to a tool or SaaS service. Guardrails and layered controls are the key.
I think the skills we're learning right now, in getting coding assistants to write good code, are marketable. I feel like we're back in the early days of the cloud, learning a cutting-edge new skill.
Anywho, just wanted to say I enjoyed reading your blog post. I too am struggling with all the complexities and externalities of AI.
@Xavier Thank you for reading, and for struggling!