As a research project, I built a needed tool with Claude Code. I thought it would be a disaster, but it wasn't. I have some complicated feelings about it.
@mttaggart I wanted to write almost the same post, as this:
> I let this thing into my brain, and now it is always there. For any new potential project, there is a voice in my head telling me how much easier it would be to let the model do it. How much faster it would be to simply describe the objective in a prompt and let go.
is what I'm struggling with now, but you wrote it perfectly
@mttaggart Debates aside, your thorough post was a charm to read. This gem especially:
"Do I think LinkedIn is the digital River Styx, where damned souls clamber over each other and claw at the boat passing overhead in the dim hope of salvation from those who have escaped the shambling horde?"
Also happy to read it works with Discourse.
@mttaggart
This might be the best, most nuanced writeup I have read.
@EricCarroll Wow, thank you so much! I really appreciate that, and you taking the time to read it.
@mttaggart really excellent write up! Thanks for taking the time to do it
@mttaggart I feel very similar and have gone down a similar path recently. For me, I've found it does very little wrong for the apps I am trying to make, and I am more eager to press 2 because I enjoy the outcome, and testing that outcome, more than having control over the code.
I've been calling myself a self-loathing AI user, but I'm in the process of building 5 or 6 apps and tools I have wanted over the years but never found the effort to do.
@stefan would you be willing to share one example? The kind of app that you would want to build that you'd need Claude for. What would be the difference between you doing it versus the AI doing it? Thank you.
@skykiss I just launched this Mastodon client that is focused on live blogging. This probably would have taken me weeks or months; I got it to its current state in just 2 weeks.
I certainly didn't need Claude to do it technically... but I did need it to lower the effort bar enough that I actually got code committed to the repo.
@stefan Thank you for sharing that experience, and thank you for reading!
@mttaggart "the best way to be pickled is to stay in the brine" - Gerald M. Weinberg
We are all in the brine now. Does the brine evaporate fast enough, or does it thicken?
@mttaggart thanks for writing this!
@mttaggart thanks for sharing your experience using genAI. I too struggle with it. The tech may be empowering some, while at the same time corroding others' ability to gain and keep knowledge, art, and craft. The power imbalance, copyright theft, and impact on labour (incl. data cleaners/trainers), climate, land, and energy use make it hard to embrace genAI as just a new technology. Currently, IMHO, using it is not proportional to the negative impacts it has. The question: would it be possible to fix these?
Thanks for posting this, it was very insightful.
The human factors issues you raise are important. They strike me as similar to issues with autopilot systems in aviation. If the human-in-the-loop is left with nothing more than tediously approving automated functionality then at least two problems emerge: 1) the pilots lose valuable skills that they may need in life-or-death situations, and 2) the pilots are lulled into stupor as they mindlessly monitor a stream of automated behaviors and fail to recognize and react to problems that are rare events.
The aviation industry's response has been to require manual piloting periodically to maintain proficiency, along with perhaps other interventions that I'm unfamiliar with. I'm not sure what this would look like in practice for software development. Maybe requiring developers to manually generate some features? Maybe intentionally inserting mentally-engaging tasks in the human-in-the-loop code review process?
@DaveMWilburn @mttaggart a key difference between aviation with autopilot and software development with slop is that piloting an aircraft is a life-or-death situation and the autopilot has a very constrained task and a real-time deadline, whereas the LLM (being sold to us as some kind of a general task-accomplishment system) just shits out garbage code fast, and all I'm doing is creating unreadable software for future people to have to deal with, under an extremely false pressure to perform
@atax1a @DaveMWilburn I have some bad news for you about where that code is going.
Yes, the stakes can indeed be life and death.
@mttaggart @atax1a @DaveMWilburn *cough*MCAS*cough*
@mttaggart @DaveMWilburn well yes, we understand that some code is life-or-death stakes, but in that case you REALLY want that to be written by someone and not extruded from a probability model. And vibecoding is more likely to introduce reliability problems than solve them. Hence the contrast between the open-ended "generative" machine versus the specific-purpose, known-task system.
@DaveMWilburn We're dangerously close to trade guilds and licensure here, which... yeah maybe?
@mttaggart Very well written and considered, thanks!
@davebauerart Thank you! Appreciate you reading!
@mttaggart thanks, Matt. This is a pragmatic approach, very similar to my own style, taste, and experience using Claude Code (my best creation using the tool so far has been a Rust project with SSR).
I viscerally feel why this was hard to write and hard to post. There are many gifted people on this platform whose ideological opposition is a luxury most of us can’t afford. I find myself wanting to say something more nuanced in the conversations around its use and often I just don’t because I know it won’t be well received.
And like you’ve done here, what I’ve been doing isn’t really vibe coding. When you actually have experience and can read code and write good tests that is something else entirely. But yes, the skill required is the skill it atrophies. Bizarro land, but here we are. Anyway, I appreciate your willingness to share here. These are the kind of conversations that people are avoiding.
I really appreciate all the replies and support on this one. It was hard to write. I do want to call out two points that aren't being discussed, and that I felt pretty strongly about:
- Open source is in trouble, and maintainers need help. Generative code is the help that showed up. What is the expectation here?
- "The tool requires expertise to validate, but its use diminishes expertise and stunts its growth." What does "responsible use" look like that prevents this obvious and pervasive harm?
@mttaggart Responsible use is maintaining your role as the expert at all stages of the project. AI is a tremendous tool but has been made so easy to misuse by making it entertaining and our little code "buddy". You really hit many of what I consider responsible AI practices. Two of my main rules...
Review everything. If you don't understand the AI's code, the subject matter, references, etc don't commit until you do. Don't ever accept auto-commit.
Security is key. I can't stress that enough to new devs. If you didn't tell your agent to make it secure - it's not. Start your security audit.
@johnofrobotz Even if you did tell your agent to make it secure.
@mttaggart hehe so true. That's why the first rule is so important 😁
@mttaggart
Actually, from my personal experience (CC too, which is probably still one of the better AI coding agents even if it has many warts), yes it "works", but before you start to announce your great successes with it, don't forget some ugly details that people like to overlook.
The human aspect. On one side you need an experienced overseer who makes sure that CC stays on track. I've seen CC go on many fascinating off-topic excursions.
@mttaggart I really appreciate your insight, I'm going to be asking my boss a lot of these questions.
@mttaggart That's all pretty consistent with my experience. I've spent some time learning the tools and the underlying theory, and from earlier work I also know a bit about chips and the stack between chat and chips. On the one hand, there are some major limits to how well the average person can describe the problems they want to solve. On the other, I think the people building their thinking on "it's a bubble" miss that the tech is not fundamentally expensive to maintain once built.
@mttaggart
Thanks for the write-up. An interesting if tricky read. I find myself in a similar place. If we don’t become knowledgeable about these tools, able to map out failure modes and boundaries, and identify how they should be delimited and defanged, we effectively cede the floor to evangelists who won’t know or care.
Wow, thanks for taking the time to write out your experience so completely. I think I’d have a similar complex reaction.
Recently developed a large multi-threaded Python program implementing a complex PID-controlled system with lots of real-time I/O from sensors and encoders and to actuators, along with weak test coverage. Kids interacted with it at a science museum.
1/n
@mttaggart Great write up. I agree that there's a lot of nuance. Those solely using AI as a means for dividing the population into "ethical" and "unethical" groups, while stating "AI is a bubble", are not going to create the change they want to see. Balanced takes that insist on using tools as safely (and, ideally, as ethically) as possible are how you cross the divide.
@mttaggart Great. Sounds so familiar. I just tried repeating a git process which I knew worked, because I did it yesterday with Claude as a helper. Today Claude was completely off target. I had to correct it on major things many times. I was a bit shocked, but it shows that one can learn from using it. How else could I have corrected it the second time?
@mttaggart That was a thin line to walk. Nicely done. Very even keeled.
I’ve experienced the echoes of what you wrote in my day job too. My team has decided it’s a tool, it will be used, and it will never be trusted. But writing SOC PowerShell scripts and KQL alerts isn’t the same as an ERP. I do not envy that team.
As for the place of agentics in a future world, I look to the automobile and the airplane as relevant comparisons. The modern world could not exist without either. I don’t think the next future can exist without neural nets and other agentics. The jury is still out for me on LLMs.
But the automobile and the airplane have wrecked this planet environmentally, though their inventors did not know that would happen. We know better now. Has our species matured enough to not make the same mistake? Perhaps. But not everyone has. And they turned on the hard sell big time, trying to get filthy rich from so-called “AI” and hoping no one would question it.
Fortunately (?) it’s starting to look like the economics are not sustainable. As evidence I posit that Nadella may not have a job for much longer.
Perhaps we’ll collectively get a moment to pause, reassess and reset. Less hype and a more considered approach to this tech is gravely needed. That worked after the Internet bubble of the 1990s burst. It’ll work here if we can get the more even keeled among us to take charge. May that happen soon.
Thanks for this post. I've been an extreme skeptic of LLMs but seeing increasingly promising results for agentic coding. I'm not sure I'm on board with using it as regular practice but increasingly seeing the need to experiment with it to better understand how it'll impact my job, and provide better informed opinions on it.
@mttaggart I have read only the self-flagellation so far and can I just say: oof.
my own co-skeptic feeling here is that I am deeply sympathetic to what you’re trying to do here, and also I am furious with your employer (or maybe just the ecosystem more generally) for effectively forcing you to take a bunch of risks with this
@glyph I guess I see the professional side of it this way. I could:
- Quit, which harms everyone involved and solves nothing.
- Say nothing, which harms anyone impacted by dangerous AI.
- Do what I'm doing, and hope to mitigate harm.
The choice is clear, and I'd much rather that I be the one talking about AI security than a myopic booster of the tech.
@mttaggart oh yeah, for sure. and even given risks+externalities accounted for, this type of work (i.e. the investigation in the post itself) needs to get done. and it's not worth much if it doesn't get done by someone with your priors and methodological constraints, which is to say, someone who it will personally hurt. so, (unironically) thank you for your service here
@mttaggart I am still left wondering, per https://blog.glyph.im/2025/08/futzing-fraction.html , if overall you felt like your experience here mitigated my ongoing concern that despite "appearing to work" on small-scale tools like this, the larger risks still mean that it may be a net negative, even just straightforwardly to productivity, when deployed at scale
@glyph I hope I was clear that I still find the technology's harms outweigh its benefits. That would be true even if it produced perfect code every time, and that simply isn't the case.
What I discovered here is that, in limited use cases, the probability of error can decrease significantly, and the real time investment to build a working and secure product diminishes. That said, a lot of things need to go right, and every single process to keep the model on track is prone to failure. Also, context (in the model's sense) really matters. This project was small enough that the requisite context was almost always available to the model, or it was primed with external sources to make it available. Deployed against a much larger codebase, you'd need proportionally more computing resources to do likewise, and again your probability for error increases.
So yeah, still not great. I found a way to make it work, but doing so sucked ass.
I also wasn't kidding about Rust as basically a requirement. I would never in a million years attempt this with Python—which I love, by the way. But even with live LSP linting, the average Python code quality in the model's training corpora is going to affect output, and without the compile-time checks of Rust, I'd be very worried about hidden dragons.
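To make that concrete, here's a minimal sketch (illustrative only, not code from the project) of the backstop I mean: the Rust compiler refuses to let a forgotten failure path through, where Python would defer the same mistake to runtime.

```rust
// Illustrative sketch, not project code: Rust's exhaustiveness checking
// turns a forgotten error path into a compile error instead of a runtime
// surprise buried somewhere in generated code.
use std::num::ParseIntError;

fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

fn main() {
    // If the model emits a `match` that omits the Err arm, the build fails.
    // The equivalent Python only blows up when a bad value finally arrives.
    match parse_port("8080") {
        Ok(port) => println!("listening on {port}"),
        Err(e) => eprintln!("bad port: {e}"),
    }
}
```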
@glyph Oh, one other point. I think the FF model might need a corollary for coding agents. Per-inference calculations don't really make sense in this workflow. Instead it would be more beneficial to think about time/usage per feature or commit or something. And yeah, by those metrics, this was phenomenally faster than what I would have done myself, and thanks to careful scaffolding, solid on the other concerns as well. By the numbers, this application was an unequivocal win. Just, y'know, an icky one.
@mttaggart okay, read the whole thing now. I wouldn't have phrased the "purity" section at the end in quite the same way you did, but it didn't raise my hackles in quite the same way Doctorow did with the same point. "I am tired of running from one corner of technology to the next" resonated hard enough to rattle my teeth
@glyph I struggled with that section a lot, but I think it's demonstrably true that we spend more time tearing each other down than building each other up, and in so doing we give the victory to our adversaries.
@mttaggart
Just finished reading it. Thanks for giving me cognitive dissonance on a Monday!
Most of the questions I would have asked, you answered. It was thought provoking, and so you get a long post of thoughts, sorry in advance.
It feels like you really discounted how much your existing skills and expertise helped the process. Though I guess to mention them repeatedly would feel like self-aggrandizing or something similar.
I’m glad it worked, I’m not glad that it worked. I hate that it exists, I don’t hate that it exists. I keep thinking back to the car analogy. I know that fossil fuels are destroying the planet, I still have to drive to work (I live in a place where cycling isn’t viable, and transit is almost nonexistent).
Using AI to code is a different set of skills (ones humans are inherently not great at); it is change, and you have to pay the bills in a world where that change is already happening whether you like it or not. We didn’t start the fire. You didn’t make the systems that we live in, and living off the grid does not solve any problems (at all). We live in a society. I don’t enjoy quoting the Joker, and I think it is overused by people who use it as a cop-out, an absolution from responsibility, but we do live in a society, and we should do so with a level of responsibility.
Recognizing the harms and still having to use the things, it hurts. Ignorance truly is bliss. I do think we owe it to ourselves to suffer with that recognition and use it as a driving force, to drive the efficient and practical use of harmful tech, to reduce the damage inflicted. Or maybe that’s just my catholic upbringing.
You did a good job.
@hotsoup Thank you, friend, much appreciated on all points. On the expertise point, I felt like I spoke to the importance of expertise in the process, and that's where mine came in: as a reviewer and security knower. And I guess as a Rust fan, making that initial choice. If there was some special property or characteristic I brought to the equation, I'm still unclear what it was.
@mttaggart @hotsoup To me the security knower part was the big one.
You mention, correct me if I'm wrong, that a few times you spotted security flaws, and in some of those (but not all) the LLM could fix them once they were pointed out. How would that have gone without your skills?
(1/2 - personal experience in next post, feel free to skip it if it doesn't interest you)
@mttaggart @hotsoup (2/2)
I've been asked to give comments to someone using an LLM to build a simple scheduling tool (website where they can schedule appointments). The basic functionality works fine, but they haven't thought about access control, permissions, backups, reliability, ... And how many other things I haven't thought of myself? They can't know what they don't know, you know??? And the LLM won't volunteer anything, it doesn't know anything (obviously).
@skylark13 @hotsoup So actually the opposite was true. Once I scaffolded the security audit, the model found much more than I would have straight away. I mean, there were definitely things I knew had to be implemented, but the model's list met and exceeded my own. In a couple cases I identified attack scenarios the model was not building against, but largely the "security knower" component of my experience benefited by way of structuring the process to include a separate assessment, rather than assuming the model was writing secure code.
@mttaggart @hotsoup Yeah, but had you not planned that kind of assessment, would it have volunteered the need for one? That's what I mean, for a "tool" that people with little to no coding experience think they can use to build stuff that "works", they wouldn't even know they need something like that.
Once it has the information that you're auditing the security aspects, pattern matching on that will lead it to lots of relevant information. That makes sense.
@mttaggart my criteria for using llms for code generation at work:
1. Internal only tool
2. Doesn't involve new ideas, just involves implementing well-known design patterns
3. Doesn't directly affect anything critical
4. I could do it, and have a detailed idea of how I would implement it
5. I have a good understanding of the necessary tests and edge cases that would verify the generated code
6. I don't have the time available to set aside for implementing it in the next 6 months
“I nevertheless recognize the societal and environmental harms posed by these tools. I want them to unexist. I even recognize the cognitive hazards to which I expose myself in their use (more on that later). I do not want to use them. And yet, I must understand them. If that damns me in your eyes, so be it.”
My friend, I think many of us are in exactly the same place. I hate AI (1) with every fiber of my being, but in order to secure it, I must understand it, and the best way for me to do that is to use it. 😢
(1) Generative AI, Agentic AI, etc. as opposed to traditional ML/AI
@mttaggart something I have been thinking recently, and which chimes a bit with your ultimate conclusion, is that I think of AI users a lot like smokers. 1/2
@mttaggart E.g. a) I think it is generally bad for their health (smoking literally, AI in terms of cognitive skills) in the long term, although some will get away with it. b) lots of people using them will be collectively bad for society as these costs compound. c) an individual using it doesn't make them a bad person (although I would encourage them not to). d) pushing (be that tobacco or AI) it on the other hand does demonstrate some sort of moral failing. 2/2
@smilingdemon That feels mostly correct, and the addictive properties align as well. I am wary of too-simple parallels, but this is close to a line of thinking I'm pursuing.
@mttaggart I can't speak to others, but I liked the article too.
Here is some fallout...
@jackryder Thank you! I really appreciate you reading.
@mttaggart tbf, I've been kind of stalking your rust stuff. 😆
@mttaggart This fits what I’ve seen at $dayjob recently where talented and experienced people manage to sometimes get good use out of these tools (although with fewer ethical doubts than you describe). I’m mostly worried about problems caused by folks who don’t care or don’t know any better.
Successful use cases will also make it more difficult to argue against LLM use for those of us who don’t want to use them due to ethical reasons. I’m not looking forward to that.
@zaicurity Exactly. For the carelessness, I don't think a tool absolves one of carelessness, but I do think this tool in particular—at least in the way it is implemented now—makes carelessness not only easy, but highly incentivized. Without a dizzying array of external guardrails, harmful mistakes will occur. A bit more friction in the creation might go a long way. But alas, that would not be a popular product.
And yeah, people should have a right to opt out of using these things for ethical reasons, but I do think examining those objections closely is worthwhile, if only to strengthen them.
@mttaggart ah. That’s what the vaguetoot was about.
@winterknight1337 Yep. Bracing for the fallout.
@mttaggart good write up man.
@winterknight1337 Thanks, friend. Most appreciated.
@mttaggart Nice post. Yeah, the tipping point from these coding assistants creating slop to being usable is a fairly recent thing. I'm not a coder, I'm a security engineer, so I'm used to handing over trust to a tool or SaaS service. Guardrails and layered controls are the key.
I think the skills we're learning right now, figuring out how to make coding assistants write good code, are marketable. I feel like we're back in the early days of the cloud, learning a new skill that's cutting edge.
Anywho, just wanted to say I enjoyed reading your blog post. I too am struggling with all the complexities and externalities of AI.
@Xavier Thank you for reading, and for struggling!