@glyph I hope I was clear that I still find that the technology's harms outweigh its benefits. That would be true even if it produced perfect code every time, and that simply isn't the case.
What I discovered here is that, in limited use cases, the probability of error can decrease significantly, and the real time investment to build a working, secure product shrinks. That said, a lot of things need to go right, and every process meant to keep the model on track is itself prone to failure. Context (in the model's sense) also really matters: this project was small enough that the requisite context was almost always available to the model, or it was primed with external sources to make it available. Deployed against a much larger codebase, you'd need proportionally more compute to do likewise, and your probability of error climbs again.
So yeah, still not great. I found a way to make it work, but doing so sucked ass.
I also wasn't kidding about Rust as basically a requirement. I would never in a million years attempt this with Python—which I love, by the way. But even with live LSP linting, the average Python code quality in the model's training corpora is going to affect output, and without the compile-time checks of Rust, I'd be very worried about hidden dragons.
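To make concrete the kind of compile-time check I mean, here's a toy sketch (the enum and names are made up, not from my project): if a model adds a variant to an enum but forgets to update a `match`, rustc refuses to compile. The equivalent Python `if/elif` chain would just fall through silently at runtime.

```rust
// Illustrative only: exhaustiveness checking catching a classic LLM slip.
enum JobState {
    Queued,
    Running,
    Done,
}

fn describe(state: &JobState) -> &'static str {
    // Deleting any arm here (or adding a JobState variant without one)
    // is a hard compile error: "non-exhaustive patterns".
    match state {
        JobState::Queued => "waiting",
        JobState::Running => "in progress",
        JobState::Done => "finished",
    }
}

fn main() {
    assert_eq!(describe(&JobState::Queued), "waiting");
    println!("{}", describe(&JobState::Running));
}
```

In Python you'd need a linter, a type checker in strict mode, and test coverage to get a similar guarantee; in Rust the model simply can't produce a binary with that bug in it.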