6 Practices that turned AI from prototyper to workhorse (106 PRs in 14 days)

As I've been following the development of AI tools, I've often wondered what it takes to turn them from prototypers into reliable workhorses. A recent experiment caught my attention: a team adopted six practices that let their AI system produce high-quality code at a sustained pace. In just 14 days they submitted 106 pull requests while also improving code quality.

Why this matters

AI has the potential to change how we build software. By automating repetitive tasks and augmenting human capabilities, it can free developers to focus on higher-level work, which leads to faster development cycles and better overall quality. To get there, though, we need to stop treating AI as a mere prototyper and integrate it into our development workflows as a reliable workhorse.

The 6 Key Practices

So, what are these six practices that made all the difference? Let's dive in:

  • Specs and plans are source code: By storing specs and plans in the same git repository as the source code, the team ensured that all relevant information was easily accessible and version-controlled. This approach allowed new agents to quickly get up to speed by reading the arch.md file and their specific spec.
  • Three models review every phase: The team used three different models (Claude, Gemini, and Codex) to review each phase of the development process. No single model found more than 55% of the issues, but together the three caught 20 bugs before shipping, including a severe security issue that Claude missed.
  • Enforce the process, don't suggest it: A state machine enforces a strict development process (Spec → Plan → Implement → Review → PR), so the AI system follows a predictable workflow and cannot skip steps or take shortcuts.
  • Annotate, don't edit: Rather than editing the generated code by hand, the team focused on writing specs and reviews that guided the code generation process. This approach allowed for more precise control over the output and reduced the risk of errors.
  • Agents coordinate agents: By using an architect agent to spawn and coordinate builder agents, the team created a hierarchical system that allowed for efficient and scalable development. The architect agent directed the builder agents, which worked in isolated git worktrees and communicated with each other asynchronously.
  • Manage the whole lifecycle: The team recognized that AI tools often focus on a limited aspect of the development process, such as code generation. By using AI to manage the entire lifecycle - from spec to PR and beyond - the team was able to automate a much larger portion of the development process, leading to significant productivity gains.
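
To make the "enforce, don't suggest" and "three models review every phase" ideas more concrete, here is a minimal Python sketch of my own. It is not the team's actual code: the class and function names (Workflow, approve, advance, ProcessError) are hypothetical, and I am assuming that a phase may only advance once all three reviewers have signed off, which the write-up implies but does not spell out. Only the phase order and the reviewer names come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    SPEC = "spec"
    PLAN = "plan"
    IMPLEMENT = "implement"
    REVIEW = "review"
    PR = "pr"


# Phase order from the article: Spec -> Plan -> Implement -> Review -> PR.
ORDER = list(Phase)

# Reviewer names from the article. How each model is actually called is
# outside the scope of this sketch.
REVIEWERS = frozenset({"claude", "gemini", "codex"})


class ProcessError(Exception):
    """Raised when a step is skipped or a phase lacks approvals."""


@dataclass
class Workflow:
    """Hypothetical state machine that only allows moving one phase forward."""

    phase: Phase = Phase.SPEC
    approvals: set = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        """Record that a known reviewer has signed off on the current phase."""
        if reviewer not in REVIEWERS:
            raise ProcessError(f"unknown reviewer: {reviewer}")
        self.approvals.add(reviewer)

    def advance(self, target: Phase) -> None:
        """Move to the next phase, refusing skips and unreviewed work."""
        current = ORDER.index(self.phase)
        if current + 1 >= len(ORDER) or ORDER[current + 1] is not target:
            raise ProcessError(
                f"cannot go from {self.phase.value} to {target.value}"
            )
        missing = REVIEWERS - self.approvals
        if missing:
            raise ProcessError(
                f"{self.phase.value} still needs: {sorted(missing)}"
            )
        self.phase = target
        self.approvals = set()  # every phase starts with a clean slate


if __name__ == "__main__":
    wf = Workflow()
    for name in ("claude", "gemini", "codex"):
        wf.approve(name)
    wf.advance(Phase.PLAN)
    print(wf.phase.value)
```

The point of the design is that the agent never decides for itself whether a step is done. Jumping from spec straight to implement, or advancing with only two approvals, raises an error instead of relying on the agent to follow a written guideline.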

Results and Trade-offs

The results were notable: a single engineer produced what would usually take a team of three or four. Code quality was also rated 1.2 points higher on a 10-point scale than code generated by Claude alone. The approach does come with trade-offs, including a longer development cycle and increased token usage, which resulted in a cost of $1.60 per PR.

How to install and try it out

The team has open-sourced their code, which is available on GitHub at https://github.com/cluesmith/codev. To get started, you can follow these steps:

git clone https://github.com/cluesmith/codev.git
cd codev
# Follow the instructions in the README file to set up and configure the system

For more details and raw results, you can visit the team's blog post at https://cluesmith.com/blog/a-tour-of-codevos/.

Who is this for?

This approach is ideal for teams and individuals who want to leverage AI to improve their development workflows and increase productivity. If you're looking for a way to automate repetitive tasks, improve code quality, and reduce development time, this approach is definitely worth exploring. However, keep in mind that it may require significant upfront investment in setting up and configuring the system.

What do you think about this approach to using AI in software development? Have you tried anything similar in your own projects? I'd love to hear your thoughts and experiences in the comments below.
