Matt Asay
Contributing Writer

The tough task of making AI code production-ready

With AI introducing errors and security vulnerabilities as it writes code, humans still have a vital role in testing and evaluation. New AI-based review software hopes to help solve the problem.


Developers are increasingly turning to large language models (LLMs) to crank out code at astonishing volumes. As much as 41% of all code is now written by machines, totaling 256 billion lines in 2024 alone. Even Google, which employs some of the best and brightest developers in the industry, now relies on AI to write upwards of 25% of its code. If this sounds like the promised land of software development (more code, faster, while developers sip Mai Tais on the beach), the truth is not so rosy. After all, anyone who's pushed real software to production knows that getting code to compile, pass tests, and run reliably in the wild is a far tougher slog than generating the code in the first place. As I've noted, "LLM-generated code isn't magically bug-free or self-maintaining." Quite the opposite.

In fact, faster code creation may actually slow code readiness because of the increased need to clean, debug, and harden that code for production. As NativeLink CEO Marcus Eagan puts it, given that "agents have minds of their own," it becomes critical to be able to identify and contain "the behavioral drift between test environments and production environments." Indeed, the gap between code generation and production deployment is the elephant in the AI-dev room, prompting the question: Who will do the hard work of compiling, testing, and polishing all this new AI-written code?

People are people

Here's the uncomfortable truth: As much as we may want robots to do all our work for us, humans still own every hard, critical step that happens after the code is written. AI-generated code often uses incorrect libraries, violates build constraints, and overlooks subtle logic errors. According to a recent survey of 500 engineering leaders, AI models have a knack for introducing subtle bugs and vulnerabilities alongside the boilerplate they generate: 59% reported that AI-generated code introduced errors at least half the time, and 67% said they now spend more time debugging AI-written code than their own. Additionally, 68% of those surveyed said they now spend extra effort to fix security vulnerabilities injected by AI suggestions.

Catch that? Rather than eliminating developer work, AI often shifts the burden further downstream into QA and operations.

That downstream effort is potentially harder with AI as well, because instead of correcting their own mistakes, developers now need to tackle unfamiliar code. One developer spent 27 days letting an AI agent handle all code and fixes (1,700+ commits with almost no human edits). He found that simple bugs can become hour-long exercises in carefully prompting the AI to fix its own mistakes. "What would be a 5-minute fix for a human often turned into hours of guiding the AI," he reported, thanks to the AI's tendency to go off track or introduce new issues while trying to solve existing ones.

In other words, instead of replacing humans, AI is creating new roles and workflows for people. Developers increasingly serve as supervisors, mentors, and validators, reviewing AI-generated code, correcting its mistakes, and ensuring it integrates smoothly into existing systems. In short, the developer's job isn't going away; it's evolving, as I've said.

Using machines to fix machines

Companies and open source projects are emerging to address these gaps, automating code validation and testing to complement human oversight. Not surprisingly, many use AI tools to tackle AI deficiencies. A few examples:

  • AI-enhanced quality scanning: Tools like SonarQube and Snyk now use AI to detect bugs, security issues, and vulnerabilities specifically in AI-generated code. Sonar, for instance, introduced an AI-powered tool set to flag and even automatically fix common coding issues before they merge into your project.
  • Automated test generation: Diffblue Cover leverages AI to generate robust unit tests for Java code, dramatically speeding up the testing phase (up to 250 times faster) and reducing a major bottleneck for human developers. NativeLink, an open source build cache and remote execution server, helps companies streamline their build processes, cutting build times from days to hours. Tools like these become critical for staying ahead of the volume of AI-generated code.
  • AI-assisted code reviews: GitHub Copilot is previewing automated pull request reviews, flagging potential bugs and security flaws before human reviewers even look at the code. Amazonโ€™s CodeGuru and Sourcegraph Cody similarly offer AI-driven debugging and code analysis.
  • Agentic pipelines: Projects like Zencoder are pioneering multi-agent AI pipelines where specialized bots collaboratively produce, test, refine, and review code, significantly boosting the odds it's production-ready from the outset.
  • Secure runtime testing environments: E2B and other platforms provide secure sandbox environments that let AI-written code execute in isolation, automatically checking for compile-time or runtime issues before code reaches human hands.

Getting the most from AI

Even with these advancements, skilled developers remain essential to good software. There are good (and bad) ways to mix human ingenuity with the brute force of machine-written code. What can development teams do today to manage the deluge of AI-generated code and ensure it's production-ready? I'm glad you asked.

First, treat AI output as a first draft, not final code. Rather than taking AI-generated code as an unquestioned gift, it pays to cultivate a culture of skepticism. Just as you'd review a junior developer's work, so too should you mandate reviews for AI-generated code. Have senior engineers or code owners give it a thorough look and never, ever deploy AI-written code without reading and testing it.

Second, integrate quality checks into your pipeline. Static analysis, linting, and security scanning should be non-negotiable parts of continuous integration whenever AI code is introduced. Many continuous integration/continuous delivery (CI/CD) tools (Jenkins, GitHub Actions, GitLab CI, etc.) can run suites like SonarQube, ESLint, Bandit, or Snyk on each commit. Enable those checks for all code, especially AI-generated snippets, to catch bugs early. As Sonar's motto suggests, ensure "all code, regardless of origin, meets quality and security standards" before it merges.
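
To make that concrete, here is a minimal sketch of what such a gate can look like as a single step in your pipeline, written in Python so it runs the same way under Jenkins, GitHub Actions, or GitLab CI. The specific tools (ruff for linting, bandit for security scanning) and the file name are illustrative assumptions rather than a prescription; swap in whatever scanners your stack already uses.

# quality_gate.py -- illustrative CI gate; assumes ruff and bandit are installed.
# Run it as a CI step so every commit, AI-written or not, passes the same checks.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],          # static analysis / linting
    ["bandit", "-r", "src", "-q"],   # security scanning for common Python issues
]

def main() -> int:
    failures = 0
    for cmd in CHECKS:
        print(f"Running: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failures += 1
    if failures:
        print(f"{failures} check(s) failed; blocking the merge.")
        return 1
    print("All quality checks passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())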

Third, as covered above, you should start leveraging AI for testing, not just coding. AI can help write unit tests or even generate test data. For example, GitHub Copilot can assist in drafting unit tests for functions, and dedicated tools like Diffblue Cover can bulk-generate tests for legacy code. This saves time and also forces AI-generated code to prove itself. Adopt a mindset of "trust, but verify." If the AI writes a function, have it also supply a handful of test cases, then run them automatically.
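
As a sketch of what that loop looks like in practice: imagine the slugify function below came back from an AI assistant, and you (or the assistant itself, on request) supply pytest cases before it merges. Both the function and the tests are hypothetical, shown only to illustrate the verify-before-trust workflow.

# test_slugify.py -- hypothetical example: slugify() stands in for an AI-generated
# function, and the tests are what you ask the AI (or a test tool) to supply with it.
import re
import pytest

def slugify(title: str) -> str:
    """Turn a title into a URL slug (imagine this came from an AI assistant)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

@pytest.mark.parametrize(
    "title, expected",
    [
        ("Hello, World!", "hello-world"),
        ("  Multiple   Spaces  ", "multiple-spaces"),
        ("Already-a-slug", "already-a-slug"),
        ("", ""),  # edge case the generated code must handle, not just the happy path
    ],
)
def test_slugify(title, expected):
    assert slugify(title) == expected

Wiring a file like this into CI means the generated function has to earn its way in rather than being waved through on faith.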

Fourth, if your organization hasn't already, create a policy on how developers should (and shouldn't) use AI coding tools. Define acceptable use cases (boilerplate generation, examples) and forbidden ones (handling sensitive logic or secrets). Encourage developers to label or comment AI-generated code in pull requests. This helps reviewers know where extra scrutiny is needed. Also, consider licensing implications; make sure any AI-derived code complies with your code licensing policies to avoid legal headaches.

Fifth, as I've written, using AI effectively requires more, not less, developer skill in certain areas. As such, you need to upskill your team on reading and debugging code. Teach them secure coding practices so they can spot when the AI introduces a SQL injection or buffer overflow. Encourage a testing mindset. Developers should think in terms of writing the test before trusting the function that Copilot gave them. In short, focus on developing "AI literacy" among your programmers; they need to understand both the capabilities and the blind spots of these tools.
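
For example, here is the kind of injection bug reviewers should be trained to catch, shown with Python's built-in sqlite3 module. The vulnerable query is a plausible pattern an assistant might produce, not a quote from any particular model, and the fix is the standard parameterized query.

# sql_injection_demo.py -- illustrative only: the kind of bug reviewers need to catch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

user_input = "alice' OR '1'='1"  # attacker-controlled value

# Pattern an AI assistant might plausibly generate: string-built SQL (vulnerable).
rows = conn.execute(
    f"SELECT * FROM users WHERE name = '{user_input}'"
).fetchall()
print("vulnerable query returned:", rows)     # returns rows it never should

# What a reviewer should insist on: a parameterized query.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print("parameterized query returned:", rows)  # returns nothing, as expected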

Sixth, and perhaps most obviously, get started by piloting new AI-augmented tools. Perhaps it will feel most natural to start by enabling Copilot's automatic pull request review in a few repositories to see how it augments your human code reviews. Or maybe try an open source tool like E2B in a sandbox project to let an AI agent execute and test its own code. The goal is to find what actually reduces your team's burden versus what adds more noise.
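
If you want to feel out the sandboxing idea before committing to a platform, the sketch below shows the core concept (run the generated snippet in a separate, isolated process with a hard timeout) using only the Python standard library. To be clear, this is not E2B's API; a real sandbox adds containerization, network restrictions, and resource limits, and the function name here is purely illustrative.

# sandbox_run.py -- a bare-bones stand-in for the idea behind tools like E2B:
# execute AI-written code in isolation and capture the result, rather than
# trusting it inside your main environment.
import subprocess
import sys
import tempfile
from pathlib import Path

def run_untrusted(code: str, timeout: int = 10) -> subprocess.CompletedProcess:
    """Write the snippet to a scratch dir and run it in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as scratch:
        script = Path(scratch) / "snippet.py"
        script.write_text(code)
        return subprocess.run(
            [sys.executable, "-I", str(script)],  # -I: isolated mode, no user site-packages
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=scratch,
        )

if __name__ == "__main__":
    result = run_untrusted("print(sum(range(10)))")
    print("exit code:", result.returncode)
    print("stdout:", result.stdout.strip())

Even this crude version is enough to catch generated snippets that crash, hang, or print something unexpected before they ever reach a human reviewer.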

Looking ahead, the industry may evolve toward greater AI automation in the code validation process. Multi-agent AI systems that autonomously handle compiling, testing, debugging, and security scanning might become commonplace. AI could increasingly manage its own quality assurance, freeing developers to focus more on strategic oversight rather than tactical corrections. For now, however, people matter, and arguably always will. Tomorrow's developers might write fewer lines of direct code but will spend more time defining specifications, constraints, and acceptance criteria that AI-driven systems must follow.

Matt Asay

Matt Asay runs developer marketing at Oracle. Previously Asay ran developer relations at MongoDB, and before that he was a Principal at Amazon Web Services and Head of Developer Ecosystem for Adobe. Prior to Adobe, Asay held a range of roles at open source companies: VP of business development, marketing, and community at MongoDB; VP of business development at real-time analytics company Nodeable (acquired by Appcelerator); VP of business development and interim CEO at mobile HTML5 start-up Strobe (acquired by Facebook); COO at Canonical, the Ubuntu Linux company; and head of the Americas at Alfresco, a content management startup. Asay is an emeritus board member of the Open Source Initiative (OSI) and holds a JD from Stanford, where he focused on open source and other IP licensing issues. The views expressed in Matt's posts are Matt's, and don't represent the views of his employer.
