Josh Fruhlinger
Contributing Writer

How to keep AI hallucinations out of your code

feature
Feb 17, 2025 | 10 mins

The effectiveness of AI coding assistants depends largely on the human in the driver's seat. Here are eight ways to keep AI hallucinations from infecting your code.

A humanoid robot uses AI-assisted coding tools on a virtual touch screen.
Credit: BOY ANTHONY / Shutterstock

It turns out androids do dream, and their dreams are often strange. In the early days of generative AI, we got human hands with eight fingers and recipes for making pizza sauce from glue. Now, developers working with AI-assisted coding tools are also finding AI hallucinations in their code.

"AI hallucinations in coding tools occur due to the probabilistic nature of AI models, which generate outputs based on statistical likelihoods rather than deterministic logic," explains Mithilesh Ramaswamy, a senior engineer at Microsoft. And just like that glue pizza recipe, sometimes these hallucinations escape containment.

AI coding assistants are everywhere, and usage keeps growing: 62% of respondents to the May 2024 Stack Overflow developer survey said they were using AI coding tools. So how can you prevent AI hallucinations from ruining your code? We asked developers and tech leaders experienced with AI coding assistants for their tips.

How AI hallucinations infect code

Microsoft's Ramaswamy, who works every day with AI tools, keeps a list of the sorts of AI hallucinations he encounters: "Generated code that doesn't compile; code that is overly convoluted or inefficient; and functions or algorithms that contradict themselves or produce ambiguous behavior." Additionally, he says, "AI hallucinations sometimes just make up nonexistent functions" and "generated code may reference documentation, but the described behavior doesn't match what the code does."

Komninos Chatzipapas, founder of HeraHaven.ai, gives an example of a specific problem of this type. "On our JavaScript back-end, we had a function to deduct credit from a user based on their ID," he says. "The function expected an object containing an ID value as its parameter, but the coding assistant just put the ID as the parameter." He notes that in loosely typed languages like JavaScript, problems like these are more likely to slip past language parsers. The error Chatzipapas encountered "crashed our staging environment, but was fortunately caught before being pushed to production."
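
To make that failure mode concrete, here's a rough sketch of the same class of bug in Python, another loosely typed language. (The function and values are hypothetical, not HeraHaven's actual code.) The function expects a dict wrapping the ID, but the generated call passes the bare ID, and nothing complains until runtime.

```python
# Hypothetical illustration of the call-shape hallucination described above.
def deduct_credits(user: dict, amount: int) -> None:
    """Deduct credits from the account identified by user["id"]."""
    user_id = user["id"]  # fails at runtime if user is a bare string
    print(f"Deducting {amount} credits from {user_id}")

# What the human-written call sites look like:
deduct_credits({"id": "user-42"}, 10)

# What the assistant generated: no parser objects, but it raises a TypeError.
deduct_credits("user-42", 10)
```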

How does code like this slip into production? Monojit Banerjee, a lead in the AI platform organization at Salesforce, describes the code output by many AI assistants as "plausible but incorrect or non-functional." Brett Smith, distinguished software developer at SAS, notes that less experienced developers are especially likely to be misled by the AI tool's confidence, "leading to flawed code."

The consequences of flawed AI code can be significant. Security holes and compliance issues are top of mind for many software companies, but some issues are less immediately obvious. Faulty AI-generated code adds to overall technical debt, and it can detract from the efficiency code assistants are intended to boost. "Hallucinated code often leads to inefficient designs or hacks that require rework, increasing long-term maintenance costs," says Microsoft's Ramaswamy.

Fortunately, the developers we spoke with had plenty of advice about how to ensure AI-generated code is correct and secure. There were two categories of tips: how to minimize the chance of code hallucinations, and how to catch hallucinations after the fact.

Reducing AI hallucinations in your code

The ideal, of course, would be to never encounter AI hallucinations at all. While that's unlikely given the current state of the art, the following precautions can help reduce issues in AI-generated code.

Write clear and detailed prompts

The adage "garbage in, garbage out" is as old as computer science, and it applies to LLMs as well, especially when you're generating code by prompting rather than using an autocomplete assistant. Many of the experts we spoke to urged developers to get their prompt engineering game on point. "It's best to ask bounded questions and critically examine the results," says Andrew Sellers, head of technology strategy at Confluent. "Usage data from these tools suggest that outputs tend to be more accurate for questions with a smaller scope, and most developers will be better at catching errors by frequently examining small blocks of code."
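
As a rough illustration of what "bounded" means in practice (these prompts are invented for this article, not taken from Confluent's usage data), compare an open-ended request with one that pins down scope, signature, and edge cases:

```python
# Hypothetical prompts: the bounded version constrains scope, signature, and
# edge cases, which makes the output small enough to review carefully.
VAGUE_PROMPT = "Write the billing module for our app."

BOUNDED_PROMPT = """
Write a single Python function prorate(amount_cents: int, days_used: int,
days_in_period: int) -> int that returns the prorated charge in cents,
rounding down. Do not add new dependencies. Handle two edge cases:
days_used == 0 and days_used == days_in_period.
"""
```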

Ask for references

LLMs like ChatGPT are notorious for making up citations in school papers and legal briefs. But code-specific tools have made great strides in that area. "Many models are supporting citation features," says Salesforce's Banerjee. "A developer should ask for citations or API references wherever possible to minimize hallucinations."

Make sure your AI tool has trained on the latest software

Most genAI chatbots can't tell you who won your home team's baseball game last night, and they have limitations keeping up with software tools and updates as well. "One of the ways you can predict whether a tool will hallucinate or provide biased outputs is by checking its knowledge cut-offs," says Stoyan Mitov, CEO of Dreamix and co-founder of the Citizens app. "If you plan on using the latest libraries or frameworks that the tool doesn't know about, the chances that the output will be flawed are high."

Train your model to do things your way

Travis Rehl, CTO at Innovative Solutions, says what generative AI tools need to work well is "context, context, context." You need to provide good examples of what you want and how you want it done, he says. "You should tell the LLM to maintain a certain pattern, or remind it to use a consistent method so it doesn't create something new or different." If you fail to do so, you can run into a subtle type of hallucination that injects anti-patterns into your code. "Maybe you always make an API call a particular way, but the LLM chooses a different method," he says. "While technically correct, it did not follow your pattern and thus deviated from what the norm needs to be."
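
For example, if your team routes every outbound HTTP call through a shared wrapper, include that wrapper in the model's context and say so explicitly. The sketch below is hypothetical (the wrapper and functions aren't from Rehl's codebase), but it shows the kind of anti-pattern he's describing: code that works, yet quietly bypasses the house convention.

```python
# House pattern: all outbound calls go through a shared wrapper that handles
# auth, retries, and logging in one place. ApiClient is a hypothetical stand-in.
class ApiClient:
    def get(self, path: str) -> dict:
        ...  # auth, retries, logging, then the actual HTTP request

api_client = ApiClient()

def fetch_invoice(invoice_id: str) -> dict:
    return api_client.get(f"/invoices/{invoice_id}")

# Typical hallucinated deviation: technically correct, but it sidesteps the
# wrapper and drops the team's auth, retry, and logging conventions.
import requests

def fetch_invoice_generated(invoice_id: str) -> dict:
    return requests.get(f"https://api.example.com/invoices/{invoice_id}").json()
```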

A concept that takes this idea to its logical conclusion is retrieval-augmented generation, or RAG, in which the model uses one or more designated "sources of truth" that contain code either specific to the user or at least vetted by them. "Grounding compares the AI's output to reliable data sources, reducing the likelihood of generating false information," says Mitov. RAG is "one of the most effective grounding methods," he says. "It improves LLM outputs by utilizing data from external sources, internal codebases, or API references in real time."

Many available coding assistants already integrate RAG features; in Cursor, for instance, the feature is called @codebase. If you want to create your own internal codebase for an LLM to draw from, you would need to store it in a vector database; Banerjee points to Chroma as one of the most popular options.
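
Here's a minimal sketch of that workflow using the chromadb Python client. (The snippets and prompt format are illustrative; the point is that retrieval pulls vetted internal code into the prompt as the model's source of truth.)

```python
# Minimal RAG sketch: store vetted internal snippets in Chroma, then retrieve
# the most relevant ones to ground the model's prompt.
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient(path=...) to keep data
snippets = client.get_or_create_collection(name="internal_code")

snippets.add(
    ids=["billing-retry", "billing-prorate"],
    documents=[
        "def call_with_retry(fn, attempts=3): ...  # team-approved retry helper",
        "def prorate(amount_cents, days_used, days_in_period): ...",
    ],
)

question = "How should I retry a failed billing API call?"
hits = snippets.query(query_texts=[question], n_results=2)

# Feed the retrieved, vetted code into the prompt as context.
context = "\n\n".join(hits["documents"][0])
prompt = f"Using only these internal helpers:\n{context}\n\nAnswer: {question}"
```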

Catching AI hallucinations in your code

Even with all of these protective measures, AI coding assistants will sometimes make mistakes. The good news is that hallucinations are often easier to catch in code than in applications where the LLM is writing plain text. The difference is that code is executable and can be tested. "Coding is not subjective," as Innovative Solutions' Rehl points out. "Code simply won't work when it's wrong." Experts offered a few ways to spot mistakes in generated code.

Use AI to evaluate AI-generated code

Believe it or not, AI assistants can evaluate AI-generated code for hallucinations, often to good effect. For instance, Daniel Lynch, CEO of Empathy First Media, suggests "writing supporting documentation on the code so that you can have the AI evaluate the provided code in a new instance and determine if it satisfies the requirements of the intended use case."

HeraHaven's Chatzipapas suggests that AI tools can do far more in judging output from other tools. "Scaling test-time compute deals with the issue where, for the same input, an LLM can generate a variety of responses, all with different levels of quality," he explains. "There are many ways to make it work but the simplest one is to query the LLM multiple times and then use a smaller 'verifier' AI model to pick which answer is better to present to the end user. There are also more sophisticated ways where you can cluster the different answers you get and pick one from the largest cluster (since that one has received more implied 'votes')."
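
A minimal sketch of the simpler approach he describes, best-of-N sampling with a verifier, might look like the following; generate_candidate and verifier_score are hypothetical stand-ins for calls to the coding model and to the smaller verifier model.

```python
# Hypothetical best-of-N sampling with a verifier model: sample several
# candidate answers, score each with a smaller verifier, return the best one.

def generate_candidate(prompt: str) -> str:
    """Stand-in for one sampled completion from the coding model."""
    raise NotImplementedError

def verifier_score(prompt: str, candidate: str) -> float:
    """Stand-in for a smaller model that rates a candidate's quality (0 to 1)."""
    raise NotImplementedError

def best_of_n(prompt: str, n: int = 5) -> str:
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verifier_score(prompt, c))
```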

Maintain human involvement and expertise

Even with machine assistance, most people we spoke to saw human beings as the last line of defense against AI hallucinations, and most expect human involvement to remain crucial to the coding process for the foreseeable future. "Always use AI as a guide, not a source of truth," says Microsoft's Ramaswamy. "Treat AI-generated code as a suggestion, not a replacement for human expertise."

That expertise shouldn't just be about programming generally; you should also stay intimately acquainted with the code that powers your applications. "It can sometimes be hard to spot a hallucination if you're unfamiliar with a codebase," says Rehl. Hands-on experience in the codebase is critical to spotting deviations in a specific method or in the overall code pattern.

Test and review your code

Fortunately, the tools and techniques most well-run shops use to catch human errors, from IDE tools to unit tests, can also catch AI hallucinations. "Teams should continue doing pull requests and code reviews just as if the code were written by humans," says Confluent's Sellers. "It's tempting for developers to use these tools to automate more in achieving continuous delivery. While laudable, it's incredibly important for developers to prioritize QA controls when increasing automation."

"I cannot stress enough the need to use good linting tools and SAST scanners throughout the development cycle," says SAS's Smith. "IDE plugins, integration into the CI, and pull requests are the bare minimum to ensure hallucinations do not make it to production."

"A mature devops pipeline is essential, where each line of code will be unit tested during the development lifecycle," adds Salesforce's Banerjee. "The pipeline will only promote the code to staging and production after tests and builds are passed. Moreover, continuous deployment is essential to roll back code as soon as possible to avoid a long tail of any outage."
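
Even a small unit test is enough to pin down the kind of call-shape hallucination described earlier. Here's a hypothetical pytest example, using a toy version of the credit-deduction function:

```python
# Hypothetical pytest example: an ordinary unit test catches the call-shape
# hallucination described earlier, whether a human or an AI wrote the code.
import pytest

def deduct_credits(user: dict, amount: int) -> int:
    """Toy implementation for illustration: returns the remaining balance."""
    if not isinstance(user, dict) or "id" not in user:
        raise TypeError("user must be a dict containing an 'id' key")
    return user.get("balance", 0) - amount

def test_deducts_from_balance():
    assert deduct_credits({"id": "user-42", "balance": 100}, 10) == 90

def test_rejects_bare_id():
    # The hallucinated call passed the bare ID; this test pins down the contract.
    with pytest.raises(TypeError):
        deduct_credits("user-42", 10)
```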

Highlight AI-generated code

Devansh Agarwal, a machine learning engineer at Amazon Web Services, recommends a technique that he calls "a little experiment of mine": Use the code review UI to call out parts of the codebase that are AI-generated. "I often see hundreds of lines of unit test code being approved without any comments from the reviewer," he says, "and these unit tests are one of the use cases where I and others often use AI. Once you mark that these are AI-generated, then people take more time in reviewing them."

This doesn't just help catch hallucinations, he says. "It's a great learning opportunity for everyone in the team. Sometimes it does an amazing job and we as humans want to replicate it!"

Keep both hands on the wheel

Generative AI is ultimately a tool, nothing more and nothing less. Like all other tools, it has quirks. While using AI changes some aspects of programming and makes individual programmers more productive, its tendency to hallucinate means that human developers must remain diligent in the driver's seat. "I'm finding that coding will slowly become a QA- and product definition-heavy job," says Rehl. As a developer, "your goal will be to understand patterns, understand testing methods, and be able to articulate the business goal you want the code to achieve."

Josh Fruhlinger

Josh Fruhlinger is a writer and editor who has been covering technology since the first dot-com boom. His interests include cybersecurity, programming tools and techniques, internet and open source culture, and what causes tech projects to fail. He won a 2025 AZBEE Award for a feature article on refactoring AI code, and his coverage of generative AI earned him a Jesse H. Neal Award in 2024. In 2015 he published The Enthusiast, a novel about what happens when online fan communities collide with corporate marketing schemes. He lives in Los Angeles.
