Matt Asay
Contributing Writer

AI coding assistants are on a downward spiral

opinion
Feb 24, 2025 | 5 mins

When established technologies take up the most space in training data sets, what will push LLMs to recommend new technologies (even if they're better)?

[Image: spiral staircase. Credit: Symphonex]

We're living in a strange time for software development. On the one hand, AI-driven coding assistants have shaken up a hitherto calcified IDE market. As RedMonk cofounder James Governor puts it, "suddenly we're in a position where there is a surprising amount of turbulence in the market for editors," when "everything is in play" with "so much innovation happening." Ironically, that very innovation in genAI may be stifling innovation in the software those coding assistants increasingly recommend. As AWS developer advocate Nathan Peck highlights, "the brutal truth beneath the magic of AI coding assistants" is that "they're only as good as their training data, and that stifles new frameworks."

In other words, genAI-driven tools are creating powerful feedback loops that foster winner-takes-all markets, making it hard for innovative, new technologies to take root.

No room for newbies

I've written before about genAI's tendency to undermine its own sources of training data. In the software development world, ChatGPT, GitHub Copilot, and other tools built on large language models (LLMs) have had a profoundly negative effect on sites like Stack Overflow, even as they've had a profoundly positive impact on developer productivity. Why ask a question on Stack Overflow when you can ask Copilot? But every time a developer does that, one fewer question goes into the public repository that feeds LLM training data.

Just as bad, we don't know if the training data is correct in the first place. As I recently noted, "The LLMs have trained on all sorts of good and bad data from the public Internet, so it's a bit of a crapshoot as to whether a developer will get good advice from a given tool." Presumably each LLM has a way of weighting certain sources of data as more authoritative, but if so, that weighting is completely opaque. AWS, for example, is probably the best source of information for how Amazon Aurora works, but it's unclear whether developers using Copilot will see documentation from AWS or a random Q&A on Stack Overflow. I'd hope the LLMs would privilege the creator of the technology as the best source for information about it, but who knows?

And then there's the inescapable feedback loop that Peck points out. It's worth quoting him at length. Here's how he describes the loop:

  1. Developers choose popular incumbent frameworks because AI recommends them
  2. This leads to more code being written in these frameworks
  3. Which provides more training data for AI models
  4. Making the AI even better at these frameworks, and even more biased toward recommending them
  5. Attracting even more developers to these incumbent technologies
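The dynamics Peck describes can be sketched as a toy simulation, a Pólya-urn-style "rich get richer" process. To be clear, this is an illustration, not a model of any real LLM; the framework names and corpus sizes below are invented:

```python
import random

def simulate(frameworks, steps=10_000, seed=42):
    """Each new project picks a framework with probability proportional to
    that framework's existing share of the 'training corpus'. The winner's
    code then feeds back into the corpus, growing its share further."""
    rng = random.Random(seed)
    counts = dict(frameworks)
    for _ in range(steps):
        total = sum(counts.values())
        r = rng.uniform(0, total)
        for name, c in counts.items():
            r -= c
            if r <= 0:
                counts[name] += 1  # more code written -> more training data
                break
    return counts

# A hypothetical incumbent with 100x the newcomer's initial corpus
result = simulate({"incumbent": 1000, "newcomer": 10})
```

Because each pick is proportional to existing share, the incumbent's dominance is self-reinforcing: in this kind of process the expected final share equals the initial share, so a newcomer starting with roughly 1% of the corpus tends to stay near 1%, no matter how many new projects are created.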

He then describes how this affects him as a JavaScript developer. JavaScript has been a hotbed of innovation over the years, with a new framework seemingly emerging every other day. I wrote about this back in 2015, and that frenetic pace has continued for the past decade. It may not continue, though, as Peck details, because the LLMs actively discourage developers from trying something new. Peck describes working with the new Bun runtime: "I've seen firsthand how LLM-based assistants try to push me away from using the Bun native API, back to vanilla JavaScript implementations that look like something I could have written 10 years ago."

Why? Because that's what the volume of training data is telling the LLMs to suggest. The rich get richer, in other words, and new options struggle to get noticed at all. That's always been somewhat true, of course, but now it's institutionalized by data-driven tools that don't listen to anything beyond sheer volumes of data.

As Peck concludes, this "creates an uphill battle for innovation." It's always hard to launch or choose new technology, but AI coding assistants make it that much harder. He offers a provocative but appropriate example: If ChatGPT had been "invented before Kubernetes reached mainstream adoption…, I don't think there would have ever been a Kubernetes." The LLMs would have pushed developers toward Mesos or other already available options, rather than the new (but eventually superior) option.

What to do?

Open it up

It's not clear how we resolve this looming problem. We're still in the "wow, this is cool!" phase of AI coding assistants, and rightly so. But at some point, the tax we're paying will become evident, and we'll need to figure out how to extricate ourselves from the hole we're digging.

One thing seems clear: As much as closed-source options may have worked in the past, it's hard to see how they can survive in the future. As Gergely Orosz posits, "LLMs will be better in languages they have more training on," and almost by definition, they'll have more access to open source technologies. "Open source code is high-quality training," he argues, and starving the LLMs of training data by locking up one's code, documentation, etc., is a terrible strategy.

So that's one good outcome of this seemingly inescapable LLM feedback loop: more open code. It doesn't solve the problem of LLMs being biased toward older, established code and thereby inhibiting innovation, but it at least pushes us in the right direction for software, generally.

Matt Asay

Matt Asay runs developer marketing at Oracle. Previously Asay ran developer relations at MongoDB, and before that he was a Principal at Amazon Web Services and Head of Developer Ecosystem for Adobe. Prior to Adobe, Asay held a range of roles at open source companies: VP of business development, marketing, and community at MongoDB; VP of business development at real-time analytics company Nodeable (acquired by Appcelerator); VP of business development and interim CEO at mobile HTML5 start-up Strobe (acquired by Facebook); COO at Canonical, the Ubuntu Linux company; and head of the Americas at Alfresco, a content management startup. Asay is an emeritus board member of the Open Source Initiative (OSI) and holds a JD from Stanford, where he focused on open source and other IP licensing issues. The views expressed in Matt's posts are Matt's, and don't represent the views of his employer.
