When established technologies take up the most space in training data sets, what would prompt LLMs to recommend new technologies (even if they're better)?
We're living in a strange time for software development. On the one hand, AI-driven coding assistants have shaken up a hitherto calcified IDE market. As RedMonk cofounder James Governor puts it, "suddenly we're in a position where there is a surprising amount of turbulence in the market for editors," when "everything is in play" with "so much innovation happening." Ironically, that very innovation in genAI may be stifling innovation in the software those coding assistants increasingly recommend. As AWS developer advocate Nathan Peck highlights, "the brutal truth beneath the magic of AI coding assistants" is that "they're only as good as their training data, and that stifles new frameworks."
In other words, genAI-driven tools are creating powerful feedback loops that foster winner-takes-all markets, making it hard for innovative new technologies to take root.
No room for newbies
I've written before about genAI's tendency to undermine its sources of training data. In the software development world, ChatGPT, GitHub Copilot, and other large language models (LLMs) have had a profoundly negative effect on sites like Stack Overflow, even as they've had a profoundly positive impact on developer productivity. Why ask a question on Stack Overflow when you can ask Copilot? But every time a developer does that, one fewer question goes into the public repository that feeds LLM training data.
Just as bad, we don't know whether the training data is correct in the first place. As I recently noted, "The LLMs have trained on all sorts of good and bad data from the public Internet, so it's a bit of a crapshoot as to whether a developer will get good advice from a given tool." Presumably each LLM has a way of weighting certain sources of data as more authoritative, but if so, that weighting is completely opaque. AWS, for example, is probably the best source of information on how Amazon Aurora works, but it's unclear whether developers using Copilot will see documentation from AWS or a random Q&A thread on Stack Overflow. I'd hope the LLMs would privilege the creator of a technology as the best source of information about it, but who knows?
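To be clear about what "weighting sources" could even mean, here's a purely hypothetical sketch, mine, not anything any LLM vendor has documented, of ranking candidate snippets by source authority. Every domain, score, and field name is invented for illustration:

```javascript
// Hypothetical source weighting; no LLM vendor documents working this
// way. Domains and scores are invented for illustration.
const authority = new Map([
  ["docs.aws.amazon.com", 1.0], // the technology's creator
  ["stackoverflow.com", 0.6],   // community Q&A of variable quality
]);

// Blend how on-topic a snippet is with how trusted its source is.
function score(snippet) {
  return snippet.relevance * (authority.get(snippet.source) ?? 0.1);
}

const candidates = [
  { source: "stackoverflow.com", relevance: 0.9 },
  { source: "docs.aws.amazon.com", relevance: 0.7 },
];
candidates.sort((a, b) => score(b) - score(a));
console.log(candidates.map((c) => c.source));
// => ["docs.aws.amazon.com", "stackoverflow.com"]: the creator's docs
// win despite the lower raw-relevance score.
```

If something like this happens inside commercial assistants, nobody outside can inspect the weights, which is exactly the problem.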
And then there's the inescapable feedback loop that Peck points out. It's worth quoting him at length. Here's how he describes the loop (a toy simulation of the dynamic follows the list):
- Developers choose popular incumbent frameworks because AI recommends them
- This leads to more code being written in these frameworks
- Which provides more training data for AI models
- Making the AI even better at these frameworks, and even more biased toward recommending them
- Attracting even more developers to these incumbent technologies
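To make the compounding effect concrete, here's a minimal back-of-the-envelope simulation of that loop. It's my sketch, not Peck's; the numbers and the superlinear-bias assumption (that an assistant recommends a framework disproportionately more often as its share of the training data grows) are illustrative only:

```javascript
// Toy model of the AI-recommendation feedback loop. All numbers are
// made up; the superlinear bias (squaring the data share) is an
// assumption for illustration, not a measured property of any LLM.
let incumbent = 900; // units of training data for the established framework
let newcomer = 100;  // units for the innovative new framework

function assistantRecommendsIncumbent() {
  const share = incumbent / (incumbent + newcomer);
  const w = share ** 2; // more data => disproportionately more recommendations
  return Math.random() < w / (w + (1 - share) ** 2);
}

for (let year = 1; year <= 10; year++) {
  // 1,000 developers ask the assistant, follow its advice, and each
  // contribute one more unit of code to the chosen framework's corpus.
  for (let dev = 0; dev < 1000; dev++) {
    if (assistantRecommendsIncumbent()) incumbent++;
    else newcomer++;
  }
  const pct = (100 * newcomer) / (incumbent + newcomer);
  console.log(`year ${year}: newcomer share of training data: ${pct.toFixed(1)}%`);
}
```

Run under Node or Bun, the newcomer's share decays steadily toward zero: every recommendation the incumbent wins makes the next recommendation even more likely to go its way.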
He then describes how this affects him as a JavaScript developer. JavaScript has been a hotbed of innovation over the years, with a new framework seemingly emerging every other day. I wrote about this back in 2015, and that frenetic pace has continued for the past decade. It won't necessarily continue, though, as Peck details, because LLMs actively discourage developers from trying something new. Peck describes working with the new Bun runtime: "I've seen firsthand how LLM-based assistants try to push me away from using the Bun native API, back to vanilla JavaScript implementations that look like something I could have written 10 years ago."
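It's easy to picture what that push looks like. Here's a hypothetical, minimal contrast, my example, not Peck's code, between Bun's native file API and the Node-era pattern an assistant steeped in older training data tends to suggest instead. Both run under Bun today:

```javascript
// Illustrative only (my example, not Peck's). Two ways to read a file
// under Bun; both work today.
import { readFile } from "node:fs/promises";

// The vanilla, Node-era pattern an assistant trained mostly on
// pre-Bun code tends to suggest:
const legacyConfig = await readFile("config.json", "utf8");

// Bun's native API, which Peck says assistants steer him away from:
const config = await Bun.file("config.json").text();
```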
Why? Because that's what the volume of training data tells the LLMs to suggest. The rich get richer, in other words, and new options struggle to get noticed at all. That's always been somewhat true, of course, but now it's institutionalized by data-driven tools that don't listen to anything beyond sheer volume of data.
As Peck concludes, this "creates an uphill battle for innovation." It's always hard to launch or choose a new technology, but AI coding assistants make it that much harder. He offers a provocative but apt example: If ChatGPT had been "invented before Kubernetes reached mainstream adoption…, I don't think there would have ever been a Kubernetes." The LLMs would have pushed developers toward Mesos or other already available options rather than the new (but eventually superior) one. What to do?
Open it up
It's not clear how we resolve this looming problem. We're still in the "wow, this is cool!" phase of AI coding assistants, and rightly so. But at some point, the tax we're paying will become evident, and we'll need to figure out how to extricate ourselves from the hole we're digging.
One thing seems clear: As much as closed-source options may have worked in the past, it's hard to see how they can survive in the future. As Gergely Orosz posits, "LLMs will be better in languages they have more training on," and almost by definition, they'll have more access to open source technologies. "Open source code is high-quality training," he argues, and starving the LLMs of training data by locking up one's code, documentation, etc., is a terrible strategy.
So that's one good outcome of this seemingly inescapable LLM feedback loop: more open code. It doesn't solve the problem of LLMs being biased toward older, established code and thereby inhibiting innovation, but it at least pushes us in the right direction for software generally.