The excitement and turmoil surrounding generative AI is not unlike the early days of open source, or the Wild West. We can resolve the uncertainty and confusion.
Most people can sing โHappy Birthdayโ by heart. But do you know how you learned it? Who first shared it with you? Who wrote it? You have the lyrics and the melody, and can teach others, but you probably have no idea where it came from.
That effectively describes โthe rubโ with generative AI, and itโs a problem for the individuals and organizations that use it. Much like the early days of open source, and software licensing in general, generative AI is uncharted territory, itโs exciting, and thereโs a lot to learn.
Even between deciding to develop this column and sitting down to actually write it, dozens and dozens of news stories distracted meโand further confused the AI issues I was contemplatingโnot least a story about OpenAI CEO Sam Altman telling the Senate that, yes, there should be a regulatory agency in place in the likely instance that the technology goes โhaywire.โ
In other words, generative AI is messy.
The Harvard Business Review notes that generative AI has an intellectual property problem that spans a complex set of questions:
- How should existing laws apply?
- What should we do about infringement?
- What rights do AI users have?
- What rights do content creators have?
- Who owns AI-generated works?
- Should unlicensed content be used for training?
- Should users be able to prompt AI models to cite the licensed and unlicensed works they were trained on?
How did we get to this point so quickly? Part of the confusion lies in the opacity of the generative AI model.
The GPT in ChatGPT
It all goes back to the โGPTโ in ChatGPT. GPT stands for generative pre-trained transformer. A transformer is not that bigโabout 2,000 lines of code. Itโs basically the equivalent of an egg cartonโits main purpose is to hold the โeggs,โ or the things that really have value to consumers. In the case of generative AI, the โeggsโ are variables or weights.
Sometimes humans forget where they learned something, but often they can remember and can cite sources. Unlike a human, ChatGPT and other generative AI platforms canโt actually remember any of the information they have ingested, nor can they cite it. There may be a log that exists somewhere, but itโs not in the model itself. Users canโt write a prompt to cite training data. The model just has a bunch of numbers and variables. Itโs akin to a bunch of neurons, and fake neurons at that. These models just statistically predict the next word based on a bunch of content.
So, how are we going to solve the problem?
Numerous mechanisms are being explored to control the use of AI models, generated content, and weights:
- Regulation. The government can make laws to control how AI can be used and prescribe punishments for breaking those laws.
- Licensing. This is a scalable legal agreement between creators and consumers of a software, prose, images, video, etc. Open source was founded on the pillars of โrightsโ and โfreedomsโ enabled by licensing, but AI is forcing us to take a hard look (Llama and ChatGPT are not open source) at whether ultimate freedom is really the best solution.
- Contracts. Content creators and those who pay for its creation often have contracts when they do business. For example, the Writers Guild of America West proposed a poison pill in the contract that would prevent any AI-generated content from being copyrighted. The studiosโ business models rely on copyrighted material so this would make it quite difficult to use AI-generated content instead of human writers.
- Technical controls: Much like in security, there is a difference between โpolicyโ and โtechnical controls.โ For example, thereโs a major difference between mandating that people must change their password every 90 days and forcing them to change their password when they try to log in. Similarly, many AI companies and researchers are attempting to control what an AI model or service will and wonโt do, but users are finding all kinds of creative ways to coax AI into doing prohibited things using prompt injection attacks.
Iโm skeptical that any of the four methods above will actually control what people can do and canโt do with AI. Though, as with open source, I do think consistent and well-understood licensing will be key to widespread adoption by businesses.
The HBR article I referenced earlier agrees, noting that licensing will be key to protecting both AI creators and consumers. But how is that going to happen?
The AI Wild West
The excitement and turmoil surrounding generative AI is not unlike the early days of open source. And, the early days of open source were like the Wild West. Licenses were being created and used with no oversight, causing uncertainty and confusion, which is the opposite of what licensing is supposed to do. In the late-90s, the Open Source Initiative (OSI) took over and basically said, โWeโre the keepers of everything that is open source.โ Today, both the OSI and the Free Software Foundation publish open source definitions, which are used to determine the conformance of open source licenses.
And for about 25 years we kinda thought we were โdoneโ with open source licensing. But because of AI (and other things like cloud providers), weโre going to need to rethink licensing schemes, or maybe generate totally new ones. The weights, the models, and the training data all likely need licenses because, without all the inputs and the outputs being well understood, it will make it difficult for businesses to adopt.
AI blurs these lines. When humans generate knowledge, itโs easy to understand the source and the ethics behind ownership of the knowledge. But when you start to get into AI models, itโs like, OK, who owns that stuff, anyway? Because, letโs be honest, it stands to reason that not all models are being trained on content that was approved for use in that way. In fact, I think itโs pretty safe to say that many of these models are violating copyright and license agreements. But, how do you prove it? Itโs just a bunch of numbers in a model. How can you sue somebody in court over that?
Just as the OSI did with open source, the OSI is stepping in to try and put some guardrails around all of this. The OSIโs Deep Dive project posits that โthe traditional view of open source code implementing AI algorithms may not be sufficient to guarantee inspectability and replicability of the AI systems.โ The OSI has put out a series of podcasts on the subject, and it is conducting four virtual seminars designed to โframe a conversation to discover whatโs acceptable for AI systems to be โOpen Source.โโ The OSI even has a blog about the outcome from the first face-to-face community discussion: Takeaways from the โDefining Open AIโ community workshop.
If all of this sounds really confusing, thatโs because it is. And the landscape is changing by the day. It will be really important for organizations to stay abreast of all of the news, and to do the best they can to avoid hype and fearmongering and focus instead on what can be doneโnowโto balance the benefits of AI with governance and guardrails.
I highly recommend paying close attention to the OSIโs work and to push the vendors you work with to explain what they are doing (and will do) to ensure the effective and ethical use of AI in an open source context. The goal is to deliver AI-powered applications with trusted provenance.
(And, for the record, โHappy Birthday,โ which dates back to the late-1800s, was derived from a song written by schoolteacher Patty Hill and her sister, Mildred.โ Over the years, it has been at the center of many copyright battles and millions of dollars of licensing fees. It is currently in the public domain.)
โ
Generative AI Insights provides a venue for technology leadersโincluding vendors and other third partiesโto explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorldโs technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
ย


