A glimpse at how DeepSeek achieved its V3 and R1 breakthroughs, and how organizations can take advantage of model innovations when they emerge so quickly.
The release of DeepSeek roiled the world of generative AI last month, leaving engineers and developers wondering how the company achieved what it did, and how they might take advantage of the technology in their own stacks.
The DeepSeek team built on developments that were already known in the AI community but had not been fully applied. The result is a model that appears to be comparable in performance to leading models like Meta's Llama 3.1, but was built and trained at a fraction of the cost.
Most importantly, DeepSeek released its work as open access technology, which means others can learn from it and create a far more competitive market for large language models (LLMs) and related technologies.
Here's a glimpse at how DeepSeek achieved its breakthroughs, and what organizations must do to take advantage of such innovations when they emerge so quickly.
Inside the DeepSeek models
DeepSeek released two models in late December and late January: DeepSeek V3, a powerful foundational model comparable in scale to GPT-4; and DeepSeek R1, designed specifically for complex reasoning and based on the V3 foundation. Here's a look at the technical strategy for each.
DeepSeek V3
- New mix for precision training: DeepSeek leveraged eight-bit precision matrix multiplication for faster operations, while implementing custom logic to accumulate results with the correct precision. They also utilized WGMMA parallel operators (pronounced "wagamama").
- Taking multi-token prediction to the next level: Clearly inspired by Meta's French research team, which pioneered predicting multiple tokens simultaneously, DeepSeek utilized enhanced implementation techniques to push this concept even further.
- Expert use of "common knowledge": The basic concept of Mixture-of-Experts (MoE) is akin to activating different parts of the brain based on the task, just as humans conserve energy by engaging only the necessary neural circuits. Traditional MoE models split the network into a limited number of "experts" (e.g., eight experts) and activate only one or two per query. DeepSeek introduced a far more granular approach, incorporating an idea originally explored by Microsoft Research: the notion that some "common knowledge" needs to be processed by model components that remain active at all times.
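The low-precision-multiply, higher-precision-accumulate idea from the first bullet can be sketched in plain NumPy. This toy version uses int8 as a stand-in for FP8 (which NumPy does not support) and accumulates the products in 32 bits; it illustrates the trade-off only and is not DeepSeek's actual kernel.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization to int8; returns values and scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def mixed_precision_matmul(a, b):
    """Multiply in 8-bit, accumulate in 32-bit, then rescale to float32."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)  # wide accumulator
    return acc.astype(np.float32) * (sa * sb)

a = np.random.randn(64, 128).astype(np.float32)
b = np.random.randn(128, 32).astype(np.float32)
approx = mixed_precision_matmul(a, b)
exact = a @ b
# The low-precision result tracks the full-precision one closely.
print(float(np.max(np.abs(approx - exact))))
```

The key point is that the cheap operation (the multiply) runs at 8 bits while the error-sensitive operation (the running sum) stays wide, which is what keeps the result usable.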
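The granular MoE routing with an always-active "common knowledge" expert can be sketched as follows. The expert counts, sizes, and routing rule here are illustrative assumptions, not DeepSeek's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each expert is a tiny linear layer; one extra "shared" expert is always on.
routed_experts = [rng.standard_normal((d_model, d_model)) * 0.1
                  for _ in range(n_experts)]
shared_expert = rng.standard_normal((d_model, d_model)) * 0.1
router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x):
    """Route one token vector through top-k routed experts plus the shared one."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]          # indices of the chosen experts
    weights = np.exp(logits[top])
    weights = weights / weights.sum()          # softmax over the chosen experts
    out = x @ shared_expert                    # always-active path
    for w, i in zip(weights, top):
        out = out + w * (x @ routed_experts[i])  # sparse routed path
    return out

token = rng.standard_normal(d_model)
y = moe_forward(token)
```

Only `top_k + 1` of the nine experts run per token, which is where the compute savings come from; the shared expert guarantees a path that every token passes through.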
DeepSeek R1
- Rewarding reasoning at scale: Much like AlphaGo Zero learned to play Go solely from game rules, DeepSeek R1 Zero learns how to reason from a basic reward model, a first at this scale. While the concept isn't new, successfully applying it to a large-scale model is unprecedented. DeepSeek's research captures some profound moments, such as the "aha moment" when DeepSeek R1 Zero realized on its own that spending more time thinking leads to better answers (I wish I knew how to teach that).
- Curating a "cold start": The DeepSeek R1 model also leverages a more traditional approach, incorporating cold-start data from DeepSeek V3. While no groundbreaking techniques seem to be involved at this stage, patience and meticulous curation likely played a crucial role in making it work.
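The "basic reward model" idea behind R1 Zero can be sketched as a simple scoring function that checks output format and answer correctness. The tag names and weights below are illustrative assumptions, not DeepSeek's exact reward scheme.

```python
import re

def reasoning_reward(completion: str, ground_truth: str) -> float:
    """Toy rule-based reward: score format compliance, then correctness."""
    reward = 0.0
    # Format reward: did the model wrap its final answer in the expected tags?
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match:
        reward += 0.5
        # Accuracy reward: does the extracted answer match the ground truth?
        if match.group(1).strip() == ground_truth:
            reward += 1.0
    return reward

print(reasoning_reward("<think>2+2 is 4</think><answer>4</answer>", "4"))  # 1.5
print(reasoning_reward("The answer is 4", "4"))                            # 0.0
```

Because the signal comes from simple verifiable rules rather than human labels, this kind of reward scales cheaply, which is what makes applying it to a large model plausible.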
These DeepSeek advances are a testament to open research, and how it can help the progress of humankind. One of the most interesting next steps? The great team at Hugging Face is already working to reproduce DeepSeek R1 in its Open R1 project.
The importance of LLM agnosticism
The limiting factor for AI will not be uncovering business value or model quality. What is critical is that companies maintain an agnostic strategy with their AI partners.
DeepSeek shows that betting on a single LLM provider will be a losing game. Some organizations have locked themselves into a single vendor, whether OpenAI, Anthropic, or Mistral. But the ability of new players to disrupt the landscape in a single weekend makes it clear: companies need an LLM-agnostic approach.
A multi-LLM infrastructure avoids the dangers of vendor "lock-in" and makes it easier to integrate and switch between models as the market evolves. Essentially, this future-proofs any LLM decision by ensuring optionality throughout a company's AI journey.
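As a concrete sketch, an LLM-agnostic layer can be as simple as one common interface plus a provider registry, so switching vendors becomes a configuration change rather than a rewrite. The provider names and stub backends below are placeholders, not real SDK calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Completion:
    text: str
    model: str

# A backend is anything that maps a prompt to a response string.
Backend = Callable[[str], str]

class LLMRouter:
    """Registry of interchangeable LLM providers behind one interface."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend

    def complete(self, prompt: str, model: str) -> Completion:
        # Every provider is called the same way; swapping is a name change.
        return Completion(text=self._backends[model](prompt), model=model)

router = LLMRouter()
# In practice these closures would wrap real vendor SDK calls.
router.register("provider-a", lambda p: f"[A] {p}")
router.register("provider-b", lambda p: f"[B] {p}")

print(router.complete("Summarize Q3 results", model="provider-b").text)
# prints "[B] Summarize Q3 results"
```

Because application code depends only on the `complete` interface, adding a newly released model is one `register` call, which is the optionality the paragraph above argues for.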
Enterprises must also maintain control through careful governance. DeepSeek and the fast-emerging world of agentic AI show how chaotic and fast-moving the AI landscape has become. In a world of open-source reasoning models and rapidly multiplying vendors, engineering teams will need to maintain rigorous testing, robust guardrails, and continuous monitoring.
If you can meet these needs, technologies like DeepSeek will be a huge positive for all businesses by increasing competition, driving down costs, and opening new use cases that more companies can capitalize on.
Florian Douetteau is co-founder and CEO of Dataiku, the universal AI platform that provides the world's largest companies with control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents.
Generative AI Insights provides a venue for technology leaders, including vendors and other outside contributors, to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld's technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.


