Published Date : 28/01/2025
On January 24, Perplexity released an assistant for Android phones, and on January 23, OpenAI previewed the Operator AI agent, capable of performing web tasks.
Meta announced its AI ambitions, including a massive data center, and Google stated that Gemini can now control smart homes.
However, these announcements were overshadowed by a significant development in the AI world.
China's DeepSeek, a relatively unknown player, launched a new generation of AI models that compete with those developed by US Big Tech, but at a fraction of the cost.
DeepSeek's AI assistant quickly became the top-rated free application on Apple’s App Store in the United States and has surpassed a million downloads on Google’s Play Store for Android phones.
This surge in popularity sparked a significant market reaction, with US AI companies facing a major decline.
The Nasdaq fell over 3% in early trade, and Nvidia, a leading chipmaker, saw a 13% drop, losing $465 billion in market value, the largest single-day loss in US market history.
DeepSeek outperforms other AI models not only on cost and capability but also on availability.
DeepSeek claims to have spent around $5.5 million to train its V3 model, a stark contrast to the hundreds of millions invested by tech giants like Google and OpenAI.
According to research by Epoch AI, Google and OpenAI spent between $70 million and $100 million to train their most advanced models, Gemini 1.0 Ultra and GPT-4, respectively.
What sets DeepSeek apart is its frugal approach to hardware.
According to the company's own chatbot, it was trained on a combination of Nvidia A100 and H100 GPUs, though the exact number is undisclosed.
DeepSeek's CEO, Liang Wenfeng, a billionaire who runs a hedge fund, has funded the company and hired top talent from other Chinese tech firms like ByteDance and Tencent.
DeepSeek is cautious about its responses to sensitive questions.
When asked about human rights challenges in China, including internet censorship and the treatment of Uyghur Muslims in Xinjiang, the AI initially listed several issues but quickly retracted its response, stating it was beyond its current scope.
However, it was more forthcoming on economic and social challenges facing China, India, and the US.
DeepSeek has been working on its AI models for some time.
The DeepSeek Coder was released in late 2023, followed by the 67-billion-parameter DeepSeek LLM, DeepSeek V2, the more advanced 236-billion-parameter DeepSeek Coder V2, the 671-billion-parameter DeepSeek V3, and most recently DeepSeek R1, along with distilled 32-billion and 70-billion-parameter variants.
Andrej Karpathy, founder of Eureka Labs, described these achievements as having been delivered on "a joke of a budget".
The cost efficiency of DeepSeek's API is another major selling point.
The R1 API costs just $0.55 per million input tokens and $2.19 per million output tokens, significantly lower than OpenAI's API, which costs around $15 per million input and $60 per million output tokens.
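Plugging the prices quoted above into a quick back-of-the-envelope calculation shows how large the gap is (the `api_cost` helper is illustrative, not part of either API):

```python
# Hypothetical cost comparison using the quoted per-million-token prices.
def api_cost(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Return the API bill in dollars, given prices per million tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# One million input tokens plus one million output tokens:
deepseek = api_cost(1_000_000, 1_000_000, in_price=0.55, out_price=2.19)
openai = api_cost(1_000_000, 1_000_000, in_price=15.00, out_price=60.00)

print(f"DeepSeek R1: ${deepseek:.2f}")  # $2.74
print(f"OpenAI:      ${openai:.2f}")    # $75.00
print(f"DeepSeek is about {openai / deepseek:.0f}x cheaper")
```

At these rates, the same workload costs roughly 27 times less on DeepSeek's API.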
DeepSeek's R1 model uses reinforcement learning, which allows it to learn through trial and error, improving its reasoning capabilities based on feedback.
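The trial-and-error idea behind reinforcement learning can be sketched with a toy multi-armed bandit (this is a minimal illustration of the learning principle, not DeepSeek's actual training setup):

```python
import random

# An epsilon-greedy agent tries actions, receives reward feedback, and
# gradually shifts toward the action with the best average reward.
def run_bandit(reward_probs, steps=5000, eps=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < eps:                        # explore: random action
            a = rng.randrange(len(reward_probs))
        else:                                         # exploit: best estimate
            a = max(range(len(values)), key=values.__getitem__)
        reward = 1.0 if rng.random() < reward_probs[a] else 0.0
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental average
    return values

values = run_bandit([0.2, 0.8, 0.5])
best = max(range(3), key=values.__getitem__)
print(f"agent settled on action {best}")  # the 0.8-reward action
```

The agent is never told which action is best; it discovers this purely from feedback, the same principle R1 applies to reasoning steps at vastly larger scale.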
DeepSeek's approach to AI models is innovative.
Instead of the traditional 32-bit floating-point format, it uses 8-bit floating-point numbers, achieving comparable accuracy while using 75% less memory.
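The 75% figure follows directly from the bit widths; a short calculation using V3's quoted parameter count makes the savings concrete (decimal gigabytes, weights only):

```python
# Memory footprint of model weights at different numeric precisions.
def weight_memory_gb(n_params: int, bits: int) -> float:
    """Storage for the weights alone, in decimal GB."""
    return n_params * bits / 8 / 1e9  # bits -> bytes -> GB

params = 671_000_000_000  # DeepSeek V3's quoted parameter count
fp32 = weight_memory_gb(params, 32)
fp8 = weight_memory_gb(params, 8)
print(f"FP32: {fp32:.0f} GB, FP8: {fp8:.0f} GB")  # 2684 GB vs 671 GB
print(f"Savings: {1 - fp8 / fp32:.0%}")           # 75%
```

Storing each weight in 8 bits instead of 32 cuts weight memory to a quarter, which shrinks both training hardware requirements and serving costs.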
Multi-token prediction lets the model generate several tokens per step instead of one, roughly doubling response speed.
DeepSeek’s Mixture-of-Experts (MoE) language model activates only the parameters most relevant to each token, unlike traditional dense models that keep all parameters active for every token.
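The routing idea behind MoE can be sketched in a few lines (expert count, shapes, and the toy expert functions here are made up for illustration, not DeepSeek's architecture):

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, router_weights, k=2):
    """Score every expert for this token, but run only the top-k of them."""
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    top_k = sorted(range(len(experts)), key=probs.__getitem__, reverse=True)[:k]
    # Only the k selected experts compute; their outputs are mixed by weight.
    out = sum(probs[i] * experts[i](token) for i in top_k)
    return out, top_k

random.seed(0)
experts = [lambda t, s=s: s * sum(t) for s in (0.5, 1.0, 1.5, 2.0)]  # 4 toy experts
router = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
out, active = moe_forward([0.1, 0.2, 0.3], experts, router, k=2)
print(f"active experts: {active} of 4")
```

Because only 2 of the 4 experts run per token, most parameters sit idle on any given step, which is how MoE models keep per-token compute far below their total parameter count.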
Despite its successes, DeepSeek and other Chinese tech companies face skepticism.
There are concerns about their proximity to the Chinese government and the potential risks associated with data sharing.
Additionally, questions remain about DeepSeek’s access to the latest generation GPUs and AI chips.
SemiAnalysis’ Dylan Patel estimates DeepSeek has 50,000 Nvidia GPUs, far more than the 10,000 suggested by some online chatter.
The exact details of how DeepSeek sourced these GPUs remain unclear, and resolving them could change the picture of what its achievements truly cost.
In summary, DeepSeek's entry into the AI market has been nothing short of revolutionary, challenging the dominance of US tech giants and highlighting the cost-effectiveness of Chinese AI models.
As the AI landscape continues to evolve, the world watches with keen interest.
Q: What is DeepSeek?
A: DeepSeek is a Chinese AI company that has developed highly competitive AI models at a fraction of the cost of US tech giants. It has gained significant attention for its cost-effective and efficient AI solutions.
Q: How much did DeepSeek spend to train its V3 model?
A: DeepSeek claims to have spent around $5.5 million to train its V3 model, which is significantly less than the hundreds of millions invested by companies like Google and OpenAI.
Q: What is the cost of DeepSeek's R1 API?
A: DeepSeek's R1 API costs just $0.55 per million input tokens and $2.19 per million output tokens, making it much more cost-effective compared to OpenAI's API, which costs around $15 per million input and $60 per million output tokens.
Q: What are the key features of DeepSeek's AI models?
A: DeepSeek's AI models use 8-bit floating-point numbers, multi-token prediction, and a Mixture-of-Experts (MoE) architecture. These features make the models more efficient, requiring less memory and providing faster responses.
Q: What are the concerns surrounding DeepSeek and other Chinese tech companies?
A: There are concerns about the proximity of Chinese tech companies to the government and the potential risks associated with data sharing. Additionally, there is a lack of clarity about their access to the latest generation GPUs and AI chips.