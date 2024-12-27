DeepSeek AI, a Chinese AI research lab, has been making waves in the open-source AI community. The lab recently announced DeepSeek-V3, a Mixture-of-Experts (MoE) large language model with 671 billion total parameters, of which 37 billion are activated for each token. According to results on popular AI benchmarks, DeepSeek-V3 is the most powerful open-source model currently available, and it even outperforms popular closed-source models, including OpenAI's GPT-4o and Anthropic's Claude 3.5, on several of those benchmarks.

As you can see from the table above, DeepSeek-V3 posted state-of-the-art results in nine benchmarks, the most for any comparable model of its size. Despite this strong benchmark performance, DeepSeek-V3 required only 2.788 million H800 GPU hours for its full training, or about $5.6 million in training costs. For comparison, the comparable open-source Llama 3 405B model required 30.8 million GPU hours for training. DeepSeek-V3's low training cost comes from its support for FP8 training and extensive engineering optimizations.
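The training figures above can be checked with some quick arithmetic. The sketch below is only a back-of-the-envelope calculation: the $2 per GPU hour rental price is an assumption that the article does not state, and the constant and function names are made up for illustration. The two GPU-hour figures may also refer to different GPU types, so the ratio is a rough comparison at best.

```python
# GPU-hour figures quoted in the article.
DEEPSEEK_V3_GPU_HOURS = 2.788e6   # H800 GPU hours for full training
LLAMA_3_405B_GPU_HOURS = 30.8e6   # GPU hours quoted for Llama 3 405B

# Assumption (not stated in the article): GPUs rent for $2 per GPU hour.
ASSUMED_USD_PER_GPU_HOUR = 2.0


def training_cost_usd(gpu_hours, usd_per_gpu_hour=ASSUMED_USD_PER_GPU_HOUR):
    """Estimate training cost as GPU hours times an assumed hourly price."""
    return gpu_hours * usd_per_gpu_hour


cost = training_cost_usd(DEEPSEEK_V3_GPU_HOURS)
ratio = LLAMA_3_405B_GPU_HOURS / DEEPSEEK_V3_GPU_HOURS

print(f"Estimated training cost: ${cost / 1e6:.2f} million")
print(f"Llama 3 405B used about {ratio:.1f}x as many GPU hours")
```

Under that assumed price, the estimate comes to roughly $5.6 million, which matches the figure quoted above, and the GPU-hour gap works out to about eleven times.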

DeepSeek-V3 is also inexpensive to use for inference. From February 8th, DeepSeek-V3 input will cost $0.27/million tokens ($0.07/million tokens with caching), and output will cost $1.10/million tokens. This pricing is almost one-tenth of what OpenAI and other leading AI companies currently charge for their flagship frontier models.
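To see what these prices mean in practice, here is a minimal cost estimator. It is a sketch rather than DeepSeek's actual billing logic: the function name, the workload numbers, and the assumption that cached input tokens are simply billed at the lower rate are all hypothetical.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_M = 0.27
CACHED_INPUT_USD_PER_M = 0.07
OUTPUT_USD_PER_M = 1.10


def request_cost_usd(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the cost of a workload in USD.

    cached_input_tokens is the portion of input_tokens assumed to be
    billed at the cached rate (an assumption about how billing works).
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached_input_tokens = input_tokens - cached_input_tokens
    total = (
        uncached_input_tokens * INPUT_USD_PER_M
        + cached_input_tokens * CACHED_INPUT_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total / 1_000_000


# Hypothetical workload: 2M input tokens (half served from cache), 0.5M output tokens.
example_cost = request_cost_usd(2_000_000, 500_000, cached_input_tokens=1_000_000)
print(f"Estimated cost: ${example_cost:.2f}")
```

For this made-up workload the estimate is well under one dollar, with output tokens accounting for the largest share of the bill.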

The DeepSeek team wrote the following on X regarding the DeepSeek-V3 release:

DeepSeek’s mission is unwavering. We’re thrilled to share our progress with the community and see the gap between open and closed models narrowing. This is just the beginning! Look forward to multimodal support and other cutting-edge features in the DeepSeek ecosystem.

You can download the DeepSeek-V3 model from GitHub and Hugging Face. With its impressive performance and affordability, DeepSeek-V3 could democratize access to advanced AI models. This release marks a significant step towards closing the gap between open and closed AI models.