
Meta CEO Mark Zuckerberg believes 2025 is going to be the year of AI. In his latest Facebook post, he said the company plans to invest between $60 billion and $65 billion in 2025, a significant rise from the previous year's budget of approximately $38 billion to $40 billion.
A large portion of this investment will be directed toward building Meta's data centers, which provide the computing power Meta needs to build its AI products. Zuckerberg said the initiative isn't just about expanding infrastructure but also about driving innovation and maintaining technological leadership in the United States. As part of this strategy, Meta plans to have over 1.3 million GPUs by the end of the year.
Meanwhile, Chinese AI lab DeepSeek has released its cost-effective DeepSeek-V3 and DeepSeek-R1 models. DeepSeek's models have outperformed several leading models from OpenAI and Meta and are gaining a lot of attention for their capabilities and cost-effectiveness.
DeepSeek-V3 was trained on just 2,048 GPUs, using about 2.78 million GPU hours at a cost of approximately $6 million, a fraction of what other leading models typically require. In contrast, Meta's Llama models, like Llama 3.1, required upwards of 30.8 million GPU hours and an estimated $60 million for training.
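As a rough sanity check on those figures, the back-of-the-envelope calculation below reproduces them from the reported GPU hours; the $2-per-GPU-hour rental rate is an assumption for illustration, not a number confirmed by either company.

```python
# Back-of-the-envelope check of the reported training costs.
v3_gpu_hours = 2.78e6      # reported DeepSeek-V3 training compute
assumed_rate = 2.00        # ASSUMED $/GPU-hour rental rate, for illustration only

print(f"DeepSeek-V3 estimate: ~${v3_gpu_hours * assumed_rate / 1e6:.1f}M")  # ~$5.6M

llama_gpu_hours = 30.8e6   # reported Llama 3.1 training compute
print(f"Compute ratio: ~{llama_gpu_hours / v3_gpu_hours:.0f}x")             # ~11x
```

At that assumed rate, the reported GPU hours land close to the ~$6 million figure, and the compute gap alone is roughly an order of magnitude.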
Also, the models are open-source, similar to Meta's Llama, which means anyone can run them on their own hardware. The pricing for the DeepSeek-R1 reasoning API is also much lower than that of competitors like OpenAI: DeepSeek charges $0.14 per million input tokens, compared to OpenAI's $7.50.
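To put that gap in perspective, here is a simple illustrative calculation at the quoted rates; the 100-million-token workload is a hypothetical example, not a benchmark from either provider.

```python
# Illustrative API cost comparison at the quoted input-token rates.
deepseek_rate = 0.14   # quoted DeepSeek-R1 input price, $/1M tokens
openai_rate = 7.50     # quoted OpenAI comparison price, $/1M tokens

tokens = 100e6         # hypothetical workload: 100 million input tokens
print(f"DeepSeek-R1: ${tokens / 1e6 * deepseek_rate:,.2f}")  # $14.00
print(f"OpenAI:      ${tokens / 1e6 * openai_rate:,.2f}")    # $750.00
```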
What makes DeepSeek cheap is its architecture. The model employs a Mixture-of-Experts (MoE) framework, which activates only a fraction of its parameters for any given input. DeepSeek claims this approach delivers greater efficiency and lower computational requirements than traditional dense models like Llama, which activate all of their parameters on every pass. Also, unlike OpenAI's o1, which relies on supervised fine-tuning (SFT), DeepSeek uses pure reinforcement learning (RL), through which the model can develop advanced reasoning capabilities autonomously.
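To make the MoE idea concrete, here is a minimal, hypothetical sketch of top-k expert routing in PyTorch. The expert count, layer sizes, and routing scheme are illustrative assumptions, not DeepSeek's actual implementation; the point is only that each token passes through a small subset of the experts, so most parameters stay idle on any given forward pass.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Toy Mixture-of-Experts layer: route each token to its top-k experts."""
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.router = nn.Linear(dim, num_experts)  # gating network scores each expert
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, dim)
        scores = self.router(x)                    # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)          # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Loop over experts for clarity; real implementations batch this dispatch.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(SparseMoE()(tokens).shape)  # torch.Size([16, 64])
```

With 8 experts and top-2 routing, only a quarter of the expert parameters do work for each token, which is the kind of saving DeepSeek attributes to its MoE design, though at a much larger scale.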
It would be fun to see major AI companies compete to create better and more efficient models in 2025.