DeepSeek Introduces Advanced Models, Challenging Proprietary AI Performance

Emily Carter

As AI systems move beyond text generation, large language models continue to evolve. DeepSeek has announced the release of two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, on December 1. The models aim to narrow the performance gap between open-source and proprietary AI systems: DeepSeek-V3.2 is positioned to compete with models such as GPT-5, while DeepSeek-V3.2-Speciale, a high-performance variant, reportedly achieves parity with Google's Gemini 3. The new models have also achieved top results on problems from competitions including the IMO 2025 (International Mathematical Olympiad) and CMO 2025 (China Mathematical Olympiad). This marks DeepSeek's ninth model release this year.

Key Points

DeepSeek's latest advancements are attributed to several methodological innovations:

  • DeepSeek Sparse Attention (DSA) Integration: DeepSeek has officially adopted DeepSeek Sparse Attention, a mechanism previously tested in the V3.2-Exp version. It addresses a core inefficiency of standard attention, whose computational load grows quadratically as sequences get longer. DSA lets the model focus on the most relevant earlier tokens, significantly improving its ability to process long texts while reducing inference costs.

  • Enhanced Post-Training Protocols: DeepSeek has increased its investment in the post-training phase for open-source models. The company designed a new reinforcement learning protocol, allocating over 10% of its total training compute to this stage. This intensive "special tutoring" aims to improve model performance, particularly in complex problem-solving.

  • DeepSeek-V3.2-Speciale for Extended Reasoning: This specialized variant removes training-time penalties on deep, prolonged thinking, encouraging the model to carry out longer reasoning chains (a minimal sketch of this reward change follows this list). The approach has enabled DeepSeek-V3.2-Speciale to compete with models like Gemini 3.

  • Improved Agent Capabilities: DeepSeek has focused on enhancing the model's ability to act as an agent. This involves building virtual environments and synthesizing extensive data for training. DeepSeek-V3.2 utilized 24,667 real code environment tasks, 50,275 real search tasks, 4,417 synthetic general agent scenarios, and 5,908 real code explanation tasks during post-training.

  • Optimized Tool Usage: A significant overhaul was implemented in how the model interacts with external tools. Previous DeepSeek versions would reset their thought process after calling an external tool. The V3.2 update ensures that the model's "thinking process" is continuously preserved during a sequence of tool calls, improving efficiency and coherence.
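
DeepSeek has not published the exact reward formulation behind the Speciale variant, but the effect of dropping a length penalty during reinforcement learning can be illustrated with a minimal sketch. The function name and penalty coefficient below are illustrative assumptions, not DeepSeek's actual training code.

```python
# Minimal sketch of reward shaping during RL post-training.
# The penalty coefficient and the 1/0 correctness reward are illustrative
# assumptions, not DeepSeek's published formulation.

def shaped_reward(is_correct: bool, num_reasoning_tokens: int,
                  length_penalty: float = 1e-5) -> float:
    """Reward = task correctness minus a per-token thinking penalty."""
    correctness = 1.0 if is_correct else 0.0
    return correctness - length_penalty * num_reasoning_tokens

# A conventional setup discourages very long chains of thought ...
standard = shaped_reward(is_correct=True, num_reasoning_tokens=40_000)   # 0.6

# ... while a Speciale-style setup sets the penalty to zero, leaving the
# model free to reason for as long as a hard problem requires.
speciale = shaped_reward(is_correct=True, num_reasoning_tokens=40_000,
                         length_penalty=0.0)                             # 1.0
```

Under an objective of this shape, a standard model is nudged toward shorter answers even when longer deliberation would help, which is exactly the pressure the Speciale variant removes.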

Under the Hood

From a structural standpoint, the integration of DeepSeek Sparse Attention (DSA) is the most notable architectural change. In standard attention, each token computes a relationship with every preceding token, so computational load grows quadratically with sequence length. DSA instead introduces what can be thought of as a "fixed-page directory": a lightweight indexer marks a small set of important earlier tokens as "directory entries," and each subsequent token computes attention only against those entries. The method is analogous to navigating a book through its table of contents rather than rereading every page, which sharply reduces the overhead of long texts. As sequence length grows, V3.2's inference cost with DSA stays relatively flat, in contrast to the steep increase observed in predecessors such as V3.1.
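
The production DSA kernels are considerably more involved, but the core mechanism, attending only to a small set of indexer-selected tokens rather than the full prefix, can be sketched in a few lines. The scoring and top-k selection below are simplified assumptions for illustration, not DeepSeek's released implementation.

```python
import numpy as np

def sparse_attention(q, K, V, index_scores, k=64):
    """Attend only to the top-k past tokens chosen by a lightweight indexer.

    q: (d,) query vector for the current token
    K, V: (n, d) keys and values of all preceding tokens
    index_scores: (n,) cheap relevance scores standing in for DSA's "directory"
    """
    k = min(k, len(index_scores))
    picked = np.argpartition(index_scores, -k)[-k:]   # the "directory entries"
    scores = K[picked] @ q / np.sqrt(q.shape[0])      # attention over k tokens, not n
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V[picked]
```

Because each new token attends to at most k entries, per-token cost stays roughly constant as the context grows, which is the behavior reflected in V3.2's flat long-context inference curve.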

In practice, DeepSeek's emphasis on post-training represents a strategic shift. Historically, open-source models have invested less computational effort in this phase compared to proprietary models. DeepSeek's allocation of over 10% of its total training compute to a new reinforcement learning protocol aims to bridge this gap, allowing the models to "practice difficult problems" more effectively.

For developers, the improvements in agent capabilities and tool usage are particularly relevant. The continuous preservation of the model's thinking process during tool calls means that the model no longer has to re-establish its reasoning from scratch after receiving tool results. This streamlines complex tasks and enhances the overall user experience.
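
A conceptual sketch of the difference is shown below. The message structure and field names here are assumptions made for illustration, not DeepSeek's actual API schema, but they capture what it means to keep, rather than reset, the reasoning trace between tool calls.

```python
# Conceptual sketch only: field names are illustrative, not DeepSeek's API schema.

conversation = [
    {"role": "user", "content": "Find the latest DSA paper and summarize it."},

    # First model turn: partial reasoning plus a tool call.
    {"role": "assistant",
     "reasoning": "I should search for the paper before summarizing...",
     "tool_call": {"name": "web_search",
                   "arguments": {"query": "DeepSeek sparse attention paper"}}},

    # The tool result is appended to the conversation.
    {"role": "tool", "name": "web_search", "content": "Top result: ..."},
]

# Pre-V3.2 behavior described in the article: accumulated reasoning is dropped,
# so the next turn has to re-derive its plan from scratch.
def next_turn_context_reset(history):
    return [{k: v for k, v in msg.items() if k != "reasoning"} for msg in history]

# V3.2 behavior: the reasoning trace stays in context, so the model simply
# continues the same chain of thought after reading the tool result.
def next_turn_context_preserved(history):
    return history
```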

Competitive Landscape

While DeepSeek-V3.2-Speciale demonstrates strong performance, including head-to-head results against Google's Gemini 3 Pro, DeepSeek acknowledges areas for improvement. For instance, DeepSeek models may need more tokens than some proprietary alternatives to answer the same question. In a test on a complex question, based on data obtained by toolmesh.ai, DeepSeek used 8,077 tokens versus Gemini's 4,972 for the same query.

However, DeepSeek's pricing model offsets this. Despite the higher token consumption, the cost per query can be significantly lower: in the test above, DeepSeek's 8,000+ tokens cost approximately $0.0032, whereas Google's fewer than 5,000 tokens cost around $0.06, a difference of more than an order of magnitude.
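
The figures above can be sanity-checked with a few lines of arithmetic. The per-million-token rates below are implied by the article's numbers rather than taken from official price lists.

```python
# Back-of-the-envelope check of the quoted costs.
# Implied rates are derived from the article's figures, not official rate cards.

deepseek_tokens, deepseek_cost = 8_077, 0.0032
gemini_tokens, gemini_cost = 4_972, 0.06

def implied_price_per_million(tokens, cost):
    return cost / tokens * 1_000_000

print(f"DeepSeek: ~${implied_price_per_million(deepseek_tokens, deepseek_cost):.2f} per 1M tokens")
print(f"Gemini:   ~${implied_price_per_million(gemini_tokens, gemini_cost):.2f} per 1M tokens")
print(f"Per-query cost ratio: ~{gemini_cost / deepseek_cost:.0f}x in DeepSeek's favor")
```

The implied rates, roughly $0.40 per million tokens for DeepSeek versus about $12 for Gemini in this test, explain how a query that consumes about 60% more tokens can still come out nearly 19 times cheaper.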

Outlook

Looking ahead, DeepSeek's strategy aligns with a broader industry debate about the future of AI development. The company's focus on algorithmic innovation, from the Mixture-of-Experts (MoE) design and Multi-Head Latent Attention (MLA) introduced with V2, through DeepSeekMath-V2's self-verification, to V3.2's DeepSeek Sparse Attention (DSA), signals a commitment to building more intelligent systems under data and compute constraints. This contrasts with the strategy of simply scaling up parameters, a sentiment echoed by figures like Ilya Sutskever, who has suggested that merely increasing parameter counts may not be a sustainable path for AI progress. DeepSeek's continued efforts aim to narrow the gap between open-source and closed-source models through efficient, innovative algorithmic solutions.