DeepSeek Releases V3.2 Models with Enhanced Agent Capabilities and Integrated Reasoning

Dr. Aurora Chen

DeepSeek has introduced two new official models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, following the experimental release of DeepSeek-V3.2-Exp two months earlier. The experimental version demonstrated the efficacy of the DSA sparse attention mechanism, with user feedback indicating performance on par with DeepSeek-V3.1-Terminus. The official web interface, application, and API have been updated to DeepSeek-V3.2. The Speciale version is currently available as a temporary API service for community evaluation and research. A technical report detailing the new models has been released.

Key Points

DeepSeek-V3.2 aims to balance reasoning capabilities with output length, making it suitable for daily applications such as Q&A and general agent tasks. Public reasoning benchmarks indicate that DeepSeek-V3.2 performs at a level comparable to GPT-5, slightly below Gemini-3.0-Pro. Compared to Kimi-K2-Thinking, V3.2 significantly reduces output length, which lowers computational overhead and user waiting times.

DeepSeek-V3.2-Speciale is designed to push the boundaries of open-source model reasoning. This enhanced version of DeepSeek-V3.2 incorporates the theorem-proving capabilities of DeepSeek-Math-V2. It exhibits strong instruction following, rigorous mathematical proof, and logical verification, performing comparably to Gemini-3.0-Pro on mainstream reasoning benchmarks. Notably, when evaluated on problems from 2025 competitions, including the International Mathematical Olympiad (IMO), China Mathematical Olympiad (CMO), ICPC World Finals, and International Olympiad in Informatics (IOI), V3.2-Speciale achieved gold-medal-level results. On the ICPC and IOI, its scores would have ranked second and tenth among human contestants, respectively.

For highly complex tasks, the Speciale model offers significant performance advantages over the standard version, though it consumes more tokens and incurs higher costs. DeepSeek-V3.2-Speciale is currently intended for research use only, does not support tool calls, and has not been optimized for daily dialogue or writing tasks.

Under the Hood

DeepSeek-V3.2 is DeepSeek's first model to integrate thinking with tool usage, supporting tool calls in both thinking and non-thinking modes. This capability was developed using a large-scale agent training data synthesis method, which constructed over 1,800 environments and more than 85,000 complex instructions for "hard to answer, easy to verify" reinforcement learning tasks. This approach significantly improved the model's generalization ability.
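The report does not publish the training pipeline, but the "hard to answer, easy to verify" idea can be illustrated with a minimal sketch: each task pairs an open-ended instruction with a cheap programmatic checker, so a reinforcement learning loop can score rollouts automatically without human grading. All names below (`VerifiableTask`, the example task) are hypothetical.

```python
# Illustrative sketch, NOT DeepSeek's actual pipeline: a "hard to answer,
# easy to verify" task pairs an instruction that is difficult to solve
# with a verifier that is trivial to run, yielding an automatic RL reward.
from dataclasses import dataclass
from typing import Callable

@dataclass
class VerifiableTask:
    instruction: str               # hard for the model to answer
    verify: Callable[[str], bool]  # cheap for the trainer to check

    def reward(self, rollout: str) -> float:
        """Binary outcome reward from the verifier."""
        return 1.0 if self.verify(rollout) else 0.0

def _check_root(ans: str) -> bool:
    # Verifying a candidate root is one multiplication; finding it is not.
    if not ans.strip().lstrip("-").isdigit():
        return False
    x = int(ans.strip())
    return x > 0 and x * x - x - 56 == 0

task = VerifiableTask(
    instruction="Find a positive integer x with x**2 - x - 56 == 0.",
    verify=_check_root,
)

print(task.reward("8"))   # correct root -> 1.0
print(task.reward("7"))   # incorrect  -> 0.0
```

The asymmetry is the point: generating the answer requires reasoning, but the reward signal costs almost nothing to compute, which is what makes scaling to 85,000+ instructions feasible.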

Evaluations indicate that the DeepSeek-V3.2 model achieved the highest performance among current open-source models in agent benchmarks, narrowing the gap with closed-source alternatives. The model was not specifically trained for the tools used in these test sets, suggesting strong generalization in real-world application scenarios.

What Comes Next

DeepSeek-V3.2 is now the official service model, with the website, application, and API upgraded from DeepSeek-V3.2-Exp. For community evaluation and research, a temporary API service for DeepSeek-V3.2-Speciale has been deployed. This API supports dialogue in thinking mode, with a maximum output length of 128K tokens, and is available until December 15, 2025.

The API update for DeepSeek-V3.2 supports tool calling in thinking mode, enabling the model to produce more detailed and accurate answers through multiple rounds of thinking and tool calls. Within a round, developers pass the chain-of-thought content (reasoning_content) back to the API so the model can continue thinking. When the user asks a new question, the previous chain of thought must be cleared, while the rest of the conversation (answers and tool results) is retained.
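The history-management rule above can be sketched as a small helper. The field names follow DeepSeek's OpenAI-compatible chat format, where assistant messages carry a `reasoning_content` field; treat the exact API contract as an assumption and confirm it against the official documentation.

```python
# Sketch of the rule: keep reasoning_content while a tool-calling round is
# in progress, but clear it from all prior messages once a new user
# question starts. Field names assume DeepSeek's OpenAI-compatible format.
def start_new_question(history: list[dict], question: str) -> list[dict]:
    """Return a message list for a fresh user turn.

    Prior chains of thought are dropped; everything else (user turns,
    assistant answers, tool results) is retained.
    """
    cleaned = []
    for msg in history:
        msg = dict(msg)                     # don't mutate the caller's list
        msg.pop("reasoning_content", None)  # clear previous chain of thought
        cleaned.append(msg)
    return cleaned + [{"role": "user", "content": question}]

history = [
    {"role": "user", "content": "What's 17 * 24?"},
    {"role": "assistant",
     "reasoning_content": "17 * 24 = 17 * 20 + 17 * 4 ...",
     "content": "408"},
]
messages = start_new_question(history, "Now divide that by 8.")
print(messages[1])   # reasoning_content removed, the answer "408" retained
```

Keeping the final answers while dropping the reasoning keeps request payloads small across long conversations, since chains of thought can be far longer than the answers they produce.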

DeepSeek-V3.2's thinking mode also works with Claude Code. Users can activate it by setting the model name to deepseek-reasoner or by pressing the Tab key in the Claude Code CLI. However, thinking mode is not yet fully adapted to clients such as Cline and RooCode, which use non-standard tool calls; for those, non-thinking mode is recommended.
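One plausible way to wire this up, assuming DeepSeek exposes an Anthropic-compatible endpoint as its documentation has described for earlier releases; the exact URL and variable names should be verified against the official docs before use.

```shell
# Point Claude Code at DeepSeek's Anthropic-compatible endpoint
# (endpoint URL is an assumption; check DeepSeek's current docs).
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="${DEEPSEEK_API_KEY}"

# Select the thinking-mode model; pressing Tab inside the Claude Code
# CLI toggles the same mode interactively.
export ANTHROPIC_MODEL="deepseek-reasoner"

claude
```

Because this is plain environment-variable configuration, switching back to non-thinking mode for clients like Cline or RooCode is just a matter of changing the model name.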