OpenAI Launches GPT-5.2, Integrating into Microsoft Products and Targeting Professional Tasks
OpenAI has released GPT-5.2, its latest large language model, marking the company's tenth anniversary. The new model, available in Instant, Thinking, and Pro versions, is being rolled out to paid ChatGPT users and integrated into OpenAI's API and Codex for developers. Free and Go users are expected to gain access later. The previous GPT-5.1 will remain available for paid users for three months before its retirement.
OpenAI described GPT-5.2 as part of a series of ongoing model improvements and said future iterations will address issues such as over-refusal and response latency. In the API, the Thinking version is exposed as gpt-5.2, Instant as gpt-5.2-chat-latest, and Pro as gpt-5.2-pro.
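For developers, the mapping between ChatGPT versions and API identifiers above can be captured in a small lookup. The following is a minimal sketch, not official client code: the names `MODEL_IDS`, `model_id_for`, and `build_request` are hypothetical, and the `model`/`input` payload shape is an assumption modeled on OpenAI's Responses API. Only the three model identifiers come from the announcement.

```python
# Model identifiers as reported in the announcement.
MODEL_IDS = {
    "instant": "gpt-5.2-chat-latest",
    "thinking": "gpt-5.2",
    "pro": "gpt-5.2-pro",
}


def model_id_for(tier: str) -> str:
    """Return the API model identifier for a version name (case-insensitive)."""
    key = tier.strip().lower()
    if key not in MODEL_IDS:
        raise ValueError(f"unknown tier: {tier!r}")
    return MODEL_IDS[key]


def build_request(tier: str, prompt: str) -> dict:
    """Build a request payload.

    The field names ("model", "input") are an assumption and are not
    confirmed by the article.
    """
    return {"model": model_id_for(tier), "input": prompt}
```

Keeping the identifiers in one table means that a later rename, for example when gpt-5.2-chat-latest is repointed, only has to be handled in one place.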
GPT-5.2 costs more than its predecessor: $1.75 per million input tokens and $14 per million output tokens. GPT-5.2 Pro is priced at $21 and $168 per million input and output tokens, respectively. The release also introduces a fifth reasoning-effort level, xhigh.
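The per-million-token prices above translate into per-request costs with simple arithmetic. The sketch below is illustrative only: `PRICES_PER_MILLION` and `estimate_cost` are hypothetical names, the prices are the ones quoted in this article, and the calculation ignores any discounts or surcharges the article does not mention.

```python
from decimal import Decimal

# USD per one million tokens, as quoted in the article: (input, output).
PRICES_PER_MILLION = {
    "gpt-5.2": (Decimal("1.75"), Decimal("14")),
    "gpt-5.2-pro": (Decimal("21"), Decimal("168")),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_MILLION
```

For a request with 10,000 input tokens and 2,000 output tokens, this gives $0.0455 on gpt-5.2 and $0.546 on gpt-5.2-pro. At the quoted prices, Pro costs 12 times as much as the standard model for both input and output.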
Enhanced Performance Across Benchmarks
Sam Altman, OpenAI's co-founder and CEO, shared performance metrics for GPT-5.2 on the social platform X. The model achieved 55.6% on SWE-Bench Pro, 52.9% on ARC-AGI-2, and 40.3% on Frontier Math. These benchmarks assess the model's capabilities in complex code repair, general reasoning, and advanced mathematical tasks.
According to OpenAI's official blog, GPT-5.2 outperforms industry professionals on well-specified knowledge work tasks spanning 44 occupations. GPT-5.2 Thinking showed significant improvements over GPT-5.1 Thinking in knowledge-based tasks, programming, scientific problems, mathematics, and abstract reasoning. It achieved a perfect score on the AIME 2025 math competition and matched or outperformed human experts on 70.9% of tasks in GDPval, OpenAI's benchmark for professional work.
Yann Dubois, an OpenAI team member, noted on X that GPT-5.2 Thinking is designed for "high economic value tasks," including coding, spreadsheets, and presentation documents. In eight benchmark tests, including SWE-Bench Pro and GPQA Diamond, GPT-5.2 Thinking's scores exceeded those of Google Gemini 3 Pro and Anthropic Claude Opus 4.5. GPT-5.2 also demonstrated improved multimodal task handling, suggesting potential to rival Gemini in this area.
Professional Task Capabilities and Coding
OpenAI reported that GPT-5.2 Thinking achieved "expert level" performance in the GDPval evaluation, matching or exceeding industry professionals on 70.9% of tasks drawn from 44 occupations. GPT-5.2 Pro raised this to 74.1%. When only "clear wins" were counted, GPT-5.2 Thinking scored 49.8% and Pro reached 60%. The evaluated tasks produced business deliverables such as sales presentations, budget models, and operational schedules. GPT-5.2 completed these tasks approximately 11 times faster than human experts, at less than 1% of the cost.
For investment research, GPT-5.2 Thinking achieved an average score of 68.4% in internal evaluations for scenarios like investment banking three-statement models and leveraged buyout models, up from GPT-5.1 Thinking's 59.1%. GPT-5.2 Pro scored 71.7%.
In coding, GPT-5.2 Thinking achieved 55.6% on SWE-Bench Pro and 80% on SWE-Bench Verified, up from GPT-5.1's 50.8% and 76.3%, respectively. On the SWE-Lancer IC Diamond task, GPT-5.2 Thinking reached 74.6%, compared to GPT-5.1's 69.7%.
GPT-5.2 also appeared on the leaderboard of the AI benchmark platform lmarena.ai (Arena), scoring 1486 points in the WebDev test and placing second. A second GPT-5.2 entry ranked sixth with 1399 points. Arena's evaluation focuses on end-to-end coding capabilities in deployable web application scenarios.
GPT-5.2 Thinking achieved a 93.9% error-free response rate on ChatGPT queries with search enabled, up from GPT-5.1's 91.2%. Without search, the rate improved from 87.3% to 88%. The reliability of tool calling and long-chain tasks also improved, with GPT-5.2 Thinking scoring 98.7% on Tau-2 Bench Telecom. In the noisier Retail scenario, accuracy increased from 77.9% to 82%. On the general toolchain evaluation BrowseComp, GPT-5.2 Thinking reached 65.8% and the Pro version reached 77.9%, both higher than GPT-5.1's 50.8%. Both GPT-5.2 Thinking and Pro support the xhigh reasoning effort level for complex professional tasks.
Long Context and Visual Understanding
GPT-5.2 Thinking demonstrated improved long-context capabilities on OpenAI MRCRv2. In the 8-needle test, it outperformed GPT-5.1 across context lengths from 4k to 256k tokens, reaching 98.2% at 4k–8k and 77.0% at 128k–256k, while GPT-5.1 ranged from 29.6% to 47.8%. In other long-text scenarios, GPT-5.2 Thinking achieved 92.0% and 89.8% on BrowseComp Long Context at 128k and 256k, respectively. In the GraphWalks task, it scored 94.0% and 89.0% on the bfs and parents subsets, compared to GPT-5.1's 76.8% and 71.5%.
In visual understanding, GPT-5.2 Thinking achieved 82.1% in the CharXiv scientific chart reasoning task without tools, improving to 88.7% with Python tools. In ScreenSpot-Pro interface understanding, it reached 86.3%, significantly higher than GPT-5.1's 64.2%. In video-based and multimodal Video MMMU, it improved from 82.9% to 85.9%. These advancements enhance its reliability in processing professional visual inputs like scientific charts and operational dashboards.
Microsoft Integration and Future Plans
Microsoft Chairman and CEO Satya Nadella announced on X that GPT-5.2 will be fully integrated into Microsoft 365 Copilot, GitHub Copilot, and Foundry, serving as the new "default inference model." In Microsoft 365 Copilot, users can now select GPT-5.2 for complex tasks such as analyzing meeting minutes, reasoning over documents, and strategic planning. Nadella stated that combining the model with users' work data allows GPT-5.2 to apply its reasoning advantages more effectively.
For GitHub Copilot, GPT-5.2 is designed for long-context reasoning and complex codebase review, focusing on engineering use cases like cross-file relationship analysis and refactoring suggestions. GPT-5.2 has also been integrated into Microsoft Foundry and Copilot Studio, enabling developers to use the model when building automated processes or enterprise internal agents. The consumer-facing Copilot will also receive a phased update.
The AI programming assistant Cursor has also launched GPT-5.2, adopting OpenAI's official API pricing.
At the GPT-5.2 launch event, Fidji Simo, head of OpenAI's application business, confirmed that a ChatGPT "adult mode" is expected in the first quarter of 2026. Simo stated that OpenAI aims to ensure the age prediction model is mature enough to accurately identify underage users while avoiding misidentifying adults. This age prediction model is currently undergoing early testing in some countries to apply content restrictions and safety policies automatically.
In addition to its model release, OpenAI announced a licensing agreement with Disney that allows Sora 2 users to incorporate Disney characters into generated videos. Disney will invest $1 billion in OpenAI and has an option to increase its stake.
