OpenAI GPT-5.2 Released: Enhanced Reasoning for Professional Work

OpenAI has released GPT-5.2, its first major model release since Google launched Gemini 3 Pro. The launch coincides with OpenAI's tenth anniversary and follows an internal "code red" declared by CEO Sam Altman. OpenAI describes GPT-5.2 as "the most capable model series yet for professional knowledge work."

Performance Benchmarks and New Evaluation Sets

GPT-5.2 also improves on traditional benchmarks. On software engineering (SWE-Bench Pro), graduate-level science questions (GPQA Diamond), and competition math (AIME 2025), it has retaken the lead. It also shows gains in front-end design aesthetics, 3D elements, and visual understanding. For instance, when asked to identify the components of a low-quality image and return approximate bounding boxes, GPT-5.2 correctly identifies the main regions and places boxes that roughly match their true positions, whereas GPT-5.1 labeled only a few components and placed them inaccurately.
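The article does not say how box placement was scored. One common way to quantify whether a predicted bounding box "roughly matches" a true position is intersection over union (IoU): the overlap area divided by the combined area. The sketch below is illustrative only; the `Box` type, the helper names, and the 0.5 threshold are assumptions, not part of OpenAI's evaluation.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box: top-left corner (x1, y1), bottom-right corner (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def area(b: Box) -> float:
    # Clamp negative extents to zero so non-overlapping regions have no area.
    return max(0.0, b.x2 - b.x1) * max(0.0, b.y2 - b.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, a value in [0, 1]."""
    # The intersection is bounded by the innermost edges of the two boxes.
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2))
    inter_area = area(inter)
    union = area(a) + area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


def roughly_matches(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    # Hypothetical cut-off: 0.5 is a conventional object-detection threshold.
    return iou(pred, truth) >= threshold
```

For example, two 10×10 boxes offset by half their width overlap by 50 units out of a 150-unit union, giving an IoU of one third, which would fall below the assumed 0.5 threshold.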

Beyond these traditional metrics, two newer evaluation sets show especially large gains: ARC-AGI-2 and GDPval.

Fluid Intelligence with ARC-AGI-2

ARC-AGI-2, the second version of the Abstraction and Reasoning Corpus (ARC) benchmark, measures a model's ability to infer rules from a few examples and generalize them to new, unseen problems rather than rely on previously learned knowledge. This capability, often called fluid intelligence, covers logical reasoning and pattern recognition in novel situations. Top AI models have historically scored low on this test. GPT-5.1 scored 17.6% on ARC-AGI-2, while GPT-5.2 scored 52.9%, roughly three times its predecessor's result and the top score on the leaderboard.
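To make "inferring a rule" concrete, here is a toy sketch in the spirit of ARC tasks. It assumes a tiny, hypothetical menu of grid transformations: the solver checks which candidate maps every demonstration input to its output, then applies that rule to a new input. Real ARC-AGI-2 tasks are far richer and cannot be solved by searching a fixed list like this; all names here are illustrative.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = Tuple[Tuple[int, ...], ...]  # rows of integer color codes


def flip_h(g: Grid) -> Grid:
    # Mirror each row left-to-right.
    return tuple(tuple(reversed(row)) for row in g)


def flip_v(g: Grid) -> Grid:
    # Reverse the order of rows (top-to-bottom mirror).
    return tuple(reversed(g))


def transpose(g: Grid) -> Grid:
    return tuple(zip(*g))


def rotate_cw(g: Grid) -> Grid:
    # A 90-degree clockwise rotation is a vertical flip followed by a transpose.
    return transpose(flip_v(g))


# Hypothetical hypothesis space; a real solver would need a far larger one.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: g,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
    "rotate_cw": rotate_cw,
}


def infer_rule(train: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every demonstration pair."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in train):
            return name
    return None


def solve(train: List[Tuple[Grid, Grid]], test_input: Grid) -> Optional[Grid]:
    # Generalize: apply the inferred rule to an input never seen in training.
    rule = infer_rule(train)
    return CANDIDATES[rule](test_input) if rule is not None else None
```

Given the pairs `((1, 2), (3, 4)) → ((3, 1), (4, 2))` and `((5, 0), (0, 0)) → ((0, 5), (0, 0))`, only `rotate_cw` fits both, so `solve` maps the unseen grid `((7, 8), (9, 0))` to `((9, 7), (0, 8))`.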

Real-World Economic Value with GDPval

OpenAI's newly introduced GDPval evaluation set measures AI performance on "real-world, economically valuable tasks." This framework moves beyond traditional benchmarks like coding or knowledge recall to assess AI's utility across a broad spectrum of professional knowledge work.

OpenAI selected 44 occupations from the nine U.S. industries that contribute most to GDP. Experienced professionals, averaging 14 years in their fields, created 1,320 tasks based on real work products. These tasks can involve multimodal inputs such as PDFs, Excel spreadsheets, images, and presentations, and take human experts about seven hours on average, with some spanning weeks.

Independent experts then blind-review the model and human outputs, choosing the submission they would rather deliver to a client. On GDPval, GPT-5.2 Thinking achieved a win-or-tie rate of 70.9% against industry experts, and GPT-5.2 Pro reached 74.1%. By comparison, GPT-5 scored 38.8%.
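Assuming each blind comparison is recorded as a win, tie, or loss for the model, the reported metric is simply the share of wins plus ties. This minimal sketch uses hypothetical names and does not reproduce GDPval's actual grading pipeline, which may aggregate multiple graders differently.

```python
from collections import Counter
from typing import Iterable, Literal

# Outcome of one blind comparison: model deliverable vs. expert deliverable.
Verdict = Literal["win", "tie", "loss"]


def win_or_tie_rate(verdicts: Iterable[Verdict]) -> float:
    """Fraction of comparisons where graders preferred the model or rated it equal."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no graded comparisons")
    # Counter returns 0 for missing keys, so absent outcomes are handled.
    return (counts["win"] + counts["tie"]) / total
```

Under this reading, a 70.9% rate means the model's deliverable was preferred or judged equal in about 709 of every 1,000 comparisons.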

Context Understanding and Availability

GPT-5.2 also offers improved long-context understanding. In a "needle in a haystack" test, in which four pieces of information are embedded in a 256K-token document, GPT-5.2 retrieved them with 100% accuracy. With eight embedded items, its accuracy also improved significantly over GPT-5.1. The model also has a more recent knowledge cutoff.
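A generic multi-needle test can be sketched as follows: scatter a few distinctive "needle" sentences among filler text, ask the model to recall them, and count how many come back. This is an assumed, simplified setup rather than OpenAI's exact evaluation; the helper names are hypothetical, the model call is left out, and scaling the filler to 256K tokens would require a tokenizer.

```python
import random
from typing import List


def build_haystack(filler: List[str], needles: List[str], seed: int = 0) -> str:
    """Insert each needle sentence at a random position among the filler sentences."""
    rng = random.Random(seed)  # fixed seed makes the haystack reproducible
    sentences = list(filler)
    for needle in needles:
        # randint is inclusive, so a needle may also land at the very end.
        sentences.insert(rng.randint(0, len(sentences)), needle)
    return " ".join(sentences)


def retrieval_accuracy(answer: str, needles: List[str]) -> float:
    """Share of needles that appear verbatim in the model's answer."""
    if not needles:
        raise ValueError("no needles to score")
    found = sum(1 for n in needles if n in answer)
    return found / len(needles)
```

In use, the haystack plus a question such as "List every secret number" would be sent to the model, and its reply passed to `retrieval_accuracy`; recalling three of four needles scores 0.75.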

GPT-5.2 is available now to paid ChatGPT subscribers and will roll out to free users tomorrow, replacing GPT-5.1. Paid subscribers will keep access to GPT-5.1 for three more months. Developers can access GPT-5.2 through the API at a slightly higher price than GPT-5.1.