AI Models Achieve Near-Perfect Scores on All Three CFA Exam Levels
Advanced AI reasoning models have demonstrated the ability to pass all three levels of the Chartered Financial Analyst (CFA) examination, with some achieving near-perfect scores. This development follows years of AI models struggling with the exam's more complex sections, particularly essay questions.
Historically, obtaining CFA certification has required extensive study, often exceeding 1,000 hours for human candidates. Recent evaluations, however, show AI models completing the exam in minutes.
AI Conquers Financial Industry's "Hardest Exam"
The CFA certification is widely regarded as one of the most challenging qualifications in finance, requiring sequential passage of three levels that cover foundational knowledge, applied analysis, and complex portfolio construction.
In 2023, AI models could answer only some CFA questions, and inconsistently, struggling in particular with Level III's essay questions. By July of this year, AI models were able to pass the exam. Researchers from NYU Stern School of Business and the AI wealth management platform GoodFin set out to test whether AI possessed the analytical reasoning skills required for professional financial decisions.
Their study evaluated 23 large language models on multiple-choice and essay questions from CFA Level III mock exams, which focus on portfolio management and wealth planning. Models such as o4-mini, Gemini 2.5 Pro, and Claude Opus passed when given "chain-of-thought" prompts that instruct the model to reason step by step before answering.
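The study's exact prompts are not reproduced in this article, but the technique is simple to sketch. Below is a minimal, hypothetical illustration in Python of chain-of-thought prompting on a CFA-style question, assuming the OpenAI SDK; the model name o4-mini comes from the study, while the question and system instruction are invented for illustration:

```python
# Minimal chain-of-thought prompting sketch (illustrative, not the
# study's actual prompt). Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

QUESTION = (
    "A portfolio has an expected return of 8% and a standard deviation "
    "of 12%. The risk-free rate is 3%. What is the Sharpe ratio?\n"
    "A) 0.25   B) 0.42   C) 0.67"
)

response = client.chat.completions.create(
    model="o4-mini",  # one of the models named in the study
    messages=[
        # The explicit "reason step by step" instruction is the core of
        # chain-of-thought prompting.
        {"role": "system",
         "content": "You are a CFA candidate. Reason step by step, "
                    "then state your final answer as a single letter."},
        {"role": "user", "content": QUESTION},
    ],
)
print(response.choices[0].message.content)  # expected final answer: B (0.42)
```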
Anna Joo Fee, founder and CEO of GoodFin, stated that this technology is expected to reshape the financial industry. Further research, published on the 9th of this month, confirmed that current-generation reasoning models not only passed the Level III exam but also approached perfect scores in some subjects.
AI's New Scores Near Perfection
A research team from Columbia University, Rensselaer Polytechnic Institute, and the University of North Carolina tested six reasoning models on a question bank of 980 CFA questions spanning all three levels.
The test sets, cross-checked in the short sketch after this list, included:
Level I: Three papers, 540 multiple-choice questions (180 per paper).
Level II: Two papers, 176 multiple-choice questions (88 per paper), structured into 22 item sets each containing 4 questions.
Level III: Three papers, 264 questions (88 per paper); each paper mixed 11 item sets (44 multiple-choice questions) with 11 constructed-response case studies (44 essay questions).
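Since the per-paper counts above are easy to misread, here is the reported composition expressed as a short Python sketch with a consistency check; the dictionary keys are ours, not the researchers':

```python
# Reported composition of the 980-question bank (per-paper counts).
EXAM_BANK = {
    "Level I":   {"papers": 3, "mcq_per_paper": 180, "essay_per_paper": 0},
    "Level II":  {"papers": 2, "mcq_per_paper": 88,  "essay_per_paper": 0},
    "Level III": {"papers": 3, "mcq_per_paper": 44,  "essay_per_paper": 44},
}

total = sum(
    level["papers"] * (level["mcq_per_paper"] + level["essay_per_paper"])
    for level in EXAM_BANK.values()
)
assert total == 980  # 540 + 176 + 264
```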
The mock exams followed a standard and representative structure, covering topics from ethical conduct and equity investment practices to foreign currency statement translation and asset allocation theory.
All six tested models—Gemini 3.0 Pro, Gemini 2.5 Pro, GPT-5, Grok 4, Claude Opus 4.1, and DeepSeek-V3.1—passed all levels according to established standards, with some scores nearing perfection.
Gemini and GPT-5 Lead Performance
In the Level I exam, Gemini 3.0 Pro achieved an accuracy rate of 97.6%, followed by GPT-5 at 96.1% and Gemini 2.5 Pro at 95.7%. DeepSeek-V3.1, the lowest performer in this section, still achieved 90.9%.
For the Level II exam, which emphasizes application and analysis, GPT-5 led with 94.3% accuracy. Gemini 3.0 Pro and Gemini 2.5 Pro followed with 93.2% and 92.6%, respectively. Researchers noted that these models performed "almost perfectly" at this stage. However, the "Ethics" section remained a weakness for AI, with even the strongest models showing a 17% to 21% error rate in ethics-related Level II questions.
In the complex Level III exam, Gemini 2.5 Pro achieved an 86.4% accuracy rate on the multiple-choice section. On the essay section, which requires generative ability, Gemini 3.0 Pro earned 92.0% of the available points, a significant improvement over the previous generation's 82.8%.
The research team used the o4-mini model to grade the open-ended answers automatically, acknowledging potential measurement error and a "verbosity bias" in which longer answers may be scored more favorably. The essay results should therefore be read as model-based estimates.
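The grading prompt and rubric the researchers used are not public here, so the following Python sketch shows only the general LLM-as-judge shape described above; the function name, rubric format, and 0-4 scale are assumptions:

```python
# Hypothetical shape of the automated essay-grading step. The real
# rubric, prompt, and scale used by the researchers may differ.
from openai import OpenAI

client = OpenAI()

def grade_essay(question: str, rubric: str, answer: str) -> str:
    """Ask the judge model for a 0-4 score with a one-line justification."""
    prompt = (
        f"Question:\n{question}\n\n"
        f"Grading rubric:\n{rubric}\n\n"
        f"Candidate answer:\n{answer}\n\n"
        "Score the answer from 0 to 4 per the rubric. "
        "Reply as 'SCORE: <n> - <one-line reason>'."
    )
    response = client.chat.completions.create(
        model="o4-mini",  # the judge model reported by the team
        messages=[{"role": "user", "content": prompt}],
    )
    # Caveat noted by the researchers: judges like this can reward longer
    # answers (verbosity bias), so scores are estimates, not ground truth.
    return response.choices[0].message.content
```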
Passing standards were set at no less than 60% in every Level I topic and 70% overall; no less than 50% in every Level II topic and 60% overall; and, for Level III, an average score of at least 63% across the multiple-choice and essay sections.
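Expressed as code, those pass rules look like the sketch below. Two assumptions to flag: the overall score is computed here as an unweighted mean of topic scores, and the Level III rule averages the two section scores; the description above does not specify the exact weighting.

```python
# Pass rules as described, under the stated assumptions.
def passes_level_1(topic_scores: dict[str, float]) -> bool:
    overall = sum(topic_scores.values()) / len(topic_scores)  # assumed unweighted
    return min(topic_scores.values()) >= 0.60 and overall >= 0.70

def passes_level_2(topic_scores: dict[str, float]) -> bool:
    overall = sum(topic_scores.values()) / len(topic_scores)  # assumed unweighted
    return min(topic_scores.values()) >= 0.50 and overall >= 0.60

def passes_level_3(mcq_score: float, essay_score: float) -> bool:
    # Assumed reading: the average across the two sections must reach 63%.
    return (mcq_score + essay_score) / 2 >= 0.63
```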
Researchers concluded that the professional capabilities of these reasoning models now surpass the requirements for junior to mid-level financial analysts and could potentially reach senior analyst levels. While previous large language models mastered "codified knowledge" in Level I and II, the latest generation has acquired the "synthesis skills" necessary for Level III.
Despite these advancements, limitations persist. Benchmarking, particularly in multiple-choice formats, offers a partial view of model capabilities. However, the rapid progress from "failing" to "near-perfect" in two years highlights AI's fast evolution in professional fields.
Implications of AI Passing the CFA
The ability of machines to pass professional certifications raises questions about the future roles of human analysts. Matthias Bastian, a media industry entrepreneur, noted that passing the exam does not equate to handling the daily tasks of a financial analyst, which include client interaction, assessing market sentiment, and making decisions with incomplete information.
The research also highlighted that models continue to struggle with ethics questions, which often require deep contextual understanding and value judgments. And exams test discrete knowledge points, not the flexible application of that knowledge in complex real-world scenarios.
Concerns about "data contamination" were also raised. Although the tests used copyrighted materials, exam questions or close variants may still have leaked into the models' training data, in which case high scores would reflect memorization rather than genuine logical reasoning.
Dr. Ingrid Tierens, a CFA charterholder and head of the Data Strategy team at Goldman Sachs Global Investment Research, said that AI passing the CFA exam is an expected outcome given its performance on other demanding tests, such as mathematics olympiads. She believes the CFA exam's clearly defined body of knowledge, abundant homogeneous training material, and standardized format play to AI's strengths.
The progression of AI mirrors historical technological advancements in finance, from calculators to computers and programming languages. Benjamin Graham, a key figure in the CFA certification's development, expressed optimism about the future of financial analysis in a 1963 article, anticipating technological integration.
The consensus is that AI is an unstoppable force, and the focus should be on leveraging its capabilities effectively within safety guardrails. This could free human analysts for strategic thinking, complex problem-solving, and deeper client engagement.
While AI may not fully replace investment experts in the short term, human professionals will need to demonstrate adaptability, critical thinking, and innovation beyond rote memorization. Exceptional investment performance often stems from identifying "outliers" and hidden information, which extends beyond exam content.
Benjamin Graham's 1963 remarks remain relevant: "No matter how things change, one thing I firmly believe: the future path of financial analysis, like the past, will have more than one road to success."
