
OpenAI’s latest model has achieved a gold-medal-level score on the 2025 International Math Olympiad. It solved five of the six problems under exam conditions, earning 35 out of a possible 42 points; each problem is worth seven.
The International Math Olympiad is widely regarded as the most prestigious and challenging mathematics competition for high school students in the world. Only about 10% of this year’s competitors received gold medals, and many past medalists have gone on to win Fields Medals. Each competitor gets two 4.5-hour sessions to work through the six problems, without access to the internet or any other tools.
AI models’ mixed success at solving math problems
Artificial intelligence models are not traditionally known for excelling at complex mathematical problems, as they can struggle with sustained logical reasoning. Yet Gemini 2.5 Pro and OpenAI’s o3 recently scored 86.7% and 88.9%, respectively, on the American Invitational Mathematics Examination (AIME), a key math benchmark for AI models, and Grok 4 reportedly achieved a perfect 100% on it. For comparison, as recently as September 2024, o1 scored 83% on that same exam, which is only a qualifying step on the road to the International Olympiad.
“IMO problems demand a new level of sustained creative thinking compared to past benchmarks,” OpenAI researcher Alexander Wei wrote on X when announcing the unreleased model’s milestone. His colleague Noam Brown noted that just last year, AI labs were still using grade-school math, the GSM8K test, as a benchmark.
OpenAI CEO Sam Altman said the experimental model was “an LLM doing math and not a specific formal math system” like AlphaGeometry, framing the result as a sign that the company is well on its way to achieving general intelligence.
Manon Bischoff, an editor at the German-language edition of Scientific American, predicted in January 2024 that it would be “a few years” before AI models could conceivably compete in the International Math Olympiad; progress, however, has come faster than expected. At the time, Bischoff was covering the release of Google DeepMind’s math-specific model AlphaGeometry, which could solve 54% of the geometry problems posed in the competition over the previous 25 years. By February, a second-generation version could solve 84% of them.
Questions arise about OpenAI’s gold medal at IMO
Not everyone is convinced by OpenAI’s claimed leap in mathematical capability.
According to Google DeepMind researcher Thang Luong and AI safety advocate Mikhail Samin, OpenAI’s model was not graded according to the International Math Olympiad’s official guidelines, so its claim to a gold medal cannot be independently verified. Wei countered on X that “three former IMO medalists independently graded the model’s submitted proof” and reached “unanimous consensus” on their scores.
OpenAI doesn’t have the strongest reputation when it comes to benchmarking the mathematical ability of its models. In April, Epoch AI, the independent research institute behind the FrontierMath benchmark, found that the o3 model could correctly answer only about 10% of the advanced problems, a steep decline from the over 25% accuracy originally claimed by OpenAI in December 2024.
It will be difficult for anyone to conduct the same level of independent verification on the experimental model that took part in the Olympiad until it is released. Unfortunately, Wei confirmed that OpenAI does not “plan to release anything with this level of math capability for several months,” and as GPT-5 is coming “soon,” it’s unlikely that this experimental system will be part of that release.
Mathematical ability is clearly a priority for OpenAI. Last month, the company released the o3-pro model, which it dubbed its most intelligent yet.