Published Date: 12/7/2025
On a weekend in mid-May, a clandestine mathematical conclave convened in Berkeley, California. Thirty of the world's most renowned mathematicians traveled from various parts of the globe, including the U.K., to participate. The purpose of this gathering was to test a reasoning chatbot powered by OpenAI's o4-mini model, which was tasked with solving a series of complex mathematical problems.
The mathematicians, some of the brightest minds in the field, were tasked with creating and solving problems that would challenge the AI. The chatbot, powered by o4-mini, a reasoning large language model (LLM), was designed to make highly intricate deductions. Google's equivalent, Gemini 2.5 Flash, has similar capabilities. Unlike traditional LLMs, o4-mini and its counterparts are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. This approach allows the chatbot to delve deeper into complex mathematical problems than traditional LLMs.
To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, with creating 300 math questions whose solutions had not yet been published. Traditional LLMs can correctly answer many complicated math questions, but when Epoch AI tested several models on these novel questions, the most successful solved fewer than 2 percent of them, suggesting that those LLMs lacked the ability to reason. o4-mini, however, would prove to be very different.
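The kind of benchmark scoring described above can be illustrated with a small sketch. This is not Epoch AI's actual evaluation harness; the question IDs, answers, and counts here are hypothetical, chosen only to show how a sub-2-percent solve rate on 300 held-out questions would be computed.

```python
# Illustrative sketch (not Epoch AI's actual harness): scoring a model's
# answers against a benchmark of novel questions with known reference answers.
# All question IDs and answers below are hypothetical.

def solve_rate(reference_answers, model_answers):
    """Fraction of benchmark questions the model answered exactly correctly."""
    correct = sum(
        1 for qid, expected in reference_answers.items()
        if model_answers.get(qid) == expected
    )
    return correct / len(reference_answers)

# A hypothetical 300-question benchmark in which a traditional LLM
# gets only 5 questions right lands below the 2 percent mark.
reference = {f"q{i}": f"ans{i}" for i in range(300)}
model = {f"q{i}": f"ans{i}" for i in range(5)}  # 5 of 300 correct

print(f"{solve_rate(reference, model):.1%}")
```

Real harnesses also have to normalize answer formats (exact-match grading is the simplest possible policy), but the headline number reported for a benchmark like this is just such a ratio.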
Epoch AI hired Elliot Glazer, a recent math Ph.D. graduate, to join the new collaboration for the benchmark, dubbed FrontierMath, in September 2024. The project collected novel questions across varying tiers of difficulty, covering undergraduate-, graduate-, and research-level challenges. By April 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: a set of questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them. The mathematicians who participated had to sign a nondisclosure agreement, requiring them to communicate solely via the messaging app Signal to avoid contaminating the dataset.
Each problem o4-mini couldn't solve would earn the mathematician who devised it a $7,500 reward. The group made slow, steady progress in finding such questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. The 30 attendees were split into groups of six. For two days, the academics competed among themselves to devise problems that they could solve but that would trip up the AI reasoning bot.
By the end of that Saturday night, Ken Ono, a mathematician at the University of Virginia and a leader and judge at the meeting, was frustrated with the bot. He asked o4-mini to solve a problem that experts in his field would recognize as an open question in number theory, a good Ph.D.-level problem. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler
Q: What is o4-mini?
A: o4-mini is a reasoning large language model (LLM) developed by OpenAI, designed to make highly intricate deductions and solve complex mathematical problems.
Q: How did the mathematicians test the AI?
A: The mathematicians created and solved complex mathematical problems to test the AI, with the goal of finding questions that the AI could not solve.
Q: What were the results of the meeting?
A: The AI, powered by o4-mini, outperformed the mathematicians by solving many of the complex problems, including some that were considered challenging even for academic mathematicians.
Q: What concerns do mathematicians have about AI's progress?
A: Mathematicians are concerned that AI's results might be trusted too much, leading to a reliance on AI that could undermine the role of human mathematicians.
Q: What is the future outlook for mathematicians with the rise of AI?
A: Mathematicians may shift to posing questions and interacting with reasoning-bots to discover new mathematical truths, similar to how a professor works with graduate students.