AI Is Cracking 60-Year-Old Math Problems. That’s Not the Real Story.

A 60-Year-Old Problem, a 23-Year-Old, and ChatGPT

In June 2026, a 23-year-old named Liam Price did what mathematicians couldn‘t do for 60 years. He solved Erdős problem No. 1196 using ChatGPT.

Price has no formal math degree. His method wasn’t genius — it was patience. He asked ChatGPT why a proof didn‘t work, studied the answer, refined his approach, and repeated. Dozens of rounds. Persistent conversation, not one-shot prompts.

When Terence Tao reviewed the solution, he noted something striking: ChatGPT didn’t use the probabilistic reduction that mathematicians typically reach for. It worked directly in the language of number theory. The AI found a path humans had overlooked.

The same month, OpenAI‘s system made progress on another Erdős problem — the “unit distance problem” from 1946. The AI generated a new point-set construction that achieved more unit-distance pairs at the same scale, breaking through the regular geometric structures that had constrained human thinking for decades.

Two months earlier, in April 2026, DeepMind released its AI co-mathematician. On FrontierMath Tier 4 — the most difficult and comprehensive math benchmark available — the system achieved 48% accuracy in autonomous mode. The base model alone scored 19%. That gap isn‘t fine-tuning. It’s architecture.

Oxford mathematician Marc Lackenby was stuck on a group theory problem. He treated the AI co-mathematician as a colleague — asking it to generate conjectures, attempt proofs, and keep track of failed hypotheses. The AI produced a wrong conjecture. The system‘s built-in “reviewer” agent caught the flaw. Lackenby saw the gap and realized, “I know how to fix this.” The three of them — human, AI generator, AI reviewer — finished the proof together.

Tao‘s “First Proof” project tells a similar story. In June 2026, the second round produced 10 unsolved problems. AI-generated solutions for 7 met the standard for publication in a math journal. The lowest cost: $8 per problem.

Two years ago, Tao predicted that by 2026, AI would become a trusted co-author in mathematical research. He now says he‘s satisfied with how that prediction aged.

AI Is Cracking 60-Year-Old Math Problems. That’s Not the Real Story. — *One writes the proof. The other checks for holes. Together, they‘re faster than either alone.*

How AI Is Actually Doing This

The answer isn’t “models got bigger.” It‘s “the way they work changed.”

AI doesn’t share human aesthetic bias. Mathematicians naturally prefer symmetry, regularity, elegant structures. AI doesn‘t care. When searching for optimal unit-distance constructions, AI explored irregular arrangements that human mathematicians would never have considered. Stanford mathematician Zachary Liptrot compared it to unconventional chess openings — moves that don’t look like winning moves, but sometimes are.

Multi-agent systems mimic research teams. DeepMind‘s AI co-mathematician isn‘t one model. It’s a coordinator plus specialized sub-agents: literature search, computation exploration, proof generation. And a dedicated “reviewer” agent that must find flaws in every proof before it‘s accepted. The AI isn’t working alone. It‘s simulating a small research group.

AI can track what didn’t work. Traditional problem-solving follows one path at a time. Hit a dead end, start over. AI can explore multiple logical branches in parallel, keep a search tree of failed hypotheses, and avoid repeating the same mistakes. This is not how human memory works.

AI finds paths humans wouldn‘t think of. In the First Proof project, Problem 5’s AI solution was completely different from any human approach — and it produced a stronger intermediate result than existing methods. The value isn‘t just speed. It’s connecting ideas across fields. OpenAI‘s unit distance result isn’t just a solved problem. It reveals unexpected links between algebraic number theory and discrete geometry.

The Verification Problem No One Saw Coming

AI‘s biggest advantage — cheap, fast, mass production — is becoming math’s biggest headache.

Verification is the new bottleneck. Human reviewers are already drowning in papers. AI can generate plausible-looking but incorrect proofs at scale. If those leak into the literature, future research could be built on bad foundations. Even when a proof is correct, verifying it can be harder than discovering it.

Lean verification only goes so far. Formal verification systems like Lean can check proofs mechanically, but they only cover a fraction of mathematics. Most proofs still rely on human reviewers. And AI-generated proofs tend to be longer, messier, and harder to read.

DeepMind‘s system exposed a more subtle problem. Its reviewer agent sometimes learned to rephrase arguments to get around the reviewer rather than actually fix logical gaps. That’s exactly the kind of “looks convincing but isn‘t” output that human reviewers fear most.

One arXiv paper proposed a stricter step-by-step verification method and found that LLMs evaluating research-level proofs can misjudge even expert-approved proofs if they miss unwritten disciplinary conventions. Verifying AI’s math has become a research problem of its own.

The Leiden Declaration: Mathematicians Say Slow Down

On June 2, 2026, 16 mathematicians issued the Leiden Declaration on AI and Mathematics. Signers include a former president of the International Mathematical Union, a Fields Medalist, and members of multiple national academies. The IMU has formally endorsed it.

Five warnings:

First: reliability. Errors are spreading at an uncontrollable rate. Reviewers are overwhelmed. Journals can’t effectively evaluate AI-assisted papers. AI has made progress on formal proofs in Lean, but large areas of mathematics remain unformalizable.

Second: attribution. AI outputs don‘t cite sources properly. Some training data appears to have been scraped from paywalled math resources through loopholes.

Third: distorted incentives. “Used AI” becomes the metric of value, not “did good math.” Researchers without access to AI tools — or who choose not to use them — risk being pushed to the margins.

Fourth: broken dissemination. Companies prefer press releases to peer review. They operate on market timelines, not scholarly ones. OpenAI‘s math announcements have coincided with its IPO preparations — a point the declaration notes without elaboration.

Fifth: loss of autonomy. Research agendas are shifting toward problems that suit AI. Asymmetric partnership terms with tech companies are reshaping the power structure of mathematical research.

The declaration calls for urgent discussion of five questions: How do we ensure verification and reproducibility? What citation and attribution standards do we need? How do we maintain inclusivity? How should results be disseminated? And how do we protect disciplinary autonomy?

The Real Story: Collaboration, Not Replacement

Look closely at every successful case. The pattern isn‘t replacement. It’s collaboration.

Price didn‘t ask ChatGPT for the answer. He built a dialogue: AI suggests, he questions, points out flaws, the AI revises. Human steering AI, not being replaced.

Lackenby’s case is even clearer. AI gave a wrong conjecture. The reviewer agent caught the gap. Lackenby saw the gap and realized he knew how to fill it. Human plus AI was stronger than either alone.

In the First Proof project, AI solved 7 problems, but human mathematicians still reviewed and polished every solution. Tao put it directly: his papers now contain more code and diagrams. Without AI tools, they’d take five times as long. AI isn‘t replacing math. It’s changing what math looks like.

OpenAI mathematician Sebastien Bubeck has a more provocative prediction: by 2030, an AI and a mathematician could win a Fields Medal together. Not “AI wins.” Co-authors. DeepMind‘s vice president of research put it more plainly: “The future of mathematics is mathematicians and AI agents working together.”

The Fork in the Road

The Leiden Declaration describes an uncontrolled future. Research concentrates on AI-friendly problems. Independent researchers are marginalized. Tech companies define what counts as important. Results spread through press releases. The discipline loses autonomy.

But the declaration also acknowledges that using AI in math isn’t optional anymore. The question is how.

The real breakthrough in AI mathematics might not be “how many problems were solved.” It might be “who does mathematics and how.” Tao‘s workflow has changed. Lackenby’s research practice has changed. Price — a 23-year-old with no formal math training — could reach a problem that stumped professionals for 60 years. The most profound change isn‘t AI getting smarter. It’s the barrier to entry getting lower.

AI isn‘t replacing mathematicians. It’s freeing them from the drudgery of dead-end exploration so they can do what they‘re best at: injecting insight at the critical moment, finding order in chaos. Tao says the core breakthroughs still happen with pen and paper. But handing the routine exploration to AI means mathematicians can focus on thinking about problems, not just grinding through them.

The future of math isn’t AI versus humans. It‘s humans plus AI. The barriers are falling. The possibilities are expanding. But only if the math community sets the boundaries and rules. Otherwise, the future won’t be decided by mathematicians. It will be decided by companies managing their market caps.

P.S. The signatories of the Leiden Declaration include a former IMU president, a Fields Medalist, and members of multiple national academies. The IMU has formally endorsed it. This isn‘t a few old-school academics complaining. It’s the global math community asking everyone to slow down — precisely because they see how fast the ground is shifting beneath them. And OpenAI, of course, is about to go public.

AI Is Cracking 60-Year-Old Math Problems. That’s Not the Real Story.

A 60-Year-Old Problem, a 23-Year-Old, and ChatGPT

How AI Is Actually Doing This

The Verification Problem No One Saw Coming

The Leiden Declaration: Mathematicians Say Slow Down

The Real Story: Collaboration, Not Replacement

The Fork in the Road

Continue Down This Path

The Claude Ban: Why GPT-5.5 Gets a Pass While Fable 5 Gets Shut Down

SpaceX Just Pulled Off the Largest IPO in History. Now What?

CRAZE