In 1969, the German mathematician Volker Strassen published a paper that should have been impossible. He proved you could multiply two 2x2 matrices using seven multiplications instead of eight. Every math student since has been taught that result. For 56 years after, nobody improved on the bound it implied for 4x4 matrices: apply Strassen's scheme recursively and you get 49 multiplications. Researchers refined specific cases, but that number sat there like a wall.
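Those seven products are concrete enough to write down. Here is a minimal Python sketch, with scalar entries for clarity; Strassen's real point is that the entries can themselves be matrix blocks, which is what makes the trick recursive:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications (Strassen, 1969)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # Seven products instead of the naive eight.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine using additions and subtractions only.
    return ((m1 + m4 - m5 + m7, m3 + m5),
            (m2 + m4, m1 - m2 + m3 + m6))

# Matches the naive row-by-column result.
print(strassen_2x2(((1, 2), (3, 4)), ((5, 6), (7, 8))))
# → ((19, 22), (43, 50))
```

Multiplications are expensive; additions are cheap. Trading one multiplication for a pile of additions is a net win, and the win multiplies when the entries are blocks instead of numbers.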
Then, in May 2025, Google DeepMind just walked through it.
AlphaEvolve, an AI coding agent built on Gemini’s language models, discovered a way to multiply two 4x4 matrices over the complex numbers using 48 scalar multiplications instead of 49. One operation gone. That one operation, shaved off a count that had been the best known since 1969, means billions of matrix calculations across the Internet could get faster.
I keep coming back to this part: it wasn’t a mathematician who found it. It was an algorithm that evolved its way to an answer humans never found.
What AlphaEvolve Actually Is
This isn’t a chatbot you ask for Python help. AlphaEvolve is closer to a search engine that searches for algorithms instead of web pages. You describe a problem, define a scoring function, and hand it to Gemini. The model generates code, the system runs it, scores it, and feeds the best versions back into the loop. Iteration after iteration, it mutates candidates until they stop improving.
The trick is that it doesn’t just use one model. It switches between Gemini Flash for speed and Gemini Pro for heavier lifting when progress stalls. Think of it as evolutionary pressure applied to software itself: bad code dies, fast code breeds.
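A toy version of that loop fits in a few lines. To be clear, this is nothing like DeepMind's actual system: the candidates here are number vectors rather than programs, and the "model" proposing mutations is plain random noise. But the generate-score-select shape is the same:

```python
import random

def evolve(score, seed_candidate, generations=200, pop_size=20):
    """Generic evolutionary loop: mutate, score, keep the best."""
    best = seed_candidate
    for _ in range(generations):
        # "Generate": mutate the current best into a population of variants.
        population = [[x + random.gauss(0, 0.1) for x in best]
                      for _ in range(pop_size)]
        # "Score and select": a variant survives only if it beats the champion.
        challenger = max(population, key=score)
        if score(challenger) > score(best):
            best = challenger
    return best

# Toy objective: get a 3-vector as close to (1, 2, 3) as possible.
target = [1.0, 2.0, 3.0]
score = lambda c: -sum((x - t) ** 2 for x, t in zip(c, target))

random.seed(0)
found = evolve(score, seed_candidate=[0.0, 0.0, 0.0])
```

In AlphaEvolve the mutation step is a language model editing code and the scoring step actually compiles and runs the candidate, but the selection pressure works the same way: bad code dies, fast code breeds.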
Pushmeet Kohli, a VP at DeepMind leading their AI for Science team, described it as a “super coding agent” that “produces a result that maybe nobody was aware of.” I think that undersells it. AlphaEvolve isn’t helping humans code. It’s replacing the human-in-the-loop for a class of problems we’ve been too slow to solve ourselves.
The 56-Year Record
Matrix multiplication is everywhere. Machine learning models, graphics engines, scientific simulations, database queries. Behind every recommendation you see, every frame your GPU renders, billions of tiny matrix multiplications are churning away.
Strassen’s 1969 breakthrough showed that the naive approach, multiplying every row by every column, wasn’t actually the floor. There was a cleverer way. After that? Mostly silence. Asymptotic bounds kept improving on paper, and specific matrix sizes got cracked, but for 56 years the best known count for multiplying two 4x4 complex matrices stayed at 49: Strassen’s seven, applied recursively.
Here’s why one multiplication matters. Matrices get tiled and composed at enormous scales. A 16x16 matrix can be treated as a 4x4 grid of 4x4 blocks, so the scheme applies recursively: 48 block multiplications instead of 49, and 48 scalar multiplications inside each of those blocks instead of 49. The savings compound at every level of recursion, so the gap widens as matrices grow. At Google scale, it isn’t theoretical. It’s power, compute, and money.
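The arithmetic of that compounding is easy to check. If a matrix of size 4^k x 4^k is multiplied by applying the 4x4 scheme recursively all the way down, the scalar multiplication count is 49^k with the old scheme and 48^k with the new one:

```python
# Scalar multiplications when a 4x4 scheme is applied recursively
# to a matrix of size 4**k x 4**k: 49**k (old) vs 48**k (new).
for k in range(1, 6):
    old, new = 49 ** k, 48 ** k
    saved = 100 * (1 - new / old)
    print(f"{4**k:>5}x{4**k:<5} {old:>12,} -> {new:>12,}  ({saved:.1f}% fewer)")
```

One operation at the 4x4 level becomes a roughly 2% saving per recursion level, which stacks: about 4% at 16x16, about 10% by 1024x1024, under this idealized fully-recursive counting.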
Where It Matters: Production, Not Just Papers
The most interesting thing about AlphaEvolve isn’t the math. It’s that Google had been running it in production for over a year before telling anyone.
In Google’s data centers, AlphaEvolve developed a better heuristic for job scheduling on Borg, Google’s cluster manager. It beat a deep reinforcement learning system that had been hand-tuned for years. The result? AlphaEvolve freed up 0.7% of Google’s total worldwide compute resources. At Google scale, that’s not a rounding error.
That’s the part that wakes me up. This isn’t a research curiosity sitting on arXiv. It’s already saving Google real money and real energy. It also found ways to reduce TPU power consumption and speed up parts of Gemini’s own training pipeline. The tool improved the tool that made the tool.
The Skeptics Have a Point
Not everyone is buying the hype. Simon Frieder, a researcher at Oxford, pointed out that DeepMind has a spotty history of public reproducibility. AlphaFold2 shipped without training scripts. AlphaGeometry had bugs in its release. Frieder’s concern is straightforward: if AlphaEvolve’s automatic evaluator has hidden issues, some of its claimed improvements may be unreliable.
This is a fair objection. The leap from “can verify” to “has verified correctly” is not automatic. The matrix multiplication result has been checked because enough mathematicians care about that specific problem. But the data center scheduling improvements? The TPU optimizations? Validating those independently is much harder because you don’t have access to Google’s infrastructure.
I think Frieder’s warning matters because the headline version of this story, “AI beats mathematicians,” is only partially true. AlphaEvolve beat humans on a specific problem with a verifiable answer. On fuzzier problems, we don’t yet know if the speedups are real or if the evaluator is chasing its own tail.
What Actually Changed This Month
The bigger shift is what AlphaEvolve represents for the future of discovery. For decades, algorithmic breakthroughs were a human activity. You needed intuition, years of training, and a lot of wasted afternoons scribbling on whiteboards. AlphaEvolve doesn’t need intuition. It needs a scoring function and patience.
DeepMind tested it on more than 50 categories of math problems. It matched the best-known human solutions 75% of the time. In 20% of cases, it found something better. That’s not perfect. But it’s a rate of improvement that scales with compute, not with how many PhDs you can hire.
| Category | Result |
|---|---|
| Match best-known human solution | 75% of cases |
| Found better solution than humans | 20% of cases |
| Matrix multiplication (4x4 complex) | 48 multiplications (down from 49) |
| Data center job scheduling (Borg) | 0.7% global compute freed |
| Flash Attention kernel | 32% speedup |
Mathematician Manuel Kauers of Johannes Kepler University agreed the matrix multiplication result was “likely to have practical relevance.” Even researchers who had been attacking the same problem with different methods, and who posted a similar result the week before, acknowledged its significance.
Why This Is Not Just About Math
AlphaEvolve is general purpose. Any problem that can be expressed as an algorithm and evaluated automatically is fair game. DeepMind has already talked about applying it to materials science, drug discovery, and sustainability modeling. The architecture is simple enough that the real constraint isn’t the method. It’s your ability to write a good evaluation function.
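What such an evaluation function looks like depends on the problem, but the shape is always the same: a program that takes a candidate and returns a score the search loop can maximize. A hypothetical sketch for a "find a faster sort" task (the names and scoring scheme here are illustrative, not AlphaEvolve's API):

```python
import random
import time

def evaluate(candidate_sort, trials=5, n=2000):
    """Score a candidate sorting function.

    Correctness is a hard gate; among correct candidates, faster is better.
    """
    data = [random.random() for _ in range(n)]
    # Hard gate: a wrong answer scores negative infinity and is culled.
    if candidate_sort(list(data)) != sorted(data):
        return float("-inf")
    # Otherwise score by speed: negative runtime, so higher is better.
    start = time.perf_counter()
    for _ in range(trials):
        candidate_sort(list(data))
    return -(time.perf_counter() - start)

# A correct candidate always beats a broken one, however fast.
assert evaluate(sorted) > evaluate(lambda xs: xs)
```

Writing this function well is the hard part. If the gate is leaky or the benchmark unrepresentative, the search will happily optimize the score instead of the problem, which is exactly the failure mode the skeptics worry about.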
That feels like the bigger story. We’re entering a phase where some types of human expertise, specifically the kind that involves searching through algorithmic design space for small improvements, are now economically inferior to automated search. Not all expertise. Not creative leaps. But the grueling, iterative optimization work that fills most of applied mathematics and engineering? That’s getting automated fast.
“While AlphaEvolve is currently being applied across math and computing, its general nature means it can be applied to any problem whose solution can be described as an algorithm, and automatically verified. We believe AlphaEvolve could be transformative across many more areas such as material science, drug discovery, sustainability and wider technological and business applications.” — Google DeepMind
The Quiet Takeaway
Here’s what gets me. Google didn’t announce this with a product demo or a flashy keynote. They published a paper, slipped it into production, and kept running it for a year. The announcement felt almost incidental. Like the thing was already too useful to keep secret.
And maybe that’s the real pattern. The AI systems that change how work gets done won’t arrive with fanfare. They’ll show up in production dashboards showing slightly lower power draw, slightly faster training times, slightly better scheduling efficiency. Death by a thousand small improvements.
The 56-year record was broken by a tool that sits in a loop, generating code, throwing away the bad stuff, and keeping the good stuff. It’s not creative. It’s not insightful. It’s just fast and relentless and, apparently, better at this specific task than every human who tried for the last half-century.
I’m not sure if that’s exciting or unsettling. Probably both. The truth is probably somewhere boring in the middle—just another tool, another step, another quiet shift in where human effort adds value and where we should just get out of the way.
But I keep thinking about Strassen in 1969, working out that seventh multiplication by hand. And I keep thinking about AlphaEvolve running overnight, testing 16,000 candidate algorithms in the time it would take a human to check maybe one. The gap there isn’t just speed. It’s patience. The machine doesn’t get bored. It doesn’t need coffee. It just keeps going until it finds the thing we missed.
One operation down. 56 years of dormancy ended. And the loop is still running.