
Last summer, researchers at Apple released an important research paper, "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity."
In essence, the paper blew the whistle on the AI industry's own hype. It showed that current LRMs ("large reasoning models") do not have a robust, scale-free reasoning faculty and that benchmark accuracy oversells what they can do. The paper may itself have oversold the authors' findings by positing a "collapse" to zero accuracy beyond a certain level of complexity.
But the idea that there are mathematical limits on what computers can do, even when they have been trained on the vast bulk of significant text humanity has produced, seems intuitive.
The paper's main finding, demonstrated through a methodology built on controllable puzzles, was that LRMs could arrive at correct answers but do not "reason" their way to them. Equally telling was the observation that these models waste computing resources on easy problems.
Even though newer AI models are doing astonishing things, none of the new releases seriously challenges the strongest points Apple's researchers made. These models are finding answers to long-unsolved math problems, and they are occasionally producing medical breakthroughs, or at least highly suggestive research. But they are not "reasoning" their way to these conclusions in the sense we usually mean by the word. This is not merely a semantic point about reasoning but a real and, to my mind, ongoing dilemma for the industry.





