Apple Research Questions AI Reasoning Models Just Days Before WWDC

A newly published Apple Machine Learning Research study has challenged the prevailing narrative around AI "reasoning" large language models such as OpenAI's o1 and the thinking variants of Anthropic's Claude, revealing fundamental limitations that suggest these systems aren't truly reasoning at all.

For the study, rather than using standard math benchmarks that are prone to data contamination, Apple researchers designed controllable puzzle environments including Tower of Hanoi and River Crossing. This allowed a precise analysis of both the final answers and the internal reasoning traces across varying complexity levels, according to the researchers.
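
To make the idea of a controllable puzzle environment concrete, here is a minimal Python sketch for Tower of Hanoi. It is not Apple's evaluation code: the function name `check_hanoi_solution` and the move format (pairs of peg indices) are assumptions made for illustration. The disk count `n` acts as the complexity knob, and because every move is checked against the rules, a proposed solution can be scored step by step as well as on its final state.

```python
def check_hanoi_solution(n, moves):
    """Check a proposed Tower of Hanoi solution (illustrative sketch).

    n:     number of disks, i.e. the complexity knob.
    moves: list of (src, dst) peg indices in 0..2 (hypothetical format).
    Returns (solved, valid_prefix_len), where valid_prefix_len is the
    number of moves that were legal before the first rule violation.
    """
    # Peg 0 starts with all disks: largest (n) at the bottom, smallest (1) on top.
    pegs = [list(range(n, 0, -1)), [], []]
    for i, (src, dst) in enumerate(moves):
        # Reject unknown pegs and moves from an empty peg.
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or not pegs[src]:
            return False, i
        disk = pegs[src][-1]
        # A disk may never be placed on top of a smaller one.
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, i
        pegs[dst].append(pegs[src].pop())
    # Solved only if every disk ends up on peg 2 in the correct order.
    solved = pegs[2] == list(range(n, 0, -1))
    return solved, len(moves)
```

A checker of this kind reports where a solution first breaks a rule, which is the sort of information an analysis of intermediate reasoning steps needs.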

The results are striking. All tested reasoning models – including o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet – experienced complete accuracy collapse beyond certain complexity thresholds, dropping to zero success rates despite having adequate computational resources. Counterintuitively, the models actually reduced their thinking effort as problems became more complex, suggesting fundamental scaling limitations rather than resource constraints.

Perhaps most damning, even when researchers provided complete solution algorithms, the models still failed at the same complexity points. Researchers say this indicates the limitation isn't in problem-solving strategy, but in basic logical step execution.
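
For a sense of what a complete solution algorithm looks like, the standard recursive procedure for Tower of Hanoi fits in a few lines. The Python sketch below is a textbook version, not the prompt used in the study; the function name `hanoi_moves` and the peg numbering are illustrative assumptions. It also shows why executing the steps gets hard even when the strategy is known: `n` disks require 2^n − 1 moves.

```python
def hanoi_moves(n, src=0, aux=1, dst=2):
    """Yield the optimal sequence of (src, dst) moves for n disks."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, src, dst, aux)
    # Move the largest remaining disk to its destination.
    yield (src, dst)
    # Move the n-1 disks from the spare peg onto the largest disk.
    yield from hanoi_moves(n - 1, aux, src, dst)


# The move count doubles (plus one) with every added disk.
for n in (3, 7, 10):
    print(n, len(list(hanoi_moves(n))))  # prints: 3 7, then 7 127, then 10 1023
```

Ten disks already require 1,023 moves, each of which must be carried out without a single rule violation.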

Models also showed puzzling inconsistencies – succeeding on problems requiring 100+ moves while failing on simpler puzzles needing only 11 moves.

The research highlights three distinct performance regimes: standard models surprisingly outperform reasoning models at low complexity, reasoning models show advantages at medium complexity, and both approaches fail completely at high complexity. The researchers' analysis of reasoning traces showed inefficient "overthinking" patterns, where models found correct solutions early but wasted computational budget exploring incorrect alternatives.

The takeaway from Apple's findings is that current "reasoning" models rely on sophisticated pattern matching rather than genuine reasoning capabilities. The findings suggest that LLMs don't scale reasoning the way humans do: they overthink easy problems and think less about harder ones.

The timing of the publication is notable, having emerged just days before WWDC 2025, where Apple is expected to limit its focus on AI in favor of new software designs and features, according to Bloomberg.

Top Rated Comments

citysnaps
3 hours ago at 05:52 am
I don't find this surprising at all.
Score: 17 Votes
trip1ex
3 hours ago at 05:56 am
Breaking news. The people who pretended otherwise always had something to sell.
Score: 16 Votes
zorinlynx
3 hours ago at 06:08 am
LLM GenAI is pretty garbage technology. The less time it takes people to realize this, the better.

Yes, it does have some niche uses. But people are trying to push it as a solution to everything and even as far as replacing human beings, and it's just not capable of that. Not only that, but why do we want to replace human beings? Especially in the arts? I'd rather look at things made by people. It doesn't matter how visually stunning something is; art has no soul if there is no artist.
Score: 16 Votes
turbineseaplane
3 hours ago at 06:20 am
“….and now here’s Ashley to talk about some new Genmoji!”
Score: 14 Votes
Orange Bat
3 hours ago at 06:03 am
Of course. “AI” is just a marketing term at this point, and not any kind of actual intelligence. These AIs are really just glorified search engines that steal people’s hard work and regurgitate that work as if the data is its own. We’re just living in an “AI bubble” that will burst sooner rather than later.
Score: 12 Votes
Salty Pirate
3 hours ago at 05:54 am
So AI is nothing more than clever programming?
Score: 11 Votes