Apple Research Questions AI Reasoning Models Just Days Before WWDC

A newly published Apple Machine Learning Research study challenges the prevailing narrative around "reasoning" large language models such as OpenAI's o1 and Claude's thinking variants, revealing fundamental limitations that suggest these systems aren't truly reasoning at all.

For the study, rather than using standard math benchmarks that are prone to data contamination, Apple researchers designed controllable puzzle environments including Tower of Hanoi and River Crossing. This allowed a precise analysis of both the final answers and the internal reasoning traces across varying complexity levels, according to the researchers.
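
To illustrate why a puzzle such as Tower of Hanoi makes a controllable test bed, here is a minimal Python sketch, not the researchers' code. Complexity is set by a single parameter (the number of disks), the shortest solution has 2^n - 1 moves, and any proposed move sequence can be checked mechanically. The function names and the (from_peg, to_peg) move format are assumptions made for this example.

```python
def min_moves(n_disks: int) -> int:
    """Length of the shortest Tower of Hanoi solution: 2**n - 1."""
    return 2 ** n_disks - 1


def is_valid_solution(n_disks: int, moves) -> bool:
    """Replay (from_peg, to_peg) moves and check the rules and the final state.

    Pegs are numbered 0, 1, 2. All disks start on peg 0 and must end on peg 2.
    Disks are integers, with a larger number meaning a larger disk.
    """
    pegs = [list(range(n_disks, 0, -1)), [], []]  # top of each peg is the list's end
    for src, dst in moves:
        if src not in (0, 1, 2) or dst not in (0, 1, 2):
            return False  # move refers to a peg that does not exist
        if not pegs[src]:
            return False  # nothing to pick up from the source peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n_disks, 0, -1))


if __name__ == "__main__":
    for n in (3, 7, 10):
        print(n, "disks ->", min_moves(n), "moves minimum")
```

Because every intermediate move can be replayed this way, a checker of this kind can grade not just a final answer but each step along the way.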

The results are striking, to say the least. All tested reasoning models – including o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet – experienced complete accuracy collapse beyond certain complexity thresholds, dropping to zero success rates despite having adequate computational resources. Counterintuitively, the models actually reduced their thinking effort as problems became more complex, suggesting fundamental scaling limitations rather than resource constraints.

Perhaps most damning, even when researchers provided complete solution algorithms, the models still failed at the same complexity points. Researchers say this indicates the limitation isn't in problem-solving strategy, but in basic logical step execution.
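
The article does not say which algorithm the models were given. For a sense of how compact a complete solution procedure can be, the textbook recursive solution to Tower of Hanoi is sketched below in Python. This is an illustrative assumption, not the prompt used in the study, and the function name and peg numbering are chosen for this example.

```python
def hanoi_moves(n, src=0, dst=2, aux=1):
    """Return the optimal move list for n disks as (from_peg, to_peg) pairs.

    Textbook recursion: move the top n-1 disks out of the way onto the spare
    peg, move the largest disk to the target, then move the n-1 disks on top.
    """
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, src, aux, dst)    # clear the n-1 smaller disks
        + [(src, dst)]                       # move the largest disk
        + hanoi_moves(n - 1, aux, dst, src)  # restack the smaller disks
    )


if __name__ == "__main__":
    print(hanoi_moves(2))        # [(0, 1), (0, 2), (1, 2)]
    print(len(hanoi_moves(10)))  # 1023
```

The procedure is only a few lines long, but carrying it out for n disks means emitting 2^n - 1 moves in the right order with no slips, which is the kind of step-by-step execution the researchers say broke down.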

Models also showed puzzling inconsistencies – succeeding on problems requiring 100+ moves while failing on simpler puzzles needing only 11 moves.

The research highlights three distinct performance regimes: standard models surprisingly outperform reasoning models at low complexity, reasoning models show advantages at medium complexity, and both approaches fail completely at high complexity. The researchers' analysis of reasoning traces showed inefficient "overthinking" patterns, where models found correct solutions early but wasted computational budget exploring incorrect alternatives.

The takeaway from Apple's findings is that current "reasoning" models rely on sophisticated pattern matching rather than genuine reasoning capabilities. The study suggests that LLMs don't scale reasoning the way humans do, overthinking easy problems and thinking less about harder ones.

The timing of the publication is notable, having emerged just days before WWDC 2025, where Apple is expected to limit its focus on AI in favor of new software designs and features, according to Bloomberg.

Top Rated Comments

citysnaps
6 weeks ago
I don't find this surprising at all.
Score: 24 Votes
trip1ex
6 weeks ago
Breaking news. The people who pretended otherwise always had something to sell.
Score: 22 Votes
zorinlynx
6 weeks ago
LLM GenAI is pretty garbage technology. The less time it takes people to realize this, the better.

Yes, it does have some niche uses. But people are trying to push it as a solution to everything and even as far as replacing human beings, and it's just not capable of that. Not only that, but why do we want to replace human beings? Especially in the arts? I'd rather look at things made by people. It doesn't matter how visually stunning something is; art has no soul if there is no artist.
Score: 22 Votes
turbineseaplane
6 weeks ago
“….and now here’s Ashley to talk about some new Genmoji!”
Score: 18 Votes
Orange Bat
6 weeks ago
Of course. “AI” is just a marketing term at this point, and not any kind of actual intelligence. These AIs are really just glorified search engines that steal peoples’ hard work and regurgitate that work as if the data is it’s own. We’re just living in an “AI bubble” that will burst sooner rather than later.
Score: 16 Votes
Salty Pirate
6 weeks ago
So AI is nothing more than clever programing?
Score: 15 Votes