A reporter for The Washington Post put ChatGPT's new optional Apple Health integration to the test by feeding it ten years of his Apple Watch data. The results were not encouraging, to say the least.

Earlier this month, OpenAI announced the launch of ChatGPT Health, a dedicated section of ChatGPT, kept separate from the main ChatGPT experience, where users can ask health-related questions. For more personalized responses, users can connect various health data services such as Apple Health, Function, MyFitnessPal, Weight Watchers, AllTrails, Instacart, and Peloton.

ChatGPT Health can also integrate with your medical records, allowing it to analyze your lab results and other aspects of your medical history to inform its answers to your health-related questions.

With this in mind, reporter Geoffrey Fowler gave ChatGPT Health access to 29 million steps and 6 million heartbeat measurements from his Apple Health app, and asked the bot to grade his cardiac health. It gave him an F.

Understandably alarmed, Fowler asked his actual doctor, who dismissed the AI's assessment in no uncertain terms. His physician said Fowler was at such low risk for heart problems that his insurance likely wouldn't even cover additional testing to disprove the chatbot's findings.

Cardiologist Eric Topol of the Scripps Research Institute was likewise unimpressed with the large language model's assessment. He called ChatGPT's analysis "baseless" and said people should ignore its medical advice, as it's clearly not ready for prime time.

Perhaps the most troubling finding, though, was ChatGPT's inconsistency. When Fowler asked the same question several times, his score swung wildly between an F and a B. ChatGPT also kept forgetting basic information about him, including his gender and age, despite having full access to his records.

Anthropic's Claude chatbot fared slightly better – though not by much. The LLM graded Fowler's cardiac health a C, but it also failed to properly account for limitations in the Apple Watch data.

Both companies say their health tools aren't meant to replace doctors or provide diagnoses. But Topol rightly argued that if these bots can't accurately assess health data, they shouldn't be offering grades at all.

Yet nothing appears to be stopping them. Earlier this month, the U.S. Food and Drug Administration said the agency's job is to "get out of the way as a regulator" to promote innovation. An agency commissioner drew a red line at AI making "medical or clinical claims" without FDA review, but OpenAI and Anthropic argue that ChatGPT and Claude are just providing information.

"People that do this are going to get really spooked about their health," Topol said. "It could also go the other way and give people who are unhealthy a false sense that everything they're doing is great."

ChatGPT's Apple Health integration is currently limited to a group of beta users. Responding to the report, OpenAI said it was working to improve the consistency of the chatbot's responses. "Launching ChatGPT Health with waitlisted access allows us to learn and improve the experience before making it widely available," OpenAI VP Ashley Alexander told the publication in a statement.