AI Health Coaches Are Here. Can You Trust Them?
I put my well-being in the hands of a personal, highly sophisticated, non-human expert to find out.
At the turn of the millennium, the thought that a wearable device could track health metrics like blood oxygen, VO2 max, or exact minutes spent in restorative REM sleep seemed inconceivable. Fast forward 24 years, and having biometric data at your fingertips is the norm. But knowing how to use this data to help you reach health goals or manage a chronic disease isn’t intuitive.
That’s where artificial intelligence health coaches come in: They can help you sift through the data, contextualize it, and offer personalized wellness plans and advice. Plus, unlike their human counterparts, some AI health coaches are available to chat 24/7.
Several star health gadgets have introduced this type of coaching to their apps. Last year, WHOOP unveiled WHOOP Coach, which uses GPT-4, OpenAI’s most advanced text-generating artificial intelligence program (1) to “generate highly individualized, conversational responses to their health and fitness questions—all within seconds.” Oura launched Oura Advisor earlier this year, a new opt-in “personal wellness coach” that analyzes data from your Oura ring and offers actionable tips based on your chosen interaction style (supportive, mentoring, or goal-oriented).
Unsurprisingly, Apple is rumored to have its own AI health coaching program in the works. Called “Quartz,” it promises to provide personalized nutrition, fitness, and sleep recommendations based on your Apple Watch data.
Then there’s the buzzy, recently announced iteration from Thrive AI Health, a partnership between OpenAI and Arianna Huffington’s Thrive Global. Trained on “the latest peer-reviewed science, biometric, lab and other medical data” that you’ve chosen to share with it, it might end up being the most sophisticated version yet.
With AI health coaching becoming as ubiquitous as smartwatches, I decided to test three currently available options for one week. My aim? To find out how helpful (and accurate) advice from an AI health coach can actually be. And to see if following the advice would help me get closer to my physical (and mental) health goals.
Fei Wang, Ph.D., founding director of the Weill Cornell Medicine Institute of AI for Digital Health, and a professor of health informatics in the department of Population Health Sciences at Weill Cornell Medicine.
David L. Katz, M.D., M.P.H., former director of the Yale University Prevention Research Center and a specialist in internal medicine, preventive medicine/public health, and lifestyle medicine.
Jason Russell, vice president of Consumer Software Product at Oura.
My primary health goal in using an AI health coach: To improve my energy and stress levels by optimizing sleep and increasing daily movement. Sure, I could already get nudges from my Apple Watch to stand up from my desk or take a moment to breathe, but AI health coaches claim to offer personalized advice.
First, I bought a subscription to Humanity, a longevity-focused AI health coach. After answering a few basic questions about my diet and lifestyle habits, and connecting it to Apple Health and my Oura ring, I was notified that “an AI system built by our team of doctors and longevity scientists is analyzing your data right now.”
Twenty-four hours later, the AI platform spit out a measurement of how “functionally old my body is right now,” or my biological age: 25.9. That’s 6.3 years younger than my chronological age, which felt super precise and ego-boosting, but also maybe too good to be true? Keep in mind, I didn’t do any blood work before I began working with this AI health coach. (For Pro users, Humanity can decode your blood test information and give you a more comprehensive understanding of how you’re aging (2).)
Humanity gamifies anti-aging best practices by prompting you to score points across four pillars (movement, nutrition, mind, and recovery) to increase your daily Humanity Score, or H score. For example, you can score points for fasting 14+ hours, since fasting “triggers survival pathways that are thought to improve lifespan,” the app told me.
Overall, the advice felt trustworthy, but not groundbreaking or personal enough to help me get closer to reaching my health goals. For example, to increase my H score, the app told me to promote a calm and resilient mental state by meditating, spending 15 minutes in nature, or messaging a friend. The nutrition component of the app requires logging meals and timing, and the advice was similarly generic. I also didn’t see a chat feature to ask my AI coach any follow-up questions.
ONVY, an AI health and performance coach, provides daily scores across several categories: recovery, activity, sleep, and mind. It gave more of the one-on-one coaching experience I was expecting. For example: On day two of using ONVY, my sleep summary indicated that my recovery score was a high 88, largely due to getting 8 hours and 29 minutes of restorative sleep. The AI coach instructed me to take 7,784 steps (exactly!) that day to get closer to my optimal activity score.
When I typed in questions that I might have otherwise searched on Google, I got generic advice. But to my surprise, I also got personalized tips. For example, after asking the AI coach, “Why do I feel tired after getting 8 hours of sleep?” it informed me my elevated heart rate and stress score could be contributing to my fatigue. To feel more rested, the coach gave me breathing exercises to try with clear instructions on how to perform them.
When I asked, “I’m getting married next week, what can I do to feel more energized and less stressed?” I was not only congratulated, but given (fairly generic but helpful) advice to keep getting as much sleep as I have been and to drink plenty of water. My “coach” even encouraged me to balance out the five minutes of high stress I had during the day by practicing at least 10 minutes of mindfulness and relaxation techniques. The advice felt trustworthy, and I appreciated that I could ask the coach questions and get immediate responses on how to interpret my Oura ring data. Even if the tips weren’t especially groundbreaking, it felt a touch more personalized than Humanity.
Oura Advisor offers the ability to have a one-on-one chat with a personal health companion based on “biometric tracking, deep scientific expertise, and a humane approach,” according to Jason Russell, vice president of Consumer Software Product at Oura. As with Humanity, you can also get a sense of your biological age by asking questions such as, “How has my cardiovascular age evolved since I started running more regularly?” he adds. (Unfortunately, my Oura Advisor told me my biological age was 32, aligning with my chronological age.)
When I asked Oura Advisor questions about how to feel more energized and less stressed, the advice was similar to ONVY: prioritize quality sleep, engage in deep-breathing exercises, and take short, energizing walks. While my sleep scores were mostly in the 80s, it flagged I hadn’t been meeting my daily activity targets and had too many sedentary moments. Interestingly, the advice on how many steps to take (based on my sleep and recovery score) was similar to ONVY’s, give or take a couple thousand.
Overall, all three apps provided solid advice that I’m sure any health coach would approve of—but it didn’t feel personalized enough to make me want to continue using them. That said, there’s a chance they’d become more helpful over time, as an AI health coach learns more about you.
While an AI health coach isn’t intended to replace your doctor, the experts I spoke to say that by providing personalized and objective data-driven insights without human bias or fatigue, the trend could have a positive impact on our collective well-being. Here are a few of the biggest selling points.
From encouraging people to engage in healthier behaviors to making healthcare more accessible in general, AI health coaches can have a measurable impact on public health, says Fei Wang, Ph.D., founding director of the Weill Cornell Medicine Institute of AI for Digital Health. Non-human coaches have the potential to “reduce the burden of diseases, especially chronic diseases, at a population level,” and reduce healthcare costs, he adds.
An estimated 129 million Americans live with at least one major chronic disease (3). And in 2023, eight chronic conditions, including cardiovascular diseases, depression, and diabetes, hit all-time highs (4). Reversing this concerning trend and preventing diseases before they develop is “a challenge that AI-driven hyper-personalized coaching is uniquely positioned to address,” according to the Thrive AI Health Coach press release.
Given that chronic diseases such as diabetes and cardiovascular conditions are unevenly distributed across demographics, one of Thrive’s ambitious objectives is to leverage AI to “democratize the life-saving benefits of improved daily habits,” as noted in the press release. In practical terms, its personalized behavioral coaching aims to stand in for resources many people lack (like trainers, chefs, and life coaches) and help underserved communities make healthy behavior changes. Recent data suggests that as many as 25.6 million people in the U.S. are uninsured (5).
The ability to make recommendations based solely on rigorously validated measures of objective data is the “tremendous advantage” of AI health coaches, says preventive medicine specialist David L. Katz, M.D., M.P.H.
Whereas a personal trainer at the gym who doesn’t have nutrition credentials shouldn’t be telling you exactly how much protein you should be eating each day, an “algorithm that processes input and delivers output is free of ideology and biases,” Katz says. Theoretically, it’s also informed about the latest science and health guidelines.
“Algorithmic constraints work as safety wheels that keep the car from toppling over,” he adds. In other words, your AI health coach really can function as both your trainer and nutritionist all in one comprehensive tool.
Plus, robots take human factors out of the equation, Katz says. “They’ll never be sleepy. They’ll never be apathetic. They’ll always do what they do.”
Like a human coach, an AI health coach is always learning more about you and updating the conclusions it reaches based on that accumulated knowledge, Katz says. But unlike a human, it also has super-human memory, so it can recall your detailed health history. For instance, it can call up how you slept two weeks ago after taking magnesium, and help track your long-term progress beyond what a human health coach might be capable of.
Another example: Oura Advisor stores what it learns about you as “Memories.” Say you mention, while discussing changes in your stress levels, that you’re on vacation or, in my case, about to get married. The coach can later remind you that when you were stressed during your wedding week, you found a particular breathing exercise most helpful in bringing your stress levels down.
AI in many ways is still in its infancy. Just as Google’s AI overview may piece together a wildly inaccurate response to your search query, experts fear AI health coaches may also produce incorrect information or give people a false sense of security.
While an AI health coach may not have human biases, the people who built and trained it could have passed along their own biases, ideologies, or agendas, Katz says. “If garbage goes in, then garbage will inevitably come out,” he continues. “That’s the critical issue. That’s where we need transparency: What was the source of information [for this advice]?”
For example, a recent study by Wang and his colleagues revealed that current AI tools incorrectly predict depression risk because of algorithmic bias (6). The models predicted lower risk for males experiencing depression than for healthier females. Of course, this can have a ripple effect, the researchers conclude: “Males are less likely to seek treatment for their mental health than females and AI tools underestimating male depression risk may further reduce the likelihood that males seek care.”
Wang agrees that the trustworthiness of AI recommendations remains a major drawback. For example, in a 2023 study, Wang and his colleagues evaluated the ability of GPT-4 to answer a set of questions frequently encountered in the laboratory medicine field, ranging from basic knowledge to more complex interpretation of data (7). They found that out of 65 questions, the model correctly answered just over half, and generated an incomplete or partially correct answer to about 23 percent. Most concerningly, for 16.9 percent of questions, the response provided misleading or straight-up false information, also known as an AI “hallucination.”
What’s worse, these responses appear to be super convincing. This can be especially problematic when you’re turning to an AI health coach to do something like interpret lab results and make recommendations for you, Wang says.
Overconfidence—and not showing receipts that prove the data is from reliable, up-to-date sources—is Katz’s main concern with AI health coaching. “The single most important thing for any human health practitioner is to acknowledge what she or he doesn’t know,” he says.
Thankfully, my ONVY AI health coach told me that feeling tired could be a sign of an underlying medical condition and that, “it might be worth talking to a healthcare provider.”
To state the obvious, bots don’t have the ability to make decisions based on human experience, judgment, and wisdom—and can’t replace the analysis and intuition of a trained professional.
“There is a lot about health that calls for a human touch, that calls for compassion and understanding and empathy,” Katz says. An AI health coach helping you improve your marathon time might not need to be as emotionally sophisticated as one aimed at helping you deal with a chronic disease, but “we have to be sensitive to and understand when humans really ought to be interacting with other humans,” he explains.
While doctors may have subconscious subjective bias, they also have profound knowledge within their specialties that is invaluable for patient care, Wang says. “Data-driven insights may be less subjective, but they can also be noisy,” he adds.
Not only is data hard for the layperson to interpret, but wearable sensors can also be imprecise or oversensitive when capturing physiological signals, including heart rate and sleep data, Wang says.
For example, one study examined heart rate measurements from four popular wrist-worn devices (including the Fitbit Charge and Apple Watch) and found them to underestimate heart rate during intense exercise compared to an electrocardiogram (ECG) (8). Translation: They can potentially offer misinformation that can lead to unnecessary concern or misguided advice.
Plus, not every health issue is easily solved by data from our wearables or phones. In fact, the authors of the study looking at depression and AI concluded that mental health remains a deeply personal and subjective experience that can’t be measured or predicted reliably using AI and data from our smartphones.
As long as the focus is promoting lifestyle practices that support general health, there’s no harm in trying an AI health coach, Katz says. He cautions against it, however, for people with multiple overlapping conditions or those looking to troubleshoot more complex medical issues.
Even as AI health coaches get smarter, we should be wary of falling into the trap of blindly following the advice of a computer that we assume is all-knowing, Katz says. “If we become enraptured with AI and think that it’s omniscient, we’re going to make terrible mistakes.”
The best solution is some combination of the two, with “some organic integration of data-driven insights and domain expertise and knowledge,” Wang says. And if you have a chronic health condition, advice from your care team will always supersede that of an AI health coach.