This story is part of a series on current progress in regenerative medicine. This piece discusses advances in artificial intelligence technologies.
In 1999, I defined regenerative medicine as the collection of interventions that restore to normal function tissues and organs that have been damaged by disease, injured by trauma, or worn by time. I include a full spectrum of chemical, gene, and protein-based medicines, cell-based therapies, and biomechanical interventions that achieve that goal.
ChatGPT may soon be deployed as a tool for identifying and treating both mild and severe depression. Throughout my discussion of regenerative medicine, I have often examined whether artificial intelligence technologies could serve as practical medical tools or even as alternatives to primary care. While medical AI offers advantages, such as lower costs and greater accessibility, it also has drawbacks, such as privacy concerns and machine error.
We now have another data point to analyze in this emerging field. Dr. Inbar Levkovich and colleagues from Oranim Academic College in Israel published a study in the BMJ journal Family Medicine and Community Health on using ChatGPT to identify and treat depression. Their study compares the identification and treatment recommendations of ChatGPT-3.5 and ChatGPT-4 with those of primary care physicians and also tests for gender or socioeconomic biases the program may exhibit.
Here, I will discuss Levkovich’s findings and expand on the larger implications of AI technology in the medical field.
Levkovich’s study was designed in a simple but elegant way: ChatGPT-3.5 and ChatGPT-4 were presented with vignettes of eight hypothetical patients. The vignettes varied three traits, sex, socioeconomic status, and depression severity, each with only two possible values, while all other details were identical. Each patient was either male or female, blue collar or white collar, and mildly or severely depressed. The test was run ten times per vignette.
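To make the arithmetic of this design concrete, the following Python sketch enumerates it. It is a hypothetical reconstruction, not the authors’ code: the constant and function names are illustrative, and it assumes the ten runs per vignette were repeated separately for each ChatGPT version, which the description above does not state explicitly.

```python
from itertools import product

# Hypothetical reconstruction of the vignette design; all names are illustrative.
SEXES = ("male", "female")
OCCUPATIONS = ("blue collar", "white collar")
SEVERITIES = ("mild", "severe")
MODELS = ("ChatGPT-3.5", "ChatGPT-4")
RUNS_PER_VIGNETTE = 10  # as reported; ASSUMED to apply to each model separately


def build_vignettes():
    """Every combination of the three binary traits: 2 x 2 x 2 = 8 patients."""
    return [
        {"sex": sex, "occupation": occupation, "severity": severity}
        for sex, occupation, severity in product(SEXES, OCCUPATIONS, SEVERITIES)
    ]


def build_trials():
    """One entry per (model, vignette, run) presentation of a vignette."""
    return [
        (model, vignette, run)
        for model in MODELS
        for vignette in build_vignettes()
        for run in range(1, RUNS_PER_VIGNETTE + 1)
    ]


def responses_per_model(model, severity):
    """Count how many responses one model gives for vignettes of one severity."""
    return sum(
        1
        for trial_model, vignette, _ in build_trials()
        if trial_model == model and vignette["severity"] == severity
    )


if __name__ == "__main__":
    print(len(build_vignettes()))                       # 8
    print(len(build_trials()))                          # 160
    print(responses_per_model("ChatGPT-4", "mild"))     # 40
    print(100 / responses_per_model("ChatGPT-4", "mild"))  # 2.5 percentage points
```

Under that assumption, each model gives 40 responses to the four mild-depression vignettes, so a single response moves a percentage by 2.5 points. That is consistent with the 95.0%, 97.5%, 2.5%, and 5.0% figures reported for mild cases.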
These binary traits were chosen to test for gender and socioeconomic biases inherent in the large language models behind ChatGPT. If a chatbot is biased in how it diagnoses and treats patients, it will not hold up to long-term medical scrutiny.
Beginning with mild depression, the differences between ChatGPT and primary care physicians were significant. Only 4.3% of primary care physicians recommended psychotherapy alone, whereas ChatGPT did so in 95.0% to 97.5% of instances. Primary care physicians were also far more likely to recommend prescription drugs alone (48.3%, compared with 0.0% for ChatGPT) or a combination of psychotherapy and drugs (32.5%, compared with 2.5% to 5.0% for ChatGPT).
FIGURE 1: Treatment strategies for mild depression proposed by primary care physicians, ChatGPT-3.5 and ChatGPT-4.
The differences here between ChatGPT and physicians are dramatic. While we do not know why the large language models lean towards psychotherapy, we can speculate on a few factors. ChatGPT is free of certain constraints by which human physicians are often bound—for instance, the availability and cost of treatment.
Any direct interaction with medical personnel is limited by the number of staff at a medical facility, the geographic distribution of medical personnel, and the cost of their education, training, and labor. Administrative and labor costs, not drugs, are among a medical facility’s largest expenses.
However, drugs impose their own constraint on the physician. It is no secret that the pharmaceutical industry earns billions from selling prescription drugs to patients. Those profits depend on the physicians who write the prescriptions, and physicians are incentivized to prescribe rather than refer patients for psychotherapy, which is most likely delivered at a different medical facility. The AI, free of these constraints, suggests what it deduces to be the correct course of action, which in this case is psychotherapy.
In severe cases of depression, the difference is less extreme but still significant. Primary care physicians most commonly recommended a combination of psychotherapy and drugs for patients with severe depression (44.4%), but ChatGPT did so far more often (72% to 100%). Primary care physicians also recommended drug treatment alone 40% of the time, an option ChatGPT did not suggest at all.
FIGURE 2: Treatment strategies for severe depression proposed by primary care physicians, ChatGPT-3.5 and ChatGPT-4.
For severe depression, the constraints discussed above still apply, but the AI is more likely to include pharmacological treatment, consistent with how most physicians treat severe forms of depression.
Perhaps most notable is the apparent absence of bias in the ChatGPT language models. Primary care physicians recommended antidepressants disproportionately more often to men than to women, suggesting a gender bias, and recommended drug treatment alone disproportionately often to blue-collar workers while favoring combination treatment for white-collar workers.
ChatGPT showed none of these biases in its treatment recommendations, an apparent advantage in impartiality over its human counterparts.
Further, physicians were much more likely than ChatGPT to recommend anxiolytics and sedatives in addition to antidepressants. Anxiolytics and sedatives are calming medications commonly used to treat anxiety and stress. In most cases (68% to 74%), ChatGPT recommended antidepressants alone, whereas primary care physicians recommended a combination of antidepressants and anxiolytics 68% of the time. This is not to say that physicians prescribe anxiolytics and sedatives incorrectly, but rather that ChatGPT is much more likely to recommend antidepressants alone for patients with depression.
FIGURE 3: Psychopharmacology treatment strategies proposed by primary care physicians, ChatGPT-3.5 and ChatGPT-4 (%).
Levkovich and colleagues conclude that the ChatGPT large language models align with accepted clinical guidelines for treating mild and severe depression but differ significantly from the typical practice of primary care physicians. The models are less likely to recommend pharmacological treatment in mild cases, more likely to recommend combined psychotherapy and drug treatment in severe cases, and much less likely to include anxiolytics in their psychopharmacological recommendations. Further, ChatGPT is far less likely to base its treatment suggestions on the biases often found among human providers.
ChatGPT and similar AI models offer an exciting new avenue for mental health treatment and for healthcare in general. Early studies such as this one point to a promising tool with several potential uses, including the diagnosis and treatment of mental health conditions.
AI systems can provide personalized care in an accessible context. There is no need to schedule an appointment or deal with insurance information for simple medical issues that could be solved at home on your computer or phone.
However, I maintain that chatbot technologies, while exciting and innovative, are not yet ready to fully replace physicians. Although incredibly accurate, these systems still make mistakes, and we must remain aware of the fallibility of technology at this early stage. This may not always be the case, but for now AI remains a tool rather than a replacement for medical professionals.
To read more of this series, please visit www.williamhaseltine.com