Artificial intelligence is here to stay in the health care industry. The term refers to a constellation of computational tools that can comb through vast troves of data at rates far surpassing human ability, streamlining providers’ work. Some types of AI already common in health care include:
- Machine learning AI, where a computer trains on datasets and “learns” to, for example, identify patients who would do well with a certain treatment.
- Natural language processing AI, which can interpret human speech and text, and might be used to transcribe a doctor’s clinical notes.
- Rules-based AI, where computers are trained to act in a specific way when a particular data point shows up. These kinds of AI are commonly used in electronic medical records, for example to flag a patient who has missed their last two appointments (a minimal sketch of such a rule follows this list).
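To make the rules-based idea concrete, here is a minimal sketch of such a flag in Python. The patient record fields and the two-appointment threshold are illustrative assumptions, not any vendor’s actual logic.

```python
# Hypothetical sketch of a rules-based flag; the field names and threshold
# are assumptions for illustration, not a real EMR system's logic.
from dataclasses import dataclass

@dataclass
class Patient:
    patient_id: str
    consecutive_missed_appointments: int

def flag_for_outreach(patient: Patient, threshold: int = 2) -> bool:
    """Fire the rule when a patient has missed `threshold` appointments in a row."""
    return patient.consecutive_missed_appointments >= threshold

patients = [Patient("A", 0), Patient("B", 2)]
print([p.patient_id for p in patients if flag_for_outreach(p)])  # ['B']
```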
Regardless of the specific type, these tools are generally capable of making a massive, complex industry run more efficiently. But several studies show they can also propagate racial biases, leading to misdiagnosis of medical conditions among people of color, insufficient treatment of pain, under-prescription of life-saving medications, and more. Many patients don’t even know they’ve been enrolled in health care algorithms that are influencing their care and outcomes.
However, a growing body of research reveals a paradox: while some algorithms do indeed exacerbate inequitable medical care, others can actually close such gaps.
The popular press tends to cover AI in medicine only when something goes wrong. While such reports are critical for holding institutions to account, they can also create the impression that when AI enters health care, trouble is always around the corner. Done correctly, AI can actually make health care fairer for more people.
Historically, much of the research in the medical sciences and in the biological sciences has relied on subject pools of white—often male—people of European ancestry. These foundational studies on everything from normal internal body temperature to heart disease become the stuff of textbooks and trainings that doctors, nurses, and other health care professionals engage with as they move up the professional ladder.
However, those studies offer a limited, one-size-fits-all view of human health that opens the door to racial bias in which patients get treated and how. The clearest example of this type of knowledge gone wrong is consulting images of white skin to diagnose dermatological diseases across all skin types, when such diseases may manifest in unique ways depending on the pigmentation of someone’s skin.
When AI is trained on data that lack diversity, it is more likely to mimic the same racial bias that health care professionals can themselves exhibit. A poorly structured AI training dataset is no better (and sometimes worse) than a clinician whose medical training was predicated on lessons about the health of primarily white patients.
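One way developers guard against this, sketched below under assumed column names, is to audit the demographic mix of a training dataset before any model is built.

```python
# Hypothetical sketch: auditing the demographic mix of a training dataset.
# The dataframe and its columns are made-up examples, not a real dataset.
import pandas as pd

training_data = pd.DataFrame({
    "patient_id": range(6),
    "race": ["White", "White", "White", "White", "Black", "Asian"],
})

# A heavily skewed mix is a warning sign that the model may underperform
# for the underrepresented groups.
print(training_data["race"].value_counts(normalize=True))
```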
On the flip side, when AI is trained on datasets that include information from a diverse population of patients, it can help move the health care field away from deep-seated biases.
Below are summaries of some of the research on the intersection of AI and race.
Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations
Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. Science, October 2019.
What the researchers focused on: This study dove into how a nationally circulated health care algorithm perpetuated the under-serving of Black patients as compared with white patients. Such algorithms have the potential to do immense harm by replicating the same racial biases at play among humans, but at an even more massive scale, the authors write.
What they found: Commercial risk-prediction algorithms are among the most common types of AI the health care industry currently uses; they’re applied to the care of some 200 million Americans every year. In this study, the researchers show that one unnamed algorithm assigned Black patients the same level of health risk as white patients when, in reality, the Black patients were sicker.
The researchers learned that the machine-learning algorithm had trained itself to see health care costs as a proxy for a patient’s level of health, when in reality cost reflects the health care industry’s inequitable investment in some patient populations over others.
In other words, the algorithm assumed that because it cost hospitals less to care for Black patients, Black patients were healthier and required less care. However, hospital costs are lower for Black patients even when they are sicker than white patients, because hospitals funnel fewer resources toward the care of sick Black patients. The researchers suggest that training the algorithm not to equate cost with health would remove this tripwire.
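The mechanism is easiest to see in a toy simulation. The sketch below uses entirely synthetic numbers (it is not the study’s data or code) to show how screening patients by cost, rather than by a direct measure of sickness, refers fewer Black patients to care at the same level of illness.

```python
# Synthetic illustration of the label-choice problem described above.
# Every number here is invented; this is not the study's data or model.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
black = rng.random(n) < 0.5                    # group indicator
sickness = rng.poisson(4 + 2 * black)          # Black patients are sicker on average
# Unequal access: the same sickness generates lower spending for Black patients.
cost = sickness * np.where(black, 800, 1200) + rng.normal(0, 500, n)

def share_black_in_top_quartile(score):
    """Share of Black patients among those a score-based screen would refer to care."""
    selected = score >= np.quantile(score, 0.75)
    return black[selected].mean()

print("cost-based screening:  ", round(share_black_in_top_quartile(cost), 2))
print("health-based screening:", round(share_black_in_top_quartile(sickness), 2))
# Screening on the cost label refers fewer Black patients despite their greater
# sickness, mirroring the bias the researchers found.
```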
What researchers did with their findings: “After completing the analyses described above, we contacted the algorithm manufacturer for an initial discussion of our results,” the authors write. “In response, the manufacturer independently replicated our analyses on its national dataset of 3,695,943 commercially insured patients. This effort confirmed our results—by one measure of predictive bias calculated in their dataset, Black patients had 48,772 more active chronic conditions than White patients, conditional on risk score—illustrating how biases can indeed arise inadvertently.”
The researchers then experimented with solutions alongside the algorithm manufacturer and have already made improvements to the product.
“Of course, our experience may not be typical of all algorithm developers in this sector,” they write. “But because the manufacturer of the algorithm we study is widely viewed as an industry leader in data and analytics, we are hopeful that this endeavor will prompt other manufacturers to implement similar fixes.”
AI Recognition of Patient Race in Medical Imaging: A Modelling Study
Judy Wawira Gichoya, et al. The Lancet Digital Health, May 2022.
What the researchers focused on: Previous research has shown that AI can be trained to detect a person’s race from medical images, even though human experts looking at the same images cannot tell the patient’s race. The authors wanted to find out more about AI’s ability to recognize a patient’s race from medical images. They analyzed:
- 680,201 chest X-rays across three datasets, where Black patients comprised 4.8% to 46.8% of the subjects, white patients 42.1% to 64.1%, and Asian patients 3.6% to 10.8%.
- 458,317 chest CTs, also across three datasets, where Black patients comprised 9.1% to 72% of the subjects, white patients 28% to 90.9%, and Asian patients were unrepresented.
- 691 digital radiography X-rays, where Black patients comprised 48.2% of the subjects, white patients 51.8%, and Asian patients were unrepresented.
- 86,669 breast mammograms, where Black patients comprised 50.4% of the subjects, white patients 49.6%, and Asian patients were unrepresented.
- 10,358 lateral c-spine X-rays, where Black patients comprised 24.8% of the subjects, white patients 75.2%, and Asian patients were unrepresented.
The images themselves contained no racial information and represented different degrees of image clarity, full and cropped views, and other variations.
What they found: The deep learning model was able to identify a patient’s race accurately from medical images that contained no identifiable racial information. Researchers thought perhaps the model was learning to do this by matching known health outcomes with racial information.
There is “evidence that Black patients have a higher adjusted bone mineral density and a slower age-adjusted annual rate of decline in bone mineral density than White patients,” the researchers write, so they thought perhaps they could trick the model by cropping out parts of medical images that showed such characteristic bone density information. Even so, the model was able to identify the patient’s race from the images. “This finding is striking as this task is generally not understood to be possible for human experts,” the authors write.
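The kind of robustness test the authors describe can be sketched generically: degrade an image, then check whether a trained classifier’s prediction survives. The `model` referenced below is a hypothetical stand-in for any trained image classifier, not the study’s actual model.

```python
# Generic sketch of a degradation test; `model` is a hypothetical stand-in
# for a trained image classifier, not the study's code.
import numpy as np

def degrade(image: np.ndarray, crop_frac: float = 0.5, noise_sd: float = 0.2) -> np.ndarray:
    """Return a center-cropped, noise-corrupted copy of a [0, 1]-scaled image."""
    h, w = image.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top, left = (h - ch) // 2, (w - cw) // 2
    cropped = image[top:top + ch, left:left + cw]
    noisy = cropped + np.random.normal(0.0, noise_sd, cropped.shape)
    return np.clip(noisy, 0.0, 1.0)

# Usage, assuming a trained `model` and a chest X-ray array `xray` exist:
#   baseline = model.predict(xray)
#   degraded = model.predict(degrade(xray))
# A prediction that survives heavy cropping and noise suggests the signal is
# diffused across the image rather than tied to one visible feature.
```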
How they explain it: “The results from our study emphasize that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging,” the researchers write. “The regulatory environment in particular, while evolving, has not yet produced strong processes to guard against unexpected racial recognition by AI models; either to identify these capabilities in models or to mitigate the harms that might be caused.”
An Algorithmic Approach to Reducing Unexplained Pain Disparities in Underserved Populations
Emma Pierson, et al. Nature Medicine, January 2021.
What the researchers focused on: Previous research has shown Black patients are more likely than white patients to have their pain dismissed and untreated. One example is knee pain due to osteoarthritis. Researchers wanted to find out if an AI could undo biases in how knee pain is diagnosed and treated.
What they found: The researchers used a deep learning model trained on knee X-rays from 2,877 osteoarthritis patients (18% of whom were Black, 38% low-income, and 39% non-college graduates) to predict the level of pain a patient would be expected to have based on the progression of their osteoarthritis. The model was better at assigning pain levels to underserved patients than human radiologists were. The researchers conclude that the model was able to predict pain even when the imaging did not necessarily show the expected level of disease severity. That’s because patients of color are more likely than white patients to have “factors external to the knee” that influence their level of pain, such as work conditions and higher stress, the researchers write. In other words, the same level of osteoarthritis severity can result in very different levels of pain depending on the patient population, and evaluating a patient without that context can lead to underdiagnosis for underserved patients. In this case, an AI could solve an issue that persists because of human racial bias.
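The comparison at the heart of the study can be illustrated with a toy calculation: how much of the Black-white pain gap remains after adjusting for a given severity score? All numbers below are synthetic, and the simple linear adjustment is an assumption for illustration, not the study’s method.

```python
# Toy illustration with synthetic data; not the study's data or methods.
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
black = rng.random(n) < 0.2
klg = rng.integers(0, 5, n).astype(float)     # radiologist severity grade
external = np.where(black, 1.5, 0.0)          # pain factors outside the knee
pain = 10 * klg + 10 * external + rng.normal(0, 5, n)
model_score = 10 * klg + 10 * external        # a score that captures both sources

def residual_pain_gap(score):
    """Black-white pain gap left over after a linear adjustment for `score`."""
    slope, intercept = np.polyfit(score, pain, 1)
    residual = pain - (slope * score + intercept)
    return residual[black].mean() - residual[~black].mean()

print("gap after adjusting for radiologist grade:", round(residual_pain_gap(klg), 1))
print("gap after adjusting for model score:      ", round(residual_pain_gap(model_score), 1))
# The radiologist grade leaves most of the gap unexplained; a score that also
# captures factors external to the knee accounts for nearly all of it.
```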
How they explain it: “In addition to raising important questions regarding how we understand potential sources of pain, our results have implications for the determination of who receives arthroplasty for knee pain … Consequently, we hypothesize that underserved patients with disabling pain but without severe radiographic disease could be less likely to receive surgical treatments and more likely to be offered non-specific therapies for pain. This approach could lead to overuse of pharmacological remedies, including opioids, for underserved patients and contribute to the well-documented disparities in access to knee arthroplasty.”
Other academic studies, reports and commentaries to consider:
Algorithmic Bias Playbook
Ziad Obermeyer, Rebecca Nissan, Michael Stern, Stephanie Eaneff, Emily Joy Bembeneck and Sendhil Mullainathan. Center for Applied Artificial Intelligence, The University of Chicago Booth School of Business, June 2021.
Evaluation and Mitigation of Racial Bias in Clinical Machine Learning Models: Scoping Review
Jonathan Huang, Galal Galal, Mozziyar Etemadi and Mahesh Vaidyanathan. JMIR Medical Informatics, May 2022.
Systemic Kidney Transplant Inequities for Black Individuals: Examining the Contribution of Racialized Kidney Function Estimating Equations
L. Ebony Boulware, Tanjala S. Purnell and Dinushika Mohottige. JAMA Network Open, January 2021.
Hidden in Plain Sight – Reconsidering the Use of Race Correction in Clinical Algorithms
Darshali A. Vyas, Leo G. Eisenstein and David S. Jones. New England Journal of Medicine, August 2020.
Challenging the Use of Race in the Vaginal Birth after Cesarean Section Calculator
Darshali A. Vyas, David S. Jones, Audra R. Meadows, Khady Diouf, Nawal M. Nour and Julianna Schantz-Dunn. Women’s Health Issues, April 2019.
Expert Commentary