Does AI matter if human clinicians ignore it?
Artificial intelligence (AI) has been heralded as a game changer for clinical decision support (CDS) in a variety of medical contexts, particularly radiology. In recent years, AI tools have routinely achieved more than 90% accuracy when asked to identify signs of injury or disease in certain common imaging tests, and often outperform humans by sizable margins.
The advent of fast, automated, highly sensitive, and extremely accurate tools seems like a win for patients, health systems, and individual radiologists alike. With imaging specialists in short supply, having an AI tool to supplement existing diagnostic capacity could ensure that patients with time-sensitive conditions, such as strokes, always have expert help available without overloading the limited number of flesh-and-blood providers.
However, advocates of human-driven healthcare are quick to point out that AI is never going to fully replace the clinical decision-making skills of an experienced provider. There are some things, they argue, that AI will never truly understand about the nuances of the patient-provider relationship or the unfathomable complexity of the body and its unpredictable functions.
Instead, they say that while humans will remain essential, physicians who use AI will soon outperform those who don’t.
But a new discussion paper from researchers at MIT and Harvard Medical School could throw this comforting trope out the window. The paper, which hasn’t been peer reviewed, indicates that radiologists who use AI to assist their decision-making process don’t actually do any better than those who make diagnoses on their own.
In fact, they might do worse. In the experiment, radiologists sometimes discounted the AI tool’s input if the suggested diagnosis didn’t match their own conclusions, preferring to rely on their own experience despite the fact that the AI is more accurate than the majority of human providers.
The results raise the question: is AI worth it if people aren’t going to trust it?
Architecting a detailed experiment
To get a better idea of how radiologists view the recommendations of AI assistants, the team recruited 180 radiologists working in the US and in Vietnam to make diagnoses from a series of chest x-rays. They also selected a deep learning prediction model called CheXpert to provide the AI input.
This model was trained on close to 225,000 chest x-rays from over 65,000 patients. In previous studies, the tool was shown to outperform the diagnostic skills of two out of every three human radiologists.
During the study, the human radiologists were randomly given four different sets of information: just the x-ray image; the image and an AI prediction of the diagnosis; the image and clinical context notes; or the image and both AI results and clinical context. At the beginning of the experiment, participants were also told that the AI algorithm produces highly accurate results.
How do radiologists integrate AI input into their decisions?
The research team found that the human radiologists took approximately 4% more time to review images when given clinical context and/or AI data. They were almost 5% more accurate in their diagnoses when they received clinical context around a given image – but their overall accuracy remained at baseline when they were given AI input.
The team called the results “puzzling,” especially given the fact that the radiologists knew of the algorithm’s diagnostic prowess. But the average, they pointed out, masks an important pattern.
“The zero effect of AI assistance is driven by heterogeneous treatment effects: diagnostic quality increases when the AI is confident but decreases when the AI is uncertain. In parallel, AI assistance improves diagnostic quality for patient cases in which our participants are uncertain but decreases quality for patient cases in which our participants are certain.”
In other words, radiologists still ended up relying on their gut most of the time. This may be appropriate for a clinician who is very confident in their diagnostic skills, assuming their confidence isn’t misplaced. But it can be highly problematic for a radiologist who isn’t entirely sure whether they are making the right decision, especially if they end up being swayed by an algorithm with built-in biases or other unanticipated issues.
The researchers concluded that “the majority of cases are optimally decided either by the radiologist or the AI alone but not by the radiologist with access to AI.”
It all comes down to trust
Ultimately, the experiment reveals a major trust gap that has yet to be solved. The study does note that additional training and experience with AI might change these outcomes, but separate industry research indicates that human providers might not be willing to give AI the chance.
For example, a recent survey by GE HealthCare found that only 42% of providers worldwide believe that AI is trustworthy in its current form. That number drops precipitously to just 26% of US-based clinicians.
The majority of participants in that poll do not trust the quality of the data used to train algorithms and believe that baked-in biases leave AI vulnerable to poor decision-making that could have an impact on health equity and overall outcomes.
The fact that providers in the radiology study took more time to complete their diagnoses when given AI information is another strike against a hybrid, AI-augmented future. Speed and efficiency are major selling points for AI tools, and a slower process without a corresponding bump in quality could be a problem for providers looking to simplify and streamline their workflows.
Education, experience, and extensive testing might be the key to encouraging widespread adoption, engendering trust, and reducing the extra time it takes to synthesize AI information.
For larger, well-resourced health systems with the willingness to put in the work to further advance the field of AI, being an early adopter could be beneficial down the road.
But for others, the discussion paper may be a signal to exercise caution with AI and make sure that any tools they adopt are as bulletproof as possible in terms of the trustworthiness of their training data and the accuracy of their diagnostic suggestions.
There’s no question that AI is going to bring wholesale change to healthcare. Most of these changes are likely to be positive in nature. However, industry leaders will need to be responsible, ethical, and realistic about the optimal way to integrate automated clinical decision tools into the care environment, especially in areas that could have life-or-death implications for patients.
Jennifer Bresnick is a journalist and freelance content creator with a decade of experience in the health IT industry. Her work has focused on leveraging innovative technology tools to create value, improve health equity, and achieve the promises of the learning health system. She can be reached at email@example.com.