Given the very real life-and-death risks of clinical decision-making, researchers and policymakers are taking steps to ensure AI models are safe, secure and trustworthy, and that their use will lead to improved outcomes.
AI models in health care are a double-edged sword, with models improving diagnostic decisions for some demographics, but worsening decisions for others when the model has absorbed biased medical data, suggests a study.
However, the new study, published in JAMA, finds that even when AI explanations are provided, clinicians can be fooled by biased AI models.
“The problem is that the clinician has to understand what the explanation is communicating and the explanation itself,” said Sarah Jabbour, doctoral candidate in computer science and engineering at the College of Engineering at the University of Michigan.
The team studied AI models and AI explanations in patients with acute respiratory failure.
“Determining why a patient has respiratory failure can be difficult. In our study, we found clinicians’ baseline diagnostic accuracy to be around 73 per cent,” said Michael Sjoding, associate professor of internal medicine at the varsity’s Medical School.
“During the normal diagnostic process, we think about a patient’s history, lab tests and imaging results, and try to synthesise this information and come up with a diagnosis. It makes sense that a model could help improve accuracy,” he added.
The team designed a study to evaluate the diagnostic accuracy of 457 hospitalist physicians, nurse practitioners and physician assistants with and without assistance from an AI model.
Each clinician was asked to make treatment recommendations based on their diagnoses. Half were randomised to receive an AI explanation with the AI model decision, while the other half received only the AI decision with no explanation.
Clinicians were then given real clinical vignettes of patients with respiratory failure, as well as a rating from the AI model on whether the patient had pneumonia, heart failure or chronic obstructive pulmonary disease (COPD).
Clinicians randomised to see explanations were also provided with a heatmap, or visual representation, of where the AI model was looking in the chest radiograph, which served as the basis for the diagnosis.
The team found that clinicians who were presented with an AI model trained to make reasonably accurate predictions, but without explanations, had their own accuracy increase by 2.9 percentage points. When provided an explanation, their accuracy increased by 4.4 percentage points.
To test whether an explanation could enable clinicians to recognise when an AI model is clearly biased or incorrect, the team also presented clinicians with models intentionally trained to be biased: for example, a model predicting a high likelihood of pneumonia if the patient was 80 years old or older.
When clinicians were shown the biased AI model, however, their accuracy decreased by 11.3 percentage points. Explanations that explicitly highlighted that the AI was looking at non-relevant information (such as low bone density in patients over 80 years) did not help them recover from this serious decline in performance.
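To put the reported figures side by side, the short Python sketch below adds each percentage-point change to the baseline of around 73 per cent. It assumes that every change is measured against that same baseline, which the figures above imply but do not state outright, so the resulting accuracies are rough illustrations and not numbers taken from the study. The function and variable names are made up for this example.

```python
# Illustrative arithmetic only: the baseline is approximate ("around 73 per cent"),
# and applying every change to it is an assumption, not a result from the study.
BASELINE_ACCURACY = 73.0  # per cent

# Reported changes in clinician accuracy, in percentage points.
CHANGES = {
    "standard model, no explanation": 2.9,
    "standard model, with explanation": 4.4,
    "biased model": -11.3,
}


def implied_accuracy(baseline: float, change: float) -> float:
    """Add a percentage-point change to a baseline percentage."""
    return round(baseline + change, 1)


def relative_change(baseline: float, change: float) -> float:
    """Express a percentage-point change as a percentage of the baseline."""
    return round(100 * change / baseline, 1)


if __name__ == "__main__":
    for label, change in CHANGES.items():
        accuracy = implied_accuracy(BASELINE_ACCURACY, change)
        relative = relative_change(BASELINE_ACCURACY, change)
        print(f"{label}: about {accuracy} per cent ({relative:+} per cent relative)")
```

On these assumptions, the drop caused by the biased model is more than twice the size of the larger of the two gains from the standard model.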