Published Date: 1/09/2024
An estimated 10–15% of diagnoses are incorrect, and serious patient harm or death from misdiagnosis affects one in 200 patients admitted to hospital. Up to 80% of diagnostic errors are potentially preventable; most arise from faults in clinician reasoning, namely failing to gather relevant background information and to integrate symptoms, signs and situational factors into an appropriate differential diagnosis.
Experience with digital symptom checkers, electronic differential diagnosis generators, and electronic medical record (EMR) screening for missed diagnoses has shown minimal impact, in part due to poor integration into clinical workflows and negative clinician perceptions. In this perspective article, we consider how artificial intelligence (AI) may assist clinicians in diagnosing complex cases at the bedside or in the clinic.
Advent of AI-assisted diagnosis
Machine learning prediction models applied to imaging data have shown promise in diagnosing pneumothoraces from chest radiographs, diabetic retinopathy from fundal images, and skin cancer from dermatoscopic photographs. Randomised trials confirm superior AI-assisted clinician performance in diagnosing diabetic retinopathy, detecting adenomas on colonoscopy, and identifying impaired cardiac function from electrocardiograms.
To date, most diagnostic machine learning models take images or structured data from EMRs or investigations as input and generate single disease probabilities or disease present/not present predictions. Moving upstream and using machine learning tools to assist bedside clinicians in more complex reasoning tasks requires integrating relevant clinical information (history from medical records, presenting complaint and physical examination findings) and formulating a differential diagnosis that contains the correct diagnosis.
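To make the prevailing paradigm concrete, the following minimal Python sketch trains a model that takes structured clinical features and outputs a single disease probability. The features, synthetic data and choice of logistic regression are illustrative assumptions for this article, not the method of any study cited above.

    # Minimal sketch: structured clinical features in, single disease
    # probability out. Data and feature choices are illustrative only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Synthetic structured data: [age, heart_rate, oxygen_saturation]
    X = rng.normal(loc=[65, 90, 94], scale=[12, 15, 3], size=(500, 3))
    # Synthetic labels loosely tied to tachycardia plus hypoxia (toy rule).
    y = ((X[:, 1] > 100) & (X[:, 2] < 93)).astype(int)

    model = LogisticRegression(max_iter=1000).fit(X, y)

    # A single present/not-present probability for one new patient.
    patient = np.array([[72, 110, 90]])
    print(f"P(disease present) = {model.predict_proba(patient)[0, 1]:.2f}")

Note that such a model answers one narrow question per model; it does not perform the broader reasoning task of assembling a differential diagnosis.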
Using ChatGPT and related technologies to assist with diagnostic reasoning
Large language models (LLMs), such as the general-purpose generative pretrained transformer (GPT) series embodied in the chatbot ChatGPT, use natural language processing to learn from and generate human-like text in response to text-based prompts. Studies of LLM-assisted diagnostic reasoning have used GPT-3.5 or GPT-4. Applied to EMRs and other source documents, these LLMs can generate concise summaries of patients’ active diagnoses and past medical history (saving interview time and effort), suggest differential diagnoses that surpass previous differential diagnosis generators, detect diagnostic uncertainty in clinical documentation, and solve complex diagnostic problems.
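A minimal sketch of how a clinician-facing tool might prompt such a model for a differential diagnosis appears below, assuming access to the OpenAI Python SDK (openai>=1.0) and an API key in the environment. The vignette, prompt wording and model choice are illustrative assumptions, not the protocol of any study discussed here.

    # Minimal sketch: prompting a general-purpose LLM for a differential
    # diagnosis from a free-text clinical vignette. Prompt and parameters
    # are illustrative assumptions only.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    vignette = (
        "A 67-year-old man presents with acute pleuritic chest pain, "
        "dyspnoea and tachycardia three days after a long-haul flight. "
        "Examination: SpO2 91% on room air, clear chest, unilateral calf swelling."
    )

    response = client.chat.completions.create(
        model="gpt-4",  # studies cited above used GPT-3.5 or GPT-4
        messages=[
            {
                "role": "system",
                "content": (
                    "You are assisting a clinician. List the five most likely "
                    "diagnoses in descending order of probability, each with a "
                    "one-line justification, and flag any cannot-miss diagnoses."
                ),
            },
            {"role": "user", "content": vignette},
        ],
        temperature=0.2,  # lower temperature for more consistent output
    )

    print(response.choices[0].message.content)

The key design point is that the clinician, not the model, remains the decision maker: the output is a ranked list of candidate diagnoses to be weighed against bedside judgement.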
Experimental studies of LLMs in diagnostic reasoning
ChatGPT does not appear to substantially improve clinicians’ differential diagnoses for common clinical presentations. In contrast, in a study comparing GPT-4 with a simulated population of 10,000 clinicians drawn from readers of an online medical journal in solving 38 challenging cases, the March 2023 version of GPT-4 correctly diagnosed a mean of 22 cases (57%) versus 14 cases (36%) for the clinicians.
Future directions
Several innovations will likely move LLM-assisted diagnosis towards prime-time use. Biomedically trained LLMs, such as Med-PaLM 2, augmented with real-time access to up-to-date medical information, semantic knowledge graphs, reinforcement learning with human feedback, and optimised prompt engineering, should achieve accuracy superior to that of models such as GPT-4, which are trained on internet data of variable quality. Multimodal LLMs that can process not only text but also numerical, image, video and audio data are emerging, further enhancing performance.
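One of these innovations, real-time access to curated medical information, is commonly implemented as retrieval-augmented generation: reference text relevant to the query is retrieved and prepended to the prompt so the model answers from supplied sources rather than from its training data alone. The sketch below illustrates the idea with a toy keyword-overlap retriever and a hypothetical three-snippet corpus; production systems would use embedding-based search over maintained clinical sources.

    # Minimal sketch of retrieval-augmented generation (RAG). The tiny
    # retriever and snippet "corpus" are illustrative assumptions only.
    from collections import Counter
    import math

    # Stand-in corpus of curated reference snippets (hypothetical content).
    corpus = {
        "pe_guideline": "Wells score stratifies pretest probability of pulmonary "
                        "embolism; D-dimer and CT pulmonary angiography confirm it.",
        "acs_guideline": "Acute coronary syndrome work-up: serial troponins and ECG.",
        "cap_guideline": "Community-acquired pneumonia: CURB-65 guides admission.",
    }

    def bag_of_words(text: str) -> Counter:
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in set(a) & set(b))
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def retrieve(query: str, k: int = 2) -> list[str]:
        # Return the k snippets most similar to the query.
        q = bag_of_words(query)
        ranked = sorted(corpus.values(),
                        key=lambda s: cosine(q, bag_of_words(s)), reverse=True)
        return ranked[:k]

    query = "suspected pulmonary embolism after long-haul flight"
    context = "\n".join(retrieve(query))

    # Retrieved snippets are prepended so the LLM answers from supplied sources.
    prompt = (f"Reference material:\n{context}\n\n"
              f"Question: {query}\nAnswer using the references above.")
    print(prompt)

Grounding prompts in vetted sources in this way is one route to mitigating the variable quality of internet training data noted above.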
Q: What is the current state of diagnostic accuracy in healthcare?
A: An estimated 10–15% of diagnoses are incorrect, and serious patient harm or death from misdiagnosis affects one in 200 patients admitted to hospital.
Q: How can AI assist clinicians in diagnosing complex cases?
A: AI can assist clinicians in diagnosing complex cases by providing a second opinion in real-time, sharing uncertainty, dealing with limited or noisy data, and deferring appropriately to clinician expertise and judgement.
Q: What are the limitations of LLMs in diagnostic reasoning?
A: LLMs have limitations, including the potential for errors, biases, and inaccuracies, particularly when used by patients to self-diagnose and self-triage.
Q: What are the future directions for LLM-assisted diagnosis?
A: Future directions include the development of biomedically trained LLMs, multimodal LLMs, and the implementation of LLMs in ways that blend with clinician workflows.
Q: What are the challenges of implementing LLMs in clinical practice?
A: Challenges include variations in patient populations, clinical settings, and data quality, as well as the need to embed cognitive bias mitigations into the design of LLM applications and their user interfaces.