COMMENTARY

The Rise of Large Language Models in Medicine: The Next Frontier

Arturo Loaiza-Bonilla, MD, MSEd


Every day, social media brings news of advances in artificial intelligence (AI), particularly large language models (LLMs) such as ChatGPT, Llama, Cohere, Claude, and Bard. It can be hard to keep up, but these LLMs are already shaping medical education and the way we practice. In fact, two recent pivotal studies offer a glimpse of a future in which LLMs not only complement but sometimes surpass medical trainees on standardized examinations.

This raises some questions. In a world where LLMs can potentially pass every board exam, how should we approach medical knowledge? How will LLMs change the practice of medicine and the role we physicians play?

The LLM Studies and Medical Knowledge

The first study, “Large Language Models Encode Radiation Oncology Domain Knowledge,” published last year in AI in Precision Oncology, evaluated how various LLMs, including OpenAI’s GPT models, performed on the American College of Radiology Standardized Examination for Radiation Oncology. The models included GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, Meta’s Llama 2 models, and Google’s PaLM 2 Text Bison. (Disclosure: I am one of the authors, though most of the work was done by Nikhil Thaker, MD.)

Impressively, models such as GPT-4 Turbo achieved scores comparable to those of upper-level trainees, especially in statistical domains, although the models were less effective in clinical scenarios for which they lacked specific training data.

The second study, “GPT Versus Resident Physicians — A Benchmark Based on Official Board Scores,” published in, presents a broader comparison across multiple medical specialties, including internal medicine, general surgery, pediatrics, psychiatry, and obstetrics and gynecology (ob/gyn). In this study, GPT-4 was benchmarked against the scores of resident physicians in Israel. The LLM performed admirably, surpassing the median resident score in several fields and passing the board examination in four of five specialties, though it lagged behind the residents in pediatrics and ob/gyn.

What Do These Findings Mean for the Future of Medicine and the Role We Play as Physicians?

Integrating LLMs such as GPT-4 into medical training and practice will change how medical knowledge is acquired, assessed, and applied. The internet and the smartphone put an encyclopedia in our hands. Now LLMs are giving us copilots that, when optimized for our practices, can help us focus on the patient and on continuing medical education rather than on memorization.

Here are some ways LLMs can be our copilots:

  • Enhanced medical education: LLMs can serve as advanced educational tools, providing medical students and residents with immediate feedback and access to a vast repository of medical knowledge. This can reshape how medical knowledge is accessed and assimilated, potentially speeding up the learning process and enhancing the depth of understanding.
  • Augmented clinical decision-making: LLMs can process and analyze vast amounts of medical literature and patient data. We could leverage this technology to make better-informed, evidence-based decisions, which could be particularly impactful in complex cases where nuanced understanding and rapid integration of new research findings are crucial. By providing quick access to the latest research and recommendations, LLMs could act as a second opinion or a virtual case board that enhances the quality of care delivered to patients.
  • Reduced cognitive and administrative load and burnout: Beyond performing well on standardized exams, these models can efficiently handle tasks such as patient documentation and data analysis, which traditionally consume significant time from medical professionals. This capability could greatly reduce administrative burdens, a known factor in physician burnout. When LLMs are combined with other AI tools, such as Ambience AI, physicians can focus more on the nuanced aspects of patient care, such as emotional support and decision-making, that require a deep understanding of individual patient needs.
  • Shift in educational focus: As LLMs handle more knowledge-based tasks, medical training may shift towards developing stronger interpersonal, ethical, and professional skills. The role of a physician could evolve to emphasize these aspects, which are difficult for AI to replicate. While some studies claim that chatbots may be more empathetic than physicians, my personal take is that we were the ones who trained those chatbots. We just need more time to be more human.
  • Changes to standardized testing: The ability of LLMs to score well on medical exams suggests a potential reevaluation of how these tests are structured and what they aim to assess. There might be a move towards evaluations that better measure clinical judgment and interpersonal skills rather than mere factual recall. Then we could focus more on practicing medicine and pursuing continuing medical education and less on standardized testing.

Challenges and Considerations

Despite their promise, LLMs must be integrated into clinical practice with caution. First, we need rigorous validation against established medical standards, and we must address ethical concerns around patient privacy and the potential for AI to perpetuate existing biases. We also need to guard against automation bias, that is, relying uncritically on AI algorithms and decision support systems as they become more widely available. The interpersonal aspect of medical care, critical for patient trust and treatment efficacy, must remain at the forefront of the healthcare we deliver.

Looking Forward

The potential for LLMs to enhance medical training and practice is undeniable. As we stand on the brink of this technological wave, it is crucial to foster collaboration among technologists, clinicians, and ethicists to ensure these tools are developed and implemented responsibly and effectively. To successfully integrate LLMs into clinical practice, these models need to be tailored and refined to handle the complexities of specific medical specialties. Continuous input from clinical environments will be crucial in training these models to address the nuanced and diverse scenarios encountered in different fields of medicine. As clinicians, we need to be directly involved in such optimization and fine-tuning.

As these models become integrated into our healthcare systems, our role as physicians is poised to evolve dramatically. As we navigate this shift, our approach must be guided by a commitment to improve patient care and the well-being of healthcare providers. In addition, we need to foster a culture of self-improvement to make sure that AI tools are implemented thoughtfully, with ongoing evaluation of their impact on patient outcomes and physician workflow.

The integration of sophisticated AI tools in medicine is not just a possibility; it is imminent. The potential of this new wave in medical training and practice is immense and exciting. How cool is that?

I would love to hear your comments on this column or other topics you would like me to cover in the future. Contact me at Arturo.AI.MedTech@gmail.com.

Arturo Loaiza-Bonilla, MD, MSEd, is the co-founder and chief medical officer at Massive Bio, a company connecting patients to clinical trials using artificial intelligence. His research and professional interests focus on precision medicine, clinical trial design, digital health, entrepreneurship, and patient advocacy. Dr Loaiza-Bonilla serves as Systemwide Chief of Hematology and Oncology at St. Luke’s University Health Network, where he maintains a connection to patient care by attending to patients 2 days a week.
