The healthcare industry, with its vast amounts of patient data and medical literature, seeks efficient ways to use this information for better patient outcomes. By applying LLMs to healthcare processes, organizations can improve patient care, research, and data privacy, thanks to LLMs’ ability to generate and summarize text-rich data.
- Healthcare-specific LLMs used in patient care
- LLM use cases in healthcare
- Challenges of LLMs in healthcare
Healthcare LLMs
Large language models (LLMs) are general-purpose language models trained on vast amounts of data, much of it collected from the web. As a result, they are not selective or specialized. For a specific application area, such as literature or healthcare, LLMs need to be fine-tuned on data from that area. For more: LLM fine-tuning and LLM training.
Although they are not yet widely used, there are attempts to apply large language models to healthcare and medical applications through fine-tuning:
Open source healthcare LLMs
Model | Developer | Year released | Parameters | Multimodal |
---|---|---|---|---|
Med-PaLM 2 | Google | 2023 | 340B | ✅ |
Radiology-Llama2 | Meta | 2023 | 70B | ✅ |
MedAlpaca | Technical University of Munich | 2023 | 13B | ✅ |
Clinical Camel | | 2023 | 13B | ❌ |
GatorTron | NVIDIA | 2021 | 8.9B | ❌ |
BioMedLM | Stanford University | 2022 | 2.7B | ✅ |
PubMedGPT | Stanford CRFM | 2023 | 2.7B | ❌ |
BioGPT | Microsoft Research | 2022 | 347M | ✅ |
BioLinkBERT | | 2022 | 340M | ❌ |
Health Acoustic Representations (HeAR) | Google | 2024 | 313M | ❌ |
MedBERT | Stanford University | 2021 | 17M | ✅ |
1- Med-PaLM 2
Med-PaLM 2 is a specialized version of Google’s PaLM models, tailored specifically for the healthcare domain. It is trained on datasets of medical knowledge and has been fine-tuned to answer medical questions, perform clinical reasoning, and provide diagnosis support.
It scores ~85% on the MedQA medical exam benchmark, higher than the other models it was compared with.1
2- Radiology-Llama2
Radiology-Llama2 is a domain-specific model derived from Meta’s LLaMA 2, fine-tuned for radiology tasks. It focuses on analyzing radiological images and generating medical reports with clinically relevant findings. Llama-2 is free for research and commercial use.
Figure: The overall framework of Radiology-Llama2

Source: Arxiv2
Llama 2 matches GPT-4 and outperforms GPT-3.5 on factual accuracy
- Llama-2-70B was ~80% accurate in detecting factual discrepancies in summarized news snippets, close to GPT-4’s ~85% accuracy.
- Both Llama-2-70B and GPT-4 performed substantially better than GPT-3.5-turbo, which scored ~65% because of high ordering bias.
- On a fact-checking task, Llama-2-70B and GPT-4 performed similarly to humans, with ~85% accuracy.3
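Ordering bias means a model favors an answer because of where it appears in the prompt rather than what it says. The following is a minimal sketch of how such bias could be measured, assuming a hypothetical `judge(first, second)` function that wraps a model call and returns `"A"` or `"B"`. The function names and the toy judge are illustrative and are not taken from the cited comparison.

```python
from typing import Callable, Sequence, Tuple

# A judge sees two summaries and returns "A" (first) or "B" (second).
Judge = Callable[[str, str], str]


def evaluate_with_swaps(judge: Judge, pairs: Sequence[Tuple[str, str]]) -> dict:
    """Score a judge on (consistent, inconsistent) summary pairs in both orders.

    Each pair is shown twice: once with the consistent summary first and
    once with it second. An unbiased judge answers "A" about half the time.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = 0
    picked_first = 0
    trials = 0
    for consistent, inconsistent in pairs:
        # Order 1: the consistent summary is option A.
        answer = judge(consistent, inconsistent)
        correct += answer == "A"
        picked_first += answer == "A"
        # Order 2: the consistent summary is option B.
        answer = judge(inconsistent, consistent)
        correct += answer == "B"
        picked_first += answer == "A"
        trials += 2
    return {
        "accuracy": correct / trials,
        "first_option_rate": picked_first / trials,
    }


def always_first(first: str, second: str) -> str:
    """Toy judge with maximal ordering bias: it always answers "A"."""
    return "A"


result = evaluate_with_swaps(always_first, [("good summary", "bad summary")])
print(result)
```

A judge that always picks the first option scores 50% accuracy with a first-option rate of 100%, which is how positional preference can drag down accuracy even when the model reads both options.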
3- Health Acoustic Representations (HeAR)
The Google Research team used 300 million pieces of de-identified audio data, including 100 million cough sounds, to build HeAR.
Salcit Technologies, based in India, used HeAR to build Swaasa, a tool intended for TB diagnosis. This could help address the shortage of physicians and accessible healthcare facilities in India by enabling earlier diagnosis, which typically leads to significantly better patient outcomes.4
Closed-source healthcare LLMs
Hippocratic AI
Launched in 2023 with a $50M funding round, Hippocratic AI builds LLMs to automate patient communication. The company has raised $120M to date and claims that its technology is being tested by tens of hospitals.
10 Use Cases of Large Language Models in Healthcare
1- Medical Transcription
LLMs can help create medical transcriptions by:
- Listening to the organic dialogue between a patient and clinician
- Extracting important medical details
- Condensing medical data into compliant medical records that align with the relevant sections of an EHR
Real-life use case – Google’s MedLM can capture and transform the patient-clinician conversation into a medical transcription.5
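The last step of such a pipeline, turning model output into a record that fits EHR sections, can be sketched as follows. This is a minimal illustration that assumes the model returns JSON. The section names, the `DraftNote` class, and `parse_draft_note` are hypothetical and do not describe MedLM or any particular EHR schema.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical section names; a real EHR defines its own schema.
REQUIRED_SECTIONS = ("chief_complaint", "history", "medications", "plan")


@dataclass
class DraftNote:
    chief_complaint: str
    history: str
    medications: List[str]
    plan: str


def parse_draft_note(model_output: str) -> DraftNote:
    """Parse model output (assumed to be JSON) and reject incomplete notes."""
    data = json.loads(model_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_SECTIONS if name not in data]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    if not isinstance(data["medications"], list):
        raise ValueError("medications must be a list")
    return DraftNote(
        chief_complaint=str(data["chief_complaint"]),
        history=str(data["history"]),
        medications=[str(m) for m in data["medications"]],
        plan=str(data["plan"]),
    )


# Stand-in for what a model might return after summarizing a visit.
sample_output = json.dumps({
    "chief_complaint": "persistent cough",
    "history": "cough for two weeks, no fever",
    "medications": ["salbutamol inhaler"],
    "plan": "chest X-ray, follow up in one week",
})
note = parse_draft_note(sample_output)
print(note.plan)
```

Rejecting notes with missing sections keeps an incomplete draft from being filed silently; in practice a clinician would still review the draft before it enters the record.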
2- Electronic Health Records (EHR) Enhancement
The proliferation of electronic health records (EHR) has accumulated a vast repository of patient data, which, if mined effectively, can become a goldmine for healthcare improvement.
Real-life use case – Google’s MedLM is also used by BenchSci, Accenture, and Deloitte for EHR enhancement.
- BenchSci has integrated MedLM into its ASCEND platform to improve the quality of preclinical research.
- Accenture uses MedLM to organize unstructured data from numerous sources, automating human operations that were previously time-consuming and error-prone.
- Deloitte works with MedLM to minimize friction in finding treatment. They use an interactive chatbot that helps health plan participants better understand the provider alternatives.6
3- Clinical Decision Support
Large language models can summarize complex medical concepts, which lets them surface useful insights during clinical decision-making.
Real-life use case – Memorial Sloan Kettering Cancer Center uses IBM Watson Oncology to assist oncologists by analyzing patient data and medical literature to recommend evidence-based treatment options.7
4- Medical Research Assistance
LLMs can parse and summarize vast amounts of data and extract key findings from new research, providing synthesized insights. For example, ChatGPT, one of the best-known LLMs, is used for text summarization.
Real-life use case – John Snow Labs’ healthcare chatbot helps researchers find relevant scientific papers, extract key insights, and identify research trends. It is particularly valuable for navigating the vast amount of biomedical literature.8
Real-life use case – TidalHealth Peninsula Regional clinicians used the Micromedex with Watson solution for healthcare research and reported that they received their answers in less than one minute ~70% of the time.9
5- Automated Patient Communication
Large language models in healthcare can draft informative and compassionate responses to patients’ queries.
Some examples include:
- Medication management and reminders: A chatbot sends patients regular reminders to take their diabetes medication and requests confirmation.
- Health monitoring and follow-up care: A post-operative patient sends their pain and wound status to a chatbot, which determines whether healing is progressing as expected.
- Informational and educational communication: A patient asks a chatbot how to manage high blood pressure, and the chatbot responds with nutrition and lifestyle tips.
Real-life use case – Boston Children’s Hospital uses Buoy Health, an AI-driven online symptom checker chatbot that provides patients with instant answers to health-related questions and initial consultations.
The chatbot can triage patients by analyzing their symptoms and advising whether they need to see a doctor.10
6- Predictive Health Outcomes
LLMs can assist in predictive analysis by discerning patterns within data.
Real-life use case – WVU pharmacists use AI to reduce patient readmission rates: they use a predictive algorithm that leverages LLMs to determine readmission risk. The approach examines data from electronic health records (EHRs), including patient demographics, clinical history, and socioeconomic determinants of health.
Based on this analysis, the WVU pharmacists identify patients at high risk of readmission and assign care coordinators to follow up with them after discharge. This can help reduce readmission rates.11
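The flag-and-assign step can be sketched in a few lines. This is a minimal illustration that assumes each discharged patient already has a risk score between 0 and 1 from some upstream model. The `DischargedPatient` class, the 0.7 threshold, and the round-robin assignment are hypothetical choices, not details of the WVU workflow.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class DischargedPatient:
    patient_id: str
    risk_score: float  # hypothetical model output between 0 and 1


def flag_high_risk(
    patients: Sequence[DischargedPatient], threshold: float = 0.7
) -> List[DischargedPatient]:
    """Return patients at or above the threshold, highest risk first."""
    flagged = [p for p in patients if p.risk_score >= threshold]
    return sorted(flagged, key=lambda p: p.risk_score, reverse=True)


def assign_coordinators(
    flagged: Sequence[DischargedPatient], coordinators: Sequence[str]
) -> Dict[str, str]:
    """Spread flagged patients across coordinators in round-robin order."""
    if not coordinators:
        raise ValueError("at least one coordinator is required")
    return {
        patient.patient_id: coordinators[i % len(coordinators)]
        for i, patient in enumerate(flagged)
    }


patients = [
    DischargedPatient("P1", 0.82),
    DischargedPatient("P2", 0.35),
    DischargedPatient("P3", 0.91),
    DischargedPatient("P4", 0.70),
]
flagged = flag_high_risk(patients)
assignments = assign_coordinators(flagged, ["coordinator_a", "coordinator_b"])
print(assignments)
```

The threshold is the main design choice: lowering it catches more patients who would be readmitted but increases the follow-up workload for care coordinators.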
7- Personalized Treatment Plans
LLMs can suggest treatment plans tailored to an individual’s medical history and specific needs. Their ability to distill complex patient narratives into actionable insights can ensure that each patient receives a care plan that’s as unique as their health journey.
Real-life use case – Babylon Health: Babylon Health’s AI chatbot provides individualized health recommendations based on the user’s symptoms and medical history. It engages users in a conversation by asking relevant questions to analyze their issues better and giving tailored recommendations.12
8- Medical Coding and Billing
Large language models can automate audit processes by analyzing patient records and EHRs.
For example, Epic Systems, a major EHR provider, integrates LLMs into its software to assist with coding and billing. The LLMs can monitor for anomalies in access patterns to sensitive patient information or inconsistencies in coding and billing practices.13
However, LLMs are promising but not yet ready for medical coding: researchers examined how frequently four LLMs (GPT-3.5, GPT-4, Gemini Pro, and Llama2-70b Chat) generated the correct CPT, ICD-9-CM, and ICD-10-CM codes.
Their findings show significant room for improvement. The researchers found that LLMs frequently generate codes that convey inaccurate information, with a maximum accuracy of 50%.14
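An accuracy figure of this kind is typically an exact-match rate: the share of cases where the generated code equals the reference code. The following is a minimal sketch under that assumption. The normalization rule (ignoring case, whitespace, and the decimal point) and the sample codes are illustrative and are not the cited study's protocol or data.

```python
from typing import Sequence


def normalize_code(code: str) -> str:
    """Compare codes without regard to case, spaces, or the decimal point."""
    return code.strip().upper().replace(".", "")


def exact_match_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of predicted codes that exactly match the reference codes."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one code is required")
    matches = sum(
        normalize_code(p) == normalize_code(e)
        for p, e in zip(predicted, expected)
    )
    return matches / len(expected)


# Illustrative ICD-10-CM style codes; the third prediction is a near miss.
predicted_codes = ["e11.9", "I10", "J45.901"]
reference_codes = ["E119", "I10", "J45.909"]
accuracy = exact_match_accuracy(predicted_codes, reference_codes)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Exact match is deliberately strict: a code that differs in one character counts as wrong, which matters because a near-miss billing code can describe a different condition.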
9- Training and Education
Large language models and generative AI in general can be leveraged as interactive educational tools, elucidating complex concepts or offering clarifications on perplexing topics.
Real-life use case – Oxford Medical Simulation uses LLMs integrated with VR technology to create immersive virtual patient simulations.
These simulations allow students to experience high-pressure scenarios, such as handling a cardiac arrest patient, without any real-world consequences.
The LLMs power the virtual patients’ responses, making them more realistic and unpredictable, preparing students for the variability of real clinical environments.15
10- Ethical and Compliance Monitoring
Large language models (LLMs) can be employed in healthcare compliance monitoring to ensure adherence to regulations such as HIPAA (Health Insurance Portability and Accountability Act) and GDPR (General Data Protection Regulation).
Real-life use case – FairWarning, a leading provider of patient privacy intelligence, uses LLMs to monitor healthcare organizations for potential privacy violations.
The system scans and analyzes user activity within EHRs to identify potential breaches, such as unauthorized access to patient records.
This helps healthcare providers ensure that all interactions with patient data comply with regulatory requirements.16
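The kind of check described above can be illustrated with a simple rule-based audit of access events. This sketch is not FairWarning's method and does not involve an LLM. The `AccessEvent` fields, the care-team lookup, and the working-hours window are hypothetical assumptions chosen to show the idea of flagging suspicious access for human review.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class AccessEvent:
    user_id: str
    patient_id: str
    hour: int  # hour of day, 0-23


def flag_events(
    events: Sequence[AccessEvent],
    care_teams: Dict[str, Set[str]],
    work_hours: range = range(7, 19),
) -> List[Tuple[AccessEvent, List[str]]]:
    """Return events that break a rule, each with the reasons it was flagged."""
    flagged = []
    for event in events:
        reasons = []
        # Rule 1: the user should belong to the patient's care team.
        if event.user_id not in care_teams.get(event.patient_id, set()):
            reasons.append("user not on care team")
        # Rule 2: access should happen inside the assumed working hours.
        if event.hour not in work_hours:
            reasons.append("access outside working hours")
        if reasons:
            flagged.append((event, reasons))
    return flagged


care_teams = {"pt_100": {"nurse_1", "dr_5"}}
events = [
    AccessEvent("nurse_1", "pt_100", 10),
    AccessEvent("clerk_7", "pt_100", 14),
    AccessEvent("nurse_1", "pt_100", 2),
    AccessEvent("clerk_7", "pt_200", 23),
]
for event, reasons in flag_events(events, care_teams):
    print(event.user_id, event.patient_id, reasons)
```

Flagged events are candidates for review rather than confirmed violations; a night-shift nurse, for example, would trip the working-hours rule legitimately.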
Challenges of Large Language Models in Healthcare
Accuracy and reliability
Medical decisions can be life-altering, and there’s little room for error. Large language models in healthcare, while powerful, can still produce inaccuracies or misunderstand context. A misinterpretation or incorrect recommendation could have grave consequences for patient care.
In a recent study, physicians evaluated ~150 GPT-4-generated responses to patient messages17:
- 7% of recommended patient responses were judged to be harmful.
- 1% of responses could be fatal.
Generalization vs. specialization
Healthcare encompasses a wide range of specialties, each with its nuances. An LLM trained on general medical data might not have the detailed expertise needed for specific medical specialties.
Biases and ethical considerations
Beyond accuracy, there are ethical concerns, like the potential for LLMs to perpetuate biases in the training data. This could result in unequal care recommendations for different demographic groups.
For more details on the challenges of large language models in healthcare, you can check our articles on the risks of generative AI and generative AI ethics.
External Links
- 1. [2305.09617] Towards Expert-Level Medical Question Answering with Large Language Models.
- 2. [2305.09617] Towards Expert-Level Medical Question Answering with Large Language Models.
- 3. Llama 2 vs. GPT-4: Nearly As Accurate and 30X Cheaper.
- 4. Researchers built an AI model to detect diseases based on coughs. Google
- 5. Google Launches A Healthcare-Focused LLM.
- 6. How doctors are using Google's new AI models for health care. CNBC
- 7. ResearchGate - Temporarily Unavailable.
- 8. Medical ChatBot | Healthcare ChatBot | Medical GPT.
- 9. IBM Case Studies.
- 10. Buoy Health - IDHA. Boston Children's Hospital
- 11. WVU pharmacists using AI to help lower patient readmission rates | WVU Today | West Virginia University.
- 12. Babylon's AI-enabled symptom checker added to recently acquired Higi's app | MobiHealthNews.
- 13. Artificial Intelligence | Epic.
- 14. Large Language Models Are Poor Medical Coders — Benchmarking of Medical Code Querying | NEJM AI.
- 15. Oxford Medical Simulation - Virtual Reality Healthcare Training. Oxford Medical Simulation
- 16. Protect Patient Privacy with Imprivata Patient Privacy Intelligence - YouTube.
- 17. “The effect of using a large language model to respond to patient messages”. The Lancet. April 24, 2024.