
Multimodal generative AI for medical image interpretation

Abstract

Accurately interpreting medical images and generating insightful narrative reports is indispensable for patient care but places heavy burdens on clinical experts. Advances in artificial intelligence (AI), especially in an area that we refer to as multimodal generative medical image interpretation (GenMI), create opportunities to automate parts of this complex process. In this Perspective, we synthesize progress and challenges in developing AI systems for generation of medical reports from images. We focus extensively on radiology as a domain with enormous reporting needs and research efforts. In addition to analysing the strengths and applications of new models for medical report generation, we advocate for a novel paradigm to deploy GenMI in a manner that empowers clinicians and their patients. Initial research suggests that GenMI could one day match human expert performance in generating reports across disciplines, such as radiology, pathology and dermatology. However, formidable obstacles remain in validating model accuracy, ensuring transparency and eliciting nuanced impressions. If carefully implemented, GenMI could meaningfully assist clinicians in improving quality of care, enhancing medical education, reducing workloads, expanding specialty access and providing real-time expertise. Overall, we highlight opportunities alongside key challenges for developing multimodal generative AI that complements human experts for reliable medical report writing.


Fig. 1: Applications for automated medical report generation.

Fig. 2: The capabilities of GenMI.

Fig. 3: Implementing an AI resident.


Author information

Author notes

These authors contributed equally: Eric J. Topol, Pranav Rajpurkar

Authors and Affiliations

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

Vishwanatha M. Rao, Michael Hla, Subathra Adithan & Pranav Rajpurkar

Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

Vishwanatha M. Rao

Department of Computer Science, Harvard College, Cambridge, MA, USA

Michael Hla

Department of Computer Science, Stanford University, Stanford, CA, USA

Michael Moor

Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland

Michael Moor

Department of Radiodiagnosis, Jawaharlal Institute of Postgraduate Medical Education and Research, Puducherry, India

Subathra Adithan

Department of Radiology, Johns Hopkins University, Baltimore, MD, USA

Stephen Kwak

Scripps Research, La Jolla, CA, USA

Eric J. Topol


Contributions

P.R. and E.J.T. conceptualized the study. V.M.R., M.M., E.J.T. and P.R. designed the review article. V.M.R. and M.H. made substantial contributions to the synthesis and writing of the article. V.M.R., M.M. and P.R. designed and implemented the illustrations. S.A. and S.K. offered important clinical insight and context. All authors provided critical feedback and contributed substantially to the revision of the manuscript.

Corresponding authors

Correspondence to Eric J. Topol or Pranav Rajpurkar.

Ethics declarations

Competing interests

P.R. is the co-founder of a2z Radiology AI. The other authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rao, V.M., Hla, M., Moor, M. et al. Multimodal generative AI for medical image interpretation. Nature 639, 888–896 (2025). https://doi.org/10.1038/s41586-025-08675-y


Received: 13 January 2024

Accepted: 20 January 2025

Published: 26 March 2025

Issue Date: 27 March 2025

DOI: https://doi.org/10.1038/s41586-025-08675-y
