Abstract
Accurately interpreting medical images and generating insightful narrative reports is indispensable for patient care but places heavy burdens on clinical experts. Advances in artificial intelligence (AI), especially in an area that we refer to as multimodal generative medical image interpretation (GenMI), create opportunities to automate parts of this complex process. In this Perspective, we synthesize progress and challenges in developing AI systems for generation of medical reports from images. We focus extensively on radiology as a domain with enormous reporting needs and research efforts. In addition to analysing the strengths and applications of new models for medical report generation, we advocate for a novel paradigm to deploy GenMI in a manner that empowers clinicians and their patients. Initial research suggests that GenMI could one day match human expert performance in generating reports across disciplines, such as radiology, pathology and dermatology. However, formidable obstacles remain in validating model accuracy, ensuring transparency and eliciting nuanced impressions. If carefully implemented, GenMI could meaningfully assist clinicians in improving quality of care, enhancing medical education, reducing workloads, expanding specialty access and providing real-time expertise. Overall, we highlight opportunities alongside key challenges for developing multimodal generative AI that complements human experts for reliable medical report writing.
Fig. 1: Applications for automated medical report generation.
Fig. 2: The capabilities of GenMI.
Fig. 3: Implementing an AI resident.
Author information
Author notes
These authors contributed equally: Eric J. Topol, Pranav Rajpurkar
Authors and Affiliations
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Vishwanatha M. Rao, Michael Hla, Subathra Adithan & Pranav Rajpurkar
Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Vishwanatha M. Rao
Department of Computer Science, Harvard College, Cambridge, MA, USA
Michael Hla
Department of Computer Science, Stanford University, Stanford, CA, USA
Michael Moor
Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
Michael Moor
Department of Radiodiagnosis, Jawaharlal Institute of Postgraduate Medical Education and Research, Puducherry, India
Subathra Adithan
Department of Radiology, Johns Hopkins University, Baltimore, MD, USA
Stephen Kwak
Scripps Research, La Jolla, CA, USA
Eric J. Topol
Contributions
P.R. and E.J.T. conceptualized the study. V.M.R., M.M., E.J.T. and P.R. designed the review article. V.M.R. and M.H. made substantial contributions to the synthesis and writing of the article. V.M.R., M.M. and P.R. designed and implemented the illustrations. S.A. and S.K. offered important clinical insight and context. All authors provided critical feedback and contributed substantially to the revision of the manuscript.
Corresponding authors
Correspondence to Eric J. Topol or Pranav Rajpurkar.
Ethics declarations
Competing interests
P.R. is the co-founder of a2z Radiology AI. The other authors declare no competing interests.
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rao, V.M., Hla, M., Moor, M. et al. Multimodal generative AI for medical image interpretation. Nature 639, 888–896 (2025). https://doi.org/10.1038/s41586-025-08675-y
Received: 13 January 2024
Accepted: 20 January 2025
Published: 26 March 2025
Issue Date: 27 March 2025
DOI: https://doi.org/10.1038/s41586-025-08675-y