nature.com

Chat GPT vs an experienced ophthalmologist: evaluating chatbot writing performance in ophthalmology

Abstract

Purpose

To examine the ability of ChatGPT to write introductions for scientific ophthalmology papers and to compare that ability with that of experienced ophthalmologists.

Methods

The OpenAI web interface was used to prompt ChatGPT-4 to generate introductions for the selected papers. Each paper therefore had two introductions: one drafted by ChatGPT and the other by the original author. Ten ophthalmology specialists, each with more than 15 years of experience and representing distinct subspecialties (retina, neuro-ophthalmology, oculoplastic, glaucoma, and ocular oncology), were given the two sets of introductions without being told their origin (ChatGPT or human author) and were asked to evaluate them.

Results

For each type of introduction, out of 45 instances, specialists correctly identified the source 26 times (57.7%) and erred 19 times (42.2%). The misclassification rate was 25% for experts evaluating introductions from their own subspecialty and 44.4% for experts assessing introductions outside their subspecialty. In the comparative evaluation of introductions written by ChatGPT and human authors, no significant difference was identified across the assessed metrics (language, data arrangement, factual accuracy, originality, data currency). The misclassification rate (the frequency at which reviewers incorrectly identified the authorship) was highest in oculoplastic (66.7%) and lowest in retina (11.1%).
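The headline rates can be rechecked from the counts reported above. The following is a minimal sketch, not the authors' analysis code: the helper name `rate` and the variable names are illustrative assumptions, and only the counts 45, 26, and 19 come from the abstract. Rounded to one decimal place, 26/45 is 57.8%, so the reported 57.7% appears to be truncated rather than rounded.

```python
def rate(count: int, total: int) -> float:
    """Return count/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Counts taken from the Results paragraph of the abstract.
TOTAL = 45    # classification instances per introduction type
CORRECT = 26  # source identified correctly
WRONG = 19    # source identified incorrectly

# The two outcomes should account for every instance.
assert CORRECT + WRONG == TOTAL

accuracy = rate(CORRECT, TOTAL)
misclassification = rate(WRONG, TOTAL)

print(f"correctly identified: {accuracy:.1f}%")       # 57.8%
print(f"misclassified:        {misclassification:.1f}%")  # 42.2%
```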

Conclusions

ChatGPT represents a significant advancement in facilitating the creation of original scientific papers in ophthalmology. The introductions generated by ChatGPT showed no statistically significant difference from those written by experts in terms of language, data organization, factual accuracy, originality, and the currency of information. In addition, nearly half of them were indistinguishable from the originals. Future research should explore ChatGPT-4’s utility in composing other sections of research papers and examine the associated ethical considerations.

Fig. 1

Fig. 2: Confusion matrix of assessment of specialists’ discernment between ChatGPT-generated and specialist-written introductions.

Fig. 3

Fig. 4: The chart compares median scores for different metrics across classification outcomes: AI correctly identified, AI mistaken as human, human correctly identified, and human mistaken as AI.

Data availability

The data that support the findings of this study are available from Sheba Medical Center, but restrictions apply to their availability: they were used under license for the current study and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Sheba Medical Center.

Author information

Author notes

These authors contributed equally: Gabriel Katz, Ofira Zloto.

Authors and Affiliations

Faculty of Medical & Health Sciences, Tel Aviv University, Tel Aviv, Israel

Gabriel Katz, Ofira Zloto, Avner Hostovsky, Ruth Huna-Baron, Iris Ben-Bassat Mizrachi, Zvia Burgansky, Alon Skaat, Vicktoria Vishnevskia-Dai, Ido Didi Fabian, Oded Sagiv & Ayelet Priel

Goldschleger Eye Institute, Sheba Medical Center, Tel Hashomer, Israel

Gabriel Katz, Ofira Zloto, Avner Hostovsky, Ruth Huna-Baron, Iris Ben-Bassat Mizrachi, Zvia Burgansky, Alon Skaat, Vicktoria Vishnevskia-Dai, Ido Didi Fabian, Oded Sagiv & Ayelet Priel

Section of Ophthalmology, Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, USA

Oded Sagiv

The Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Medical Center, New York, NY, USA

Benjamin S. Glicksberg & Eyal Klang

The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA

Eyal Klang

Contributions

Conceived and designed the analysis: OZ, EK, GK. Collected the data: OZ, GK. Contributed data: OZ, GK, AH, RHB, IBBM, ZB, AS, VVD, IDF, OS, AP, BSG. Performed the analysis: EK. Wrote the paper: OZ, EK. Revised the paper: GK, AH, RHB, IBBM, ZB, AS, VVD, IDF, OS, AP, BSG.

Corresponding author

Correspondence to Ofira Zloto.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Katz, G., Zloto, O., Hostovsky, A. et al. Chat GPT vs an experienced ophthalmologist: evaluating chatbot writing performance in ophthalmology. Eye (2025). https://doi.org/10.1038/s41433-025-03779-1

Received: 02 February 2024

Revised: 18 February 2025

Accepted: 20 March 2025

Published: 01 April 2025

DOI: https://doi.org/10.1038/s41433-025-03779-1
