Quality of AI chatbot-generated information on hypersensitivity pneumonitis for clinical and patient use

Derya Yenibertiz; Güzide TOMAS

doi:10.36141/svdld.2026.18399

Authors

Derya Yenibertiz Department of Pulmonology, University of Health Sciences, Dr. Abdurrahman Yurtaslan Ankara Oncology Training and Research Hospital, Ankara, Türkiye https://orcid.org/0000-0002-1783-4015
Güzide TOMAS Department of Pulmonology, University of Health Sciences, Sultan Abdülhamid Han Training and Research Hospital, İstanbul, Türkiye

Keywords:

Hypersensitivity pneumonitis, Artificial intelligence, Chatbots, Patient education

Abstract

Background and Aim: Hypersensitivity pneumonitis (HP) is a complex, immune-mediated interstitial lung disease in which accurate diagnosis and long-term management require integration of clinical, radiologic, and exposure-related information. Patients increasingly use artificial intelligence (AI)-based chatbots to obtain disease-related information; however, the quality, readability, and patient usability of such content remain unclear. This study aimed to evaluate the quality, reliability, readability, and patient-centered usability of AI chatbot-generated information on HP.

Materials and Methods: Using Google Trends, we identified four of the most frequently searched patient-oriented questions regarding HP: (1) What is HP and what causes it? (2) What are the clinical features of HP? (3) How is HP treated? (4) How is HP diagnosed? These questions were submitted verbatim to eight AI chatbots (ChatGPT-5.1, Claude 3, Microsoft Copilot, DeepSeek V3, Gemini Pro, Grok 4, Kimi K2, Perplexity AI). A total of 32 responses were independently evaluated in a blinded fashion by four pulmonology professors specializing in interstitial lung diseases. Content quality and reliability were assessed using DISCERN; understandability and actionability with PEMAT-P; global written readability with the Written Readability Rating (WRR); and structural readability with the Flesch–Kincaid Grade Level (FKGL).
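As an illustration of the structural readability metric named above, the following is a minimal Python sketch of the standard Flesch–Kincaid Grade Level formula, FKGL = 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The abstract does not state which tool the authors used to compute FKGL, so the function names (`count_syllables`, `fkgl`) and the vowel-group syllable heuristic are assumptions made for illustration only; dedicated readability software typically counts syllables more accurately and may return somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllable count (hypothetical heuristic, not the authors' method).

    Counts runs of consecutive vowels and drops one for a trailing silent 'e'.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard published coefficients."""
    # Sentences: split on runs of terminal punctuation, ignore empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: alphabetic tokens, optionally with an internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    plain = "The cat sat on the mat."
    clinical = (
        "Hypersensitivity pneumonitis is an immune-mediated "
        "interstitial lung disease."
    )
    for label, sample in (("plain", plain), ("clinical", clinical)):
        print(f"{label}: FKGL = {fkgl(sample):.2f}")
```

Under this sketch, short sentences built from one-syllable words score well below grade 8, while a single sentence dense with polysyllabic clinical terms scores far above it, which is the pattern the FKGL comparison in this study is designed to capture.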

Results: All chatbot outputs required advanced literacy, with FKGL scores ranging from 20.17 to 29.07 and a mean of approximately 24–25, indicating a college or postgraduate reading level. No chatbot produced content within the recommended patient-appropriate range (FKGL ≤ 8). WRR scores declined with increasing clinical complexity, from 67.85 for definitional content (Q1) to 51.227 for diagnostic explanations (Q4). DISCERN scores varied substantially across models (35.001–57.103), with most chatbots falling into the “fair–good” range, reflecting partially reliable but incomplete information. [..]

Conclusion: AI chatbots can generate clinically rich explanations of HP but currently produce content that is too complex and insufficiently actionable for most patients. [..]
