Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis

Hazan Tomar Bozkurt; Abdullah Bozkurt

doi:10.62425/esbder.1911961

CAN AI-BASED LARGE LANGUAGE MODELS PROVIDE RELIABLE INFORMATION ON POSTPARTUM DEPRESSION? A SYSTEMATIC CONTENT ANALYSIS

Abstract

Objective: This study aimed to systematically evaluate and compare the information quality, scientific reliability, and readability of the responses given by four widely used large language models (LLMs), namely ChatGPT 5.0, Google Gemini, DeepSeek, and Claude, to frequently asked questions about postpartum depression (PPD).

Methods: This methodological evaluation study examined 40 LLM-generated responses about PPD. Information quality was measured with the DISCERN tool, scientific reliability with a 5-point Likert scale, and readability with the Flesch Reading Ease Score (FRES) and the Flesch–Kincaid Grade Level (FKGL).

Results: All four models met the adequacy thresholds for information quality and scientific reliability. DeepSeek achieved the highest mean DISCERN score, which was statistically significantly higher than Claude's. No statistically significant difference in scientific reliability scores was found among the four models. In terms of readability, Claude produced statistically significantly more complex texts than ChatGPT 5.0 and Google Gemini according to FKGL scores. Mean FRES values for all models fell in the "difficult" to "fairly difficult" range, with no significant difference between groups.

Conclusion: All four LLMs showed adequate information quality and scientific reliability on PPD, with the highest information quality observed in DeepSeek. However, substantial readability deficiencies were identified in all models, and Claude produced the most linguistically complex outputs. These findings indicate that LLMs hold promising potential as complementary health information sources on PPD, but that their outputs need to be simplified to meet the readability standards recommended in patient education for postpartum women, who may face cognitive and emotional barriers to understanding complex health information.

Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis

Abstract

Objective: This study aimed to systematically evaluate and compare the information quality, scientific reliability, and readability of the responses provided by four widely used large language models (LLMs)—ChatGPT, Google Gemini, DeepSeek, and Claude—to frequently asked questions regarding postpartum depression (PPD).

Methods: This descriptive cross-sectional study assessed 40 LLM-generated responses concerning PPD. Information quality was assessed using the DISCERN tool, scientific reliability was evaluated using a 5-point Likert scale, and readability was measured using the Flesch Reading Ease Score (FRES) and Flesch–Kincaid Grade Level (FKGL).

Results: All four models met the adequacy thresholds for information quality and scientific reliability. DeepSeek achieved the highest mean DISCERN score, which was significantly higher than that of Claude. No statistically significant difference was observed in scientific reliability scores across the four models. Regarding readability, Claude produced significantly more complex texts than ChatGPT and Google Gemini based on FKGL scores. Mean FRES values for all models fell within the "difficult" to "fairly difficult" range, with no significant between-group difference.

Conclusion: All four LLMs demonstrated adequate information quality and scientific reliability regarding PPD, with DeepSeek exhibiting the highest information quality. However, substantial deficiencies were identified in readability across all models, with Claude producing the most linguistically complex outputs. These findings suggest that while LLMs show promising potential as complementary health information sources for PPD, their outputs require simplification to meet recommended readability standards for patient education, particularly for postpartum populations who may experience cognitive and emotional barriers to comprehending complex health information.
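
The two readability measures named in the abstract follow standard published formulas, which can be sketched as below. This is a minimal illustration, not the tooling used in the study (the abstract does not say which software was used). The function names `count_syllables`, `readability`, and `fres_band` are hypothetical, and the syllable counter is a crude vowel-group heuristic, so scores will differ somewhat from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Flesch Reading Ease Score: higher means easier to read.
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Flesch-Kincaid Grade Level: approximate US school grade.
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl


def fres_band(score: float) -> str:
    """Map a FRES value to Flesch's descriptive difficulty band."""
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult"


if __name__ == "__main__":
    sample = "Postpartum depression is a treatable condition. Talk to your midwife."
    fres, fkgl = readability(sample)
    print(f"FRES={fres:.1f} ({fres_band(fres)}), FKGL={fkgl:.1f}")
```

Under these formulas, the "difficult" to "fairly difficult" range reported above corresponds to FRES values between 30 and 60. Shorter sentences and shorter words both raise FRES and lower FKGL.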

Details

Primary Language

English

Subjects

Psychosocial Aspects of Childbirth and Perinatal Mental Health

Journal Section

Research Article

Authors

Hazan Tomar Bozkurt*
0000-0002-7060-0576
Türkiye

Abdullah Bozkurt
0000-0002-8359-6131
Türkiye

Publication Date

July 1, 2026

Submission Date

March 17, 2026

Acceptance Date

June 15, 2026

Published in Issue

Year 2026 Volume: 9 Number: 2

DOI

https://doi.org/10.62425/esbder.1911961

IZ

https://izlik.org/JA74YE29YC

Cite

APA

Tomar Bozkurt, H., & Bozkurt, A. (2026). Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis. Journal of Midwifery and Health Sciences, 9(2), 123-133. https://doi.org/10.62425/esbder.1911961

AMA

1. Tomar Bozkurt H, Bozkurt A. Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis. Journal of Midwifery and Health Sciences. 2026;9(2):123-133. doi:10.62425/esbder.1911961

Chicago

Tomar Bozkurt, Hazan, and Abdullah Bozkurt. 2026. “Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis”. Journal of Midwifery and Health Sciences 9 (2): 123-33. https://doi.org/10.62425/esbder.1911961.

EndNote

Tomar Bozkurt H, Bozkurt A (July 1, 2026) Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis. Journal of Midwifery and Health Sciences 9 2 123–133.

IEEE

[1] H. Tomar Bozkurt and A. Bozkurt, “Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis”, Journal of Midwifery and Health Sciences, vol. 9, no. 2, pp. 123–133, July 2026, doi: 10.62425/esbder.1911961.

ISNAD

Tomar Bozkurt, Hazan - Bozkurt, Abdullah. “Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis”. Journal of Midwifery and Health Sciences 9/2 (July 1, 2026): 123-133. https://doi.org/10.62425/esbder.1911961.

JAMA

1. Tomar Bozkurt H, Bozkurt A. Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis. Journal of Midwifery and Health Sciences. 2026;9:123–133.

MLA

Tomar Bozkurt, Hazan, and Abdullah Bozkurt. “Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis”. Journal of Midwifery and Health Sciences, vol. 9, no. 2, July 2026, pp. 123-33, doi:10.62425/esbder.1911961.

Vancouver

1. Hazan Tomar Bozkurt, Abdullah Bozkurt. Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis. Journal of Midwifery and Health Sciences. 2026 Jul. 1;9(2):123-33. doi:10.62425/esbder.1911961