Research Article

Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis

Volume: 9 Number: 2 July 1, 2026

Abstract

Objective: This study aimed to systematically evaluate and compare the information quality, scientific reliability, and readability of the responses provided by four widely used large language models (LLMs), namely ChatGPT, Google Gemini, DeepSeek, and Claude, to frequently asked questions regarding postpartum depression (PPD).

Methods: This descriptive cross-sectional study analyzed 40 LLM-generated responses concerning PPD. Information quality was rated with the DISCERN tool, scientific reliability was scored on a 5-point Likert scale, and readability was measured with the Flesch Reading Ease Score (FRES) and the Flesch–Kincaid Grade Level (FKGL).
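For readers unfamiliar with the two readability indices, the following is a minimal sketch of how FRES and FKGL are commonly computed from sentence, word, and syllable counts. It is not the authors' tooling: the function names (`count_syllables`, `readability`, `fres_band`) are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration, so scores may differ from those of dedicated readability software. The band labels follow the conventional Flesch interpretation table.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for an English text using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl


def fres_band(score: float) -> str:
    """Map a FRES value to the conventional Flesch difficulty label."""
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very confusing"


if __name__ == "__main__":
    sample = "Postpartum depression is common. Talk to your doctor if you feel sad."
    fres, fkgl = readability(sample)
    print(f"FRES={fres:.1f} ({fres_band(fres)}), FKGL={fkgl:.1f}")
```

Higher FRES values indicate easier text, whereas FKGL approximates the US school grade needed to follow it, which is why a model can differ from others on one index and not the other.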

Results: All four models met the adequacy thresholds for information quality and scientific reliability. DeepSeek achieved the highest mean DISCERN score, which was significantly higher than that of Claude. No statistically significant difference was observed in scientific reliability scores across the four models. Regarding readability, Claude produced significantly more complex texts than ChatGPT and Google Gemini based on FKGL scores. Mean FRES values for all models fell within the "difficult" to "fairly difficult" range, with no significant between-group difference.

Conclusion: All four LLMs demonstrated adequate information quality and scientific reliability regarding PPD, with DeepSeek exhibiting the highest information quality. However, substantial deficiencies were identified in readability across all models, with Claude producing the most linguistically complex outputs. These findings suggest that while LLMs show promising potential as complementary health information sources for PPD, their outputs require simplification to meet recommended readability standards for patient education, particularly for postpartum populations who may experience cognitive and emotional barriers to comprehending complex health information.

Details

Primary Language

English

Subjects

Psychosocial Aspects of Childbirth and Perinatal Mental Health

Journal Section

Research Article

Publication Date

July 1, 2026

Submission Date

March 17, 2026

Acceptance Date

June 15, 2026

Published in Issue

Year 2026 Volume: 9 Number: 2

APA
Tomar Bozkurt, H., & Bozkurt, A. (2026). Can AI-Based Large Language Models Provide Reliable Information on Postpartum Depression? A Systematic Content Analysis. Journal of Midwifery and Health Sciences, 9(2), 123-133. https://doi.org/10.62425/esbder.1911961

Content of this journal is licensed under a Creative Commons Attribution NonCommercial 4.0 International License
