Machine Learning Based Identification of LLM Generated Scientific Research Article Abstracts
Abstract
Heart disease is one of the leading causes of death worldwide, making early detection and diagnosis essential for effective treatment. With advances in machine learning (ML) and artificial intelligence (AI), these technologies are increasingly applied in medicine, particularly to detect and predict heart disease. As AI systems become more capable, it becomes important to distinguish abstracts generated by AI algorithms from those written by human experts. This study develops and assesses ML approaches for distinguishing human-written heart disease abstracts from AI-generated ones (ChatGPT and NLTK). The dataset contains 15,000 abstracts: 5,000 written by humans, 5,000 reworded by ChatGPT, and 5,000 generated using NLTK. Natural Language Processing (NLP) techniques, including tokenization, stop word removal, stemming, and lemmatization, were applied, and the text was then transformed into numerical form using TF-IDF vectorization. Several ML models, including K-nearest neighbors (KNN), support vector machines (SVMs), logistic regression, random forest, and decision tree, were trained and tested for classification accuracy. The study highlights the potential of ML techniques to support transparency and reliability in AI-driven medical decision-making, especially in heart disease diagnosis.
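The pipeline summarized in the abstract (tokenization, stop word removal, TF-IDF vectorization, then a classifier such as KNN) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the stop-word list, the toy sentences, the labels, and every function name below are assumptions made for illustration, stemming and lemmatization are omitted, and the smoothed IDF formula is one common variant rather than the one confirmed for the study.

```python
import math
from collections import Counter

# Hypothetical toy stop-word list; the study's actual list is not given.
STOP_WORDS = {"the", "of", "a", "is", "in", "and", "to", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def fit_idf(docs_tokens):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf_vector(tokens, idf):
    """L2-normalised TF-IDF weights; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(u, v):
    """Dot product of two sparse unit vectors (equals cosine similarity)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_predict(text, train_vecs, train_labels, idf, k=1):
    """Majority label among the k most similar training documents."""
    q = tfidf_vector(tokenize(text), idf)
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(q, train_vecs[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy examples, not taken from the study's dataset.
train_texts = [
    "We enrolled patients with heart failure in a cohort.",
    "Our cohort included patients followed for heart disease.",
    "This study delves into the intricate landscape of heart disease.",
    "This paper delves into the multifaceted landscape of cardiac care.",
]
train_labels = ["human", "human", "generated", "generated"]

train_tokens = [tokenize(t) for t in train_texts]
idf = fit_idf(train_tokens)
train_vecs = [tfidf_vector(toks, idf) for toks in train_tokens]

prediction = knn_predict(
    "The review delves into the landscape of treatment.",
    train_vecs, train_labels, idf,
)
print(prediction)
```

With these toy sentences the query shares terms only with the two "generated" examples, so the nearest neighbor carries that label. A realistic experiment would use a held-out test split and compare several classifiers, as the abstract describes.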
Details
Primary Language
English
Subjects
Natural Language Processing
Journal Section
Research Article
Early Pub Date
September 1, 2025
Publication Date
December 26, 2025
Submission Date
June 30, 2025
Acceptance Date
August 11, 2025
Published in Issue
Year 2025 Volume: 8 Number: 2
APA
Baştürk, B., & Onan, A. (2025). Machine Learning Based Identification of LLM Generated Scientific Research Article Abstracts. Scientific Journal of Mehmet Akif Ersoy University, 8(2), 57-70. https://doi.org/10.70030/sjmakeu.1730246