Performance Evaluation of the Extractive Methods in Automatic Text Summarization Using Medical Papers

Anıl Kuş; Çiğdem İnan Acı


Abstract

The rapid advancement of technology has caused a surge in the volume of available digital data. Users who need to locate specific information within this massive collection face a time-consuming search. To address this problem and improve access to relevant information, automatic text summarization systems have been developed as a more effective alternative to conventional summarization techniques. Researchers in the health sciences in particular find it difficult to keep up with the most recent literature because of their busy schedules. The goal of this study is to generate comprehensive summaries of Turkish-language scientific papers in the field of health sciences. Although scientific papers already contain abstracts, more detailed summaries are still needed. To the best of our knowledge, no previous study has automatically summarized academic medical papers written in Turkish. For this purpose, a dataset of 105 Turkish papers was collected from DergiPark. Term Frequency, Term Frequency-Inverse Document Frequency, Latent Semantic Analysis, TextRank, and Latent Dirichlet Allocation were chosen as the extractive text summarization methods because they are frequently used in this field. The performance of the summarization models was evaluated using the Recall, Precision, and F-score metrics, and the algorithms gave satisfactory results for Turkish.

Keywords

automatic text summarization; extractive method; scientific papers; health sciences
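
As a rough illustration of the simplest method named in the abstract, the sketch below scores sentences by Term Frequency and then compares a candidate summary with a reference using Precision, Recall, and F-score. It is not the authors' implementation. The function names (split_sentences, tokenize, tf_summarize, precision_recall_f), the naive sentence splitter, the absence of Turkish stemming or stop-word removal, and the choice of unigram overlap for the metrics are all assumptions made for this example; the abstract does not specify any of these details.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens; no stemming or stop-word removal (assumption).

    Note: str.lower() does not apply Turkish casing rules for I/ı.
    """
    return re.findall(r"\w+", sentence.lower())


def tf_summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    # Document-level term frequencies.
    tf = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average term frequency, so long sentences are not favoured by length.
        return sum(tf[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


def precision_recall_f(candidate, reference):
    """Unigram-overlap precision, recall and F-score for two token lists."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    doc = ("Diabetes affects many patients. "
           "Diabetes treatment helps patients. "
           "The weather was nice.")
    summary = tf_summarize(doc, n_sentences=2)
    print(summary)
    reference = tokenize("Diabetes treatment helps many patients.")
    print(precision_recall_f(tokenize(" ".join(summary)), reference))
```

Averaging the term frequencies over the sentence length is one possible design choice; summing them instead would favour longer sentences. A real Turkish pipeline would likely add morphological analysis before counting terms.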

Details

Primary Language

English

Subjects

Computer Software, Software Engineering (Other)

Journal Section

Research Article

Authors

Anıl Kuş *
0000-0002-5964-3727
Türkiye

Çiğdem İnan Acı
0000-0002-0028-9890
Türkiye

Publication Date

December 31, 2023

Submission Date

November 16, 2023

Acceptance Date

December 16, 2023

Published in Issue

Year 2023 Volume: 9 Number: 4

IZ

https://izlik.org/JA95MW86HM

Cite

APA

Kuş, A., & Acı, Ç. İ. (2023). Performance Evaluation of the Extractive Methods in Automatic Text Summarization Using Medical Papers. Gazi Journal of Engineering Sciences, 9(4), 14-22. https://izlik.org/JA95MW86HM
