An Extractive Text Summarization Model for Generating Extended Abstracts of Medical Papers in Turkish

Anıl Kuş; Çiğdem İnan Acı

doi:10.54047/bibted.1260697

Abstract

The rapid growth of technology has increased the amount of data available in digital environments. As a result, finding specific information within this volume of data has become difficult and time-consuming for users. To alleviate this difficulty, automatic text summarization systems have been developed as a more efficient way to reach the relevant information in texts than traditional summarization techniques. This study aims to generate extended summaries of Turkish medical papers written about COVID-19. Although scientific papers already have abstracts, more comprehensive summaries are also needed. To the best of our knowledge, automatic summarization of Turkish academic studies on COVID-19 has not been done before. A dataset was created by collecting 84 Turkish research and review articles from DergiPark. Extended summaries of 2455 and 1708 characters were obtained with two widely used extractive methods, the Term Frequency and LexRank algorithms, respectively. The performance of the text summarization model was evaluated using Recall, Precision, and F-score, and the algorithms were shown to be effective for Turkish. The results showed accuracy rates similar to those of previous studies in the literature.
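The abstract names the techniques but this page does not show the authors' implementation. The following is a minimal Python sketch of term-frequency sentence extraction under a character budget, together with an overlap-based Precision, Recall, and F-score. Everything in it is an assumption made for illustration: the function names (`tf_summarize`, `overlap_scores`), the regex-based sentence splitter and tokenizer, the greedy budget filling, and the use of unigram overlap as the unit of comparison. It does not handle Turkish-specific casing, stop words, or stemming, and it does not cover LexRank.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace (illustrative only)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercased word tokens; no stop-word removal or stemming is applied."""
    return re.findall(r"\w+", text.lower())


def tf_summarize(text, max_chars):
    """Pick the highest-scoring sentences that fit in max_chars, kept in document order."""
    sentences = split_sentences(text)
    freqs = Counter(tokenize(text))  # term frequency over the whole document

    def score(sentence):
        tokens = tokenize(sentence)
        # Average term frequency, so long sentences are not favored by length alone.
        return sum(freqs[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        cost = len(sentences[i]) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= max_chars:
            chosen.append(i)
            used += cost
    return " ".join(sentences[i] for i in sorted(chosen))


def overlap_scores(candidate, reference):
    """Unigram-overlap precision, recall, and F-score between two texts."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    doc = ("Masks reduce transmission. Vaccines reduce severe illness. "
           "The weather was nice.")
    summary = tf_summarize(doc, max_chars=60)
    print(summary)
    print(overlap_scores(summary, "Vaccines and masks reduce illness."))
```

In this sketch the character budget plays the role of the target summary length, and the reference passed to `overlap_scores` would be a human-written summary.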

Keywords

Text Summarization, Extended Abstract, Medical paper, COVID-19

Details

Primary Language

English

Subjects

Artificial Intelligence

Journal Section

Research Article

Authors

Anıl Kuş
0000-0002-5964-3727
Türkiye

Çiğdem İnan Acı*
0000-0002-0028-9890
Türkiye

Early Pub Date

May 31, 2023

Publication Date

August 9, 2023

Submission Date

March 6, 2023

Acceptance Date

May 26, 2023

Published in Issue

Year: 2023, Volume: 4, Number: 1

DOI

https://doi.org/10.54047/bibted.1260697

IZ

https://izlik.org/JA32GW46EY

APA

Kuş, A., & Acı, Ç. İ. (2023). An Extractive Text Summarization Model for Generating Extended Abstracts of Medical Papers in Turkish. Bilgisayar Bilimleri ve Teknolojileri Dergisi, 4(1), 19-26. https://doi.org/10.54047/bibted.1260697

Cited By

Graf Teorisi ve Malatya Merkezilik Algoritmasına Dayalı Haber Metinlerinin Özetlemesi [Summarization of News Texts Based on Graph Theory and the Malatya Centrality Algorithm]

Bilişim Teknolojileri Dergisi

https://doi.org/10.17671/gazibtd.1463107
