A Multi-Metric Model for analyzing and comparing extractive text summarization approaches and algorithms on scientific papers

Mehmet Ali Dursun; Soydan Serttaş

doi:10.24012/dumf.1376978

Abstract

As data and information continue to proliferate, text summarization technologies play a critical role in making large volumes of text more accessible and meaningful. In business, the news industry, academic research, and many other fields, text summarization supports quick decision-making, faster access to information, and more effective resource management. Research in the field continues to refine these technologies and to develop new methods and algorithms that produce better summaries. Text summarization and the research behind it are therefore of great importance in the information age. In this study, a new operating model for text summarization that can be applied to different algorithms is proposed and evaluated. Sixteen summarization algorithms covering six approaches (statistical, graph-based, content-based, pointer-based, position-based, and user-oriented) were implemented and tested on 50 different full-text article datasets. Four evaluation metrics (BLEU, ROUGE-N, ROUGE-L, and METEOR) were used to assess the similarity between the generated summaries and the original summaries. The performance of the algorithms within each approach was averaged, and the overall best-performing algorithm was selected. This algorithm was then analyzed further through topic modeling and keyword extraction to identify the key topics and keywords within the summarized text. The proposed model provides a standardized workflow for developing summarization algorithms and testing them thoroughly across datasets and evaluation metrics to determine the most appropriate summarization approach. The study demonstrates the effectiveness of the model across a variety of algorithm types and text sources.
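The scoring-and-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, implements only simplified ROUGE-N and ROUGE-L F1 scores (BLEU and METEOR are left out for brevity), and the function names and the algorithm-to-approach mapping passed to `rank` are hypothetical.

```python
from collections import Counter
from statistics import mean


def ngrams(tokens, n):
    """Return a multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    p, r = overlap / cand_total, overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N F1 from clipped n-gram overlap."""
    c = ngrams(candidate.lower().split(), n)
    r = ngrams(reference.lower().split(), n)
    overlap = sum((c & r).values())  # Counter intersection clips counts
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Simplified ROUGE-L F1 based on the longest common subsequence."""
    c, r = candidate.lower().split(), reference.lower().split()
    return f1(lcs_length(c, r), len(c), len(r))


def rank(summaries, reference, approach_of):
    """Score each algorithm, average the scores per approach, pick the best.

    summaries:   {algorithm name: generated summary}   (hypothetical names)
    approach_of: {algorithm name: approach label}      (hypothetical mapping)
    """
    scores = {
        name: mean([rouge_n(text, reference, 1),
                    rouge_n(text, reference, 2),
                    rouge_l(text, reference)])
        for name, text in summaries.items()
    }
    grouped = {}
    for name, score in scores.items():
        grouped.setdefault(approach_of[name], []).append(score)
    approach_means = {approach: mean(vals) for approach, vals in grouped.items()}
    best = max(scores, key=scores.get)
    return scores, approach_means, best
```

Calling `rank` with one generated summary per algorithm and the reference summary returns the per-algorithm scores, the per-approach averages, and the name of the top-scoring algorithm. Averaging the metrics with equal weight is a choice made for this sketch; the abstract does not say how the four metrics were combined.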

Details

Primary Language

English

Subjects

Natural Language Processing

Journal Section

Research Article

Authors

Mehmet Ali Dursun*
0000-0001-6370-1160
Türkiye

Soydan Serttaş
0000-0001-8887-8675
Türkiye

Early Pub Date

March 29, 2024

Publication Date

March 29, 2024

Submission Date

October 16, 2023

Acceptance Date

February 18, 2024

Published in Issue

Year 2024 Volume: 15 Number: 1

DOI

https://doi.org/10.24012/dumf.1376978

IZ

https://izlik.org/JA34BW58JN

Cite

APA

Dursun, M. A., & Serttaş, S. (2024). A Multi-Metric Model for analyzing and comparing extractive text summarization approaches and algorithms on scientific papers. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi, 15(1), 31-48. https://doi.org/10.24012/dumf.1376978

AMA

1. Dursun MA, Serttaş S. A Multi-Metric Model for analyzing and comparing extractive text summarization approaches and algorithms on scientific papers. DUJE. 2024;15(1):31-48. doi:10.24012/dumf.1376978

Chicago

Dursun, Mehmet Ali, and Soydan Serttaş. 2024. “A Multi-Metric Model for Analyzing and Comparing Extractive Text Summarization Approaches and Algorithms on Scientific Papers”. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi 15 (1): 31-48. https://doi.org/10.24012/dumf.1376978.

EndNote

Dursun MA, Serttaş S (March 1, 2024) A Multi-Metric Model for analyzing and comparing extractive text summarization approaches and algorithms on scientific papers. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi 15 1 31–48.

IEEE

[1] M. A. Dursun and S. Serttaş, “A Multi-Metric Model for analyzing and comparing extractive text summarization approaches and algorithms on scientific papers,” DUJE, vol. 15, no. 1, pp. 31–48, Mar. 2024, doi: 10.24012/dumf.1376978.

ISNAD

Dursun, Mehmet Ali - Serttaş, Soydan. “A Multi-Metric Model for Analyzing and Comparing Extractive Text Summarization Approaches and Algorithms on Scientific Papers”. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi 15/1 (March 1, 2024): 31-48. https://doi.org/10.24012/dumf.1376978.

JAMA

1. Dursun MA, Serttaş S. A Multi-Metric Model for analyzing and comparing extractive text summarization approaches and algorithms on scientific papers. DUJE. 2024;15:31–48.

MLA

Dursun, Mehmet Ali, and Soydan Serttaş. “A Multi-Metric Model for Analyzing and Comparing Extractive Text Summarization Approaches and Algorithms on Scientific Papers”. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi, vol. 15, no. 1, Mar. 2024, pp. 31-48, doi:10.24012/dumf.1376978.

Vancouver

1. Mehmet Ali Dursun, Soydan Serttaş. A Multi-Metric Model for analyzing and comparing extractive text summarization approaches and algorithms on scientific papers. DUJE. 2024 Mar. 1;15(1):31-48. doi:10.24012/dumf.1376978

Cited By

Impact of Preprocessing on Indonesian Extractive Summarization Using LexRank, TextRank, DivRank, and Cosine Similarity

G-Tech: Jurnal Teknologi Terapan

https://doi.org/10.70609/g-tech.v9i4.8306