A Multi-Metric Model for analyzing and comparing extractive text summarization approaches and algorithms on scientific papers
Abstract
As data and information proliferate, text summarization technologies play a critical role in making large volumes of text more accessible and meaningful. In business, the news industry, academic research, and many other fields, summarization supports quicker decisions, faster access to information, and more effective use of resources. Ongoing research aims to improve these technologies and to develop new methods and algorithms that produce better summaries, which makes work in this field of great importance in the information age. In this study, a new operating model for text summarization that can be applied to different algorithms is proposed and evaluated. Sixteen summarization algorithms covering six approaches (statistical, graph-based, content-based, pointer-based, position-based, and user-oriented) were implemented and tested on 50 different full-text article datasets. Four evaluation metrics (BLEU, ROUGE-N, ROUGE-L, METEOR) were used to assess the similarity between the generated summaries and the original summaries. The performance of the algorithms within each approach was averaged, and the overall best-performing algorithm was selected. The output of this algorithm was then analyzed further with topic modelling and keyword extraction to identify the key topics and keywords in the summarized text. The proposed model provides a standardized workflow for developing and thoroughly testing summarization algorithms across datasets and evaluation metrics in order to determine the most appropriate summarization approach. The study demonstrates the effectiveness of the model on a variety of algorithm types and text sources.
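The scoring and averaging steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, lowercasing, and a balanced F1 variant of ROUGE-N and ROUGE-L, and every function name (`rouge_n`, `rouge_l`, `mean_score_per_approach`) is hypothetical. BLEU and METEOR are omitted for brevity.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Balanced F1 from an overlap count and the two totals (assumed variant)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


def mean_score_per_approach(scores):
    """Average algorithm scores within each approach.

    scores: {approach_name: {algorithm_name: score}} (hypothetical layout)
    """
    return {a: sum(s.values()) / len(s) for a, s in scores.items() if s}
```

In this sketch, each generated summary would be scored against its original summary, the scores grouped by approach, and `mean_score_per_approach` applied to compare approaches; the grouping layout is an assumption made for illustration.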
Details
Primary Language
English
Subjects
Natural Language Processing
Journal Section
Research Article
Early Pub Date
March 29, 2024
Publication Date
March 29, 2024
Submission Date
October 16, 2023
Acceptance Date
February 18, 2024
Published in Issue
Year 2024 Volume: 15 Number: 1
IEEE
[1] M. A. Dursun and S. Serttaş, “A Multi-Metric Model for analyzing and comparing extractive text summarization approaches and algorithms on scientific papers,” DUJE, vol. 15, no. 1, pp. 31–48, Mar. 2024, doi: 10.24012/dumf.1376978.