A Novel Approach for Graph-Based Extractive Text Summarization Using the Karcı Dominant Set Algorithm and Eigenvector Centrality
Year 2025, Volume: 13 Issue: 1, 81 - 94, 30.06.2025
Taner Uçkan, Abdulsamet Aydın
Abstract
With the rapid growth of textual data sources in recent years, many studies have been carried out in the field of automatic text summarization. This study proposes a new method for extractive, generic summarization of multiple documents. The method uses the Karcı Dominant Set Algorithm. A graph is built from an adjacency matrix based on the number of words shared by the sentences of the text to be summarized. The sentences represented by the nodes in the graph's dominating set are removed from the source text, a new graph is built from the remaining sentences, and the summary is obtained according to the eigenvector centrality values of that new graph. The study was carried out on the Document Understanding Conference (DUC-2002 and DUC-2004) datasets. Performance was measured with ROUGE evaluation metrics, and the results were compared with other competitive methods. The developed model reached a ROUGE score of 0.35748 for 100-word summaries, 0.49049 for 200-word summaries, and 0.57586 for 400-word summaries. The values reported in the experiments demonstrate the contribution of this method to the literature.
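The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, tokenization is a plain lowercase word split, and the dominating set is found with a simple greedy heuristic that stands in for the Karcı algorithm, whose details are not given here. Eigenvector centrality is approximated by power iteration on the adjacency matrix plus the identity, which has the same eigenvectors and avoids oscillation.

```python
import re
import numpy as np

def tokenize(sentence):
    """Lowercase word set of a sentence (simplification: no stop-word removal)."""
    return set(re.findall(r"\w+", sentence.lower()))

def common_word_matrix(sentences):
    """Adjacency matrix: entry (i, j) is the number of words shared by sentences i and j."""
    words = [tokenize(s) for s in sentences]
    n = len(sentences)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            adj[i, j] = adj[j, i] = len(words[i] & words[j])
    return adj

def greedy_dominating_set(adj):
    """Stand-in heuristic, NOT the Karcı algorithm: repeatedly pick the node
    whose closed neighbourhood covers the most not-yet-dominated nodes."""
    n = len(adj)
    closed = (adj > 0) | np.eye(n, dtype=bool)
    undominated = np.ones(n, dtype=bool)
    chosen = []
    while undominated.any():
        gains = (closed & undominated).sum(axis=1)
        best = int(gains.argmax())
        chosen.append(best)
        undominated &= ~closed[best]
    return chosen

def eigenvector_centrality(adj, iterations=100):
    """Power iteration on (adj + I); returns a unit-length score vector."""
    n = len(adj)
    if n == 0:
        return np.array([])
    score = np.ones(n) / n
    for _ in range(iterations):
        nxt = adj @ score + score
        score = nxt / np.linalg.norm(nxt)
    return score

def summarize(sentences, word_limit=100):
    """Drop dominating-set sentences, rank the rest by centrality, fill the word budget."""
    adj = common_word_matrix(sentences)
    dominating = set(greedy_dominating_set(adj))
    remaining = [i for i in range(len(sentences)) if i not in dominating]
    sub = adj[np.ix_(remaining, remaining)]      # graph of the remaining sentences
    scores = eigenvector_centrality(sub)
    ranked = [remaining[k] for k in np.argsort(-scores)]
    summary, used = [], 0
    for idx in ranked:
        length = len(sentences[idx].split())
        if used + length > word_limit:
            continue                              # skip sentences that overflow the budget
        summary.append(sentences[idx])
        used += length
    return summary
```

Calling `summarize(sentences, word_limit=100)` on a list of sentence strings returns the selected sentences in rank order. The `word_limit` parameter mirrors the 100-, 200-, and 400-word settings reported above, although how the original system enforces the limit is an assumption here.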
A Novel Approach for Graph-based Extractive Text Summarization using Karcı Dominant Set Algorithm and Eigenvector Centrality
Abstract
With the rapid increase in textual data sources in recent years, many studies have been carried out in the field of automatic text summarization. This study proposes a new method for graph-based extractive text summarization and, within its scope, uses the Karcı Dominant Set Algorithm in a text summarization system for the first time. In the first step, a graph is created from an adjacency matrix based on the number of words shared by the sentences of the text to be summarized. In the second step, the sentences represented by the nodes in the graph's dominating set are determined using the Karcı Dominant Set Algorithm. In the third step, those sentences are removed from the source text and a new graph is created from the remaining sentences. In the last step, the central sentences are found from the eigenvector centrality values of the new graph, and the summary is built by selecting sentences starting from the highest-scoring one. The study was carried out on the Document Understanding Conference (DUC-2002 and DUC-2004) datasets. Performance was measured with ROUGE evaluation metrics, and the results were compared with other competitive methods. The developed model reached a ROUGE score of 0.35748 for a 100-word summary, 0.49049 for a 200-word summary, and 0.57586 for a 400-word summary. The values reported in the experiments demonstrate the contribution of this method to the literature.
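For readers unfamiliar with the metric, the sketch below shows how a ROUGE-N recall score of the kind reported above can be computed. It is an illustrative assumption: the abstract does not say which ROUGE variant produced the reported values, the function names are hypothetical, and the official ROUGE package applies additional options (stemming, stop-word handling) that are omitted here.

```python
import re
from collections import Counter

def ngrams(text, n):
    """Multiset of word n-grams in a text (lowercased, punctuation dropped)."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, references, n=1):
    """Clipped n-gram matches summed over all references,
    divided by the total number of reference n-grams."""
    cand = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        # Each reference n-gram is matched at most as often as it occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

A score is obtained with `rouge_n_recall(system_summary, [reference_1, reference_2], n=1)`, where the argument names are placeholders for a generated summary and its human-written references.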