An unsupervised hybrid model for keyphrase extraction
Abstract
Keyphrase (or keyword) extraction is the task of identifying the most pertinent words in a text. Strictly speaking, a keyphrase consists of multiple words and a keyword is a single word, but the two terms are often used interchangeably. Although the literature offers many keyword extraction methods, unsupervised methods stand out because they are domain-independent and require no training on labeled data. Hence, in this work, a new unsupervised hybrid model is presented for the keyphrase extraction task. The proposed model combines a graph-based method with an embedding-based method: it uses graph centrality criteria together with skip-gram embeddings trained separately for each document. The model was evaluated on a dataset and compared with methods from the literature. In comprehensive experiments, the model performed comparably to statistical models and outperformed other graph-based and embedding-based models.
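To make the graph-based half of the idea concrete, the following is a minimal sketch of ranking words by degree centrality in a word co-occurrence graph. It is not the authors' implementation: the abstract does not say which centrality criteria are used, how the graph is built, or how centrality is combined with the skip-gram embeddings, so the embedding component is left out entirely. The stopword list, the window size of 2, and the function names (`tokenize`, `cooccurrence_graph`, `degree_centrality`, `rank_candidates`) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "is", "on", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cooccurrence_graph(tokens, window=2):
    """Link every pair of distinct words that occur within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def rank_candidates(text, top_k=5, window=2):
    """Return the top_k words by degree centrality, ties broken alphabetically."""
    scores = degree_centrality(cooccurrence_graph(tokenize(text), window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

For the sentence "graph based keyphrase extraction uses graph centrality for keyphrase ranking", `rank_candidates(..., top_k=2)` returns `["keyphrase", "graph"]`, because those two words co-occur with the largest number of distinct neighbors. A hybrid model of the kind described above would additionally score candidates with per-document embeddings; that step is deliberately not sketched here.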
Details
Primary Language
English
Subjects
Software Engineering (Other)
Section
Research Article
Early Pub Date
November 2, 2025
Publication Date
March 16, 2026
Submission Date
January 22, 2025
Acceptance Date
July 22, 2025
Published in Issue
Year 2026 Volume: 32 Issue: 2