An unsupervised hybrid model for keyphrase extraction

Özlem Örnek; Efnan Şora Günal; Eyyüp Gülbandılar

doi:10.5505/pajes.2025.05400

Abstract

Keyphrase or keyword extraction is the task of extracting the most pertinent words from a text. Strictly speaking, a keyphrase consists of multiple words and a keyword is a single word, but the two terms are often used interchangeably. Although the literature offers various keyword extraction methods, unsupervised methods stand out because they are domain-independent and need no training on labeled data. This work therefore presents a new unsupervised hybrid model for the keyphrase extraction task. The proposed model combines a graph-based method with an embedding-based method: it uses graph centrality criteria together with skip-gram embeddings created for each document. The model was evaluated on a dataset and compared with results from the literature. Comprehensive experiments showed that it performs comparably to statistical models while outperforming other graph-based and embedding-based models.
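As a rough illustration of the graph-based half of such a pipeline, the sketch below ranks the words of a single document by degree centrality in a word co-occurrence graph. It is not the authors' model: the stopword list, the window size, the choice of degree centrality, and all function names are assumptions made for the example, and the skip-gram embedding component is left out entirely.

```python
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "is", "in", "on", "to", "with", "from"}


def tokenize(text):
    """Lowercase the text, strip punctuation, keep alphabetic non-stopwords."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    return [w for w in words if w.isalpha() and w not in STOPWORDS]


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking words that occur at most `window` positions apart."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by (n - 1), the usual normalization."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def rank_keywords(text, top_k=3, window=2):
    """Return the top_k words by degree centrality (ties broken alphabetically)."""
    scores = degree_centrality(cooccurrence_graph(tokenize(text), window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    sample = "graph model ranks graph nodes; graph centrality scores words"
    print(rank_keywords(sample))
```

A hybrid approach would typically combine such centrality scores with a second signal, for example the similarity between candidate and document embeddings. How the two are weighted here is not stated in the abstract, so the sketch stops at the graph score.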

Details

Primary Language

English

Subjects

Software Engineering (Other)

Section

Research Article

Authors

Özlem Örnek *
Türkiye

Efnan Şora Günal
Türkiye

Eyyüp Gülbandılar
Türkiye

Early Pub Date

November 2, 2025

Publication Date

March 16, 2026

Submission Date

January 22, 2025

Acceptance Date

July 22, 2025

Published in Issue

Year 2026 Volume: 32 Issue: 2

DOI

https://doi.org/10.5505/pajes.2025.05400

IZ

https://izlik.org/JA26YE87GR

Cite

APA

Örnek, Ö., Şora Günal, E., & Gülbandılar, E. (2026). An unsupervised hybrid model for keyphrase extraction. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 32(2), 259-267. https://doi.org/10.5505/pajes.2025.05400
