ON THE EFFECT OF WORD POSITIONS IN GRAPH-BASED KEYWORD EXTRACTION

Osman Kabasakal; Alev Mutlu

Research Article

ÇİZGE TABANLI ANAHTAR KELİME ÇIKARIMINDA KELİME POZİSYONLARININ ETKİSİ

Year 2021, Volume: 17 Issue: 2, 217 - 239, 08.11.2021

Osman Kabasakal, Alev Mutlu

Abstract

This study focuses on the effect of word positions in unsupervised, graph-based keyword extraction methods. To this end, it examines four initial node-weighting methods, named Word Position (WP), Word Position Bidirectional (WPB), Sentence Position (SP), and Sentence Position Bidirectional (SPB), and discusses their effect on performance. WP gives more weight to words located at the beginning of a text. WPB gives more weight to words located at the beginning or the end of a text. SP gives more weight to words that occur in the first sentences of a text. SPB gives more weight to words in sentences located at the beginning and the end of a text. In experiments on six datasets, no statistical difference was observed between the WP and SP weightings. However, on datasets whose keywords occur at the beginning of the text, WP achieves higher performance but is not statistically distinguishable from SP. On datasets whose keywords are distributed throughout the text, SP is more successful than WP, and the difference is statistically significant.

Keywords

Keyword Extraction, Sentence Position, Word Position

Supporting Institution

TÜBİTAK

Project Number

117E566

References

Anju, R. C., Ramesh, S. H., & Rafeeque, P. C. (2018). “Keyphrase and Relation Extraction from Scientific Publications”. In Damodar Reddy Edla, Pawan Lingras, and Venkatanareshbabu K. (Eds.), Advances in Machine Learning and Data Science: Recent Achievements and Research Directives (pp. 113-120). Vol. 705, Singapore, Springer.
Armouty, B., & Tedmori, S. (2019). “Automated Keyword Extraction using Support Vector Machine from Arabic News Documents”. In 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), IEEE, 342-346. doi:10.1109/JEEIT.2019.8717420.
Augenstein, I., Das, M., Riedel, S., Vikraman, L., & McCallum, A. (2017). “Semeval 2017 Task 10: ScienceIE-Extracting Keyphrases and Relations from Scientific Publications”. arXiv preprint arXiv:1704.02853. Retrieved from https://arxiv.org/pdf/1704.02853.pdf
Azcarraga, A., Liu, M. D., & Setiono, R. (2012, June). “Keyword extraction using backpropagation neural networks and rule extraction”. In the 2012 International Joint Conference on Neural Networks (IJCNN), IEEE, 1-7. doi:10.1109/IJCNN.2012.6252618.
Beliga, S. (2014). “Keyword Extraction: A Review of Methods and Approaches”. University of Rijeka, Department of Informatics, Rijeka, 1-9.
Bellaachia, A., & Al-Dhelaan, M. (2014). “HG-Rank: A Hypergraph-based Keyphrase Extraction for Short Documents in Dynamic Genre”. In #MSM, #Microposts2014, 4th Workshop on Making Sense of Micropost, 42-49.
Biswas, S. K. (2019). “Keyword Extraction from Tweets Using Weighted Graph”. In Pradeep Kumar Mallick, Valentina Emilia Balas, Akash Kumar Bhoi, and Ahmed F. Zobaa (Eds.), Cognitive Informatics and Soft Computing: Proceeding of CISC 2017 (pp. 475-483). Vol. 768, Singapore, Springer.
Biswas, S. K., Bordoloi, M., & Shreya, J. (2018). “A graph based keyword extraction model using collective node weight”. Expert Systems with Applications, Vol. 97, 51-59. doi:10.1016/j.eswa.2017.12.025.
Brin, S., & Page, L. (1998). “The anatomy of a large-scale hypertextual web search engine”. Computer Networks and ISDN Systems. Vol. 30, Issues 1-7, 107-117.
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., & Jatowt, A. (2020). “YAKE! Keyword extraction from single documents using multiple local features”. Information Sciences, Vol. 509, 257-289. doi:10.1016/j.ins.2019.09.013.
Ercan, G., & Cicekli, I. (2007). “Using lexical chains for keyword extraction”. Information Processing & Management, 43(6), 1705-1714. doi:10.1016/j.ipm.2007.01.015.
Florescu, C., & Caragea, C. (2017). “Positionrank: An unsupervised approach to keyphrase extraction from scholarly documents”. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vol. 1, 1105-1115.
Gollapalli, S. D., & Caragea, C. (2014). “Extracting keyphrases from research papers using citation networks”. In AAAI'14: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 28(1), 1629-1635.
Hulth, A. (2003). “Improved automatic keyword extraction given more linguistic knowledge”. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 216-223.
Kim, S. N., Medelyan, O., Kan, M.-Y., & Baldwin, T. (2010). “SemEval-2010 Task 5: Automatic Keyphrase Extraction from Scientific Articles”. In Proceedings of the 5th International Workshop on Semantic Evaluation, 21-26.
Kleinberg, J. M. (1999). “Authoritative sources in a hyperlinked environment”. Journal of the ACM (JACM), 46(5), 604-632.
Lynn, H. M., Lee, E., Choi, C., & Kim, P. (2017). “SwiftRank: An Unsupervised Statistical Approach of Keyword and Salient Sentence Extraction for Individual Documents”. Procedia Computer Science, Vol. 113, 472-477.
Mihalcea, R., & Tarau, P. (2004). “Textrank: Bringing order into text”. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 404-411.
Nguyen, T. D., & Kan, M.-Y. (2007). “Keyphrase Extraction in Scientific Publications”. In 10th International Conference on Asian Digital Libraries, ICADL 2007, DBLP, 317-326.
Ni, W., Liu, T., & Zeng, Q. (2012). “Extracting keyphrase set with high diversity and coverage using structural SVM”. In APWeb'12: Proceedings of the 14th Asia-Pacific International Conference on Web Technologies and Applications, 122-133.
Patel, K., & Caragea, C. (2019). “Exploring word embeddings in crf-based keyphrase extraction from research papers”. In K-CAP ’19: Proceedings of the 10th International Conference on Knowledge Capture, 37-44.
Pereira, D. G., Afonso, A., & Medeiros, F. M. (2015). “Overview of Friedman's test and post-hoc analysis”. Communications in Statistics-Simulation and Computation, 44(10), 2636-2653.
Pohlert, T. (2016). “The Pairwise Multiple Comparison of Mean Ranks Package (PMCMR)”. R package. Retrieved from https://cran.r-project.org/web/packages/PMCMR/vignettes/PMCMR.pdf
Sun, P., Wang, L., & Xia, Q. (2017). “The Keyword Extraction of Chinese Medical Web Page Based on WF-TF-IDF Algorithm”. In 2017 International Conference on Cyber-enabled Distributed Computing and Knowledge Discovery (CYBERC), 193-198. Retrieved from https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8250358&tag=1
Tafti, A. P., Wang, Y., Shen, F., Sagheb, E., Kingsbury, P., & Liu, H. (2019). “Integrating word embedding neural networks with pubmed abstracts to extract keyword proximity of chronic diseases”. In 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), 1-4.
Thushara, M. G., Krishnapriya, M. S., & Nair, S. S. (2018). “Domain Classification of Research Papers Using Hybrid Keyphrase Extraction Method”. In Pankaj Kumar Sa, Sambit Bakshi, Ioannis K. Hatzilygeroudis, and Manmath Narayan Sahoo (Eds.), Recent Findings in Intelligent Computing Techniques: Proceedings of the 5th ICACNI (pp. 387-398). Vol. 708, Singapore, Springer.
Tixier, A., Malliaros, F., & Vazirgiannis, M. (2016). “A Graph Degeneracy-based Approach to Keyword Extraction”. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1860-1870.
Yao, L., Pengzhou, Z., & Chi, Z. (2019). “Research on News Keyword Extraction Technology Based on TF-IDF and TextRank”. In 2019 IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS), 452-455.

ON THE EFFECT OF WORD POSITIONS IN GRAPH-BASED KEYWORD EXTRACTION

Abstract

In this study, we focus on the effect of word positions in unsupervised, graph-based keyword extraction. To this end, we compare the performance of four node-weighting procedures: Word Position (WP), Word Position Bidirectional (WPB), Sentence Position (SP), and Sentence Position Bidirectional (SPB). WP assigns higher weights to words that appear at the beginning of a text. WPB assigns higher weights to words that appear at either the beginning or the end of a text. SP assigns higher weights to words that appear in the first sentences of a text. SPB assigns higher weights to words that appear in sentences close to either the beginning or the end of a text. Experiments on six benchmark datasets show no statistically significant difference between WP and SP overall. However, on datasets whose keywords appear early in the text, WP outperforms SP, although the difference is not statistically significant, while on datasets whose keywords are evenly distributed through the text, SP performs statistically significantly better than WP.
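The four weighting schemes can be illustrated with a minimal Python sketch. The abstract does not give the actual weighting formulas, so everything below is an assumption made for illustration: the function names, the inverse-distance score `1 / position`, and the choice to keep the highest score when a word occurs more than once are all hypothetical and are not the authors' definitions.

```python
from typing import Dict, List


def _inverse_distance(index: int, length: int, bidirectional: bool) -> float:
    """Score a 1-based position (assumed formula, not taken from the paper).

    One-directional: distance is measured from the start only.
    Bidirectional: distance is measured from whichever end is closer.
    """
    distance = min(index, length - index + 1) if bidirectional else index
    return 1.0 / distance


def position_weights(
    sentences: List[List[str]], unit: str = "word", bidirectional: bool = False
) -> Dict[str, float]:
    """Return an initial weight per word.

    unit="word"     -> WP (or WPB when bidirectional=True)
    unit="sentence" -> SP (or SPB when bidirectional=True)
    A repeated word keeps its highest score (an assumption for this sketch).
    """
    if unit not in ("word", "sentence"):
        raise ValueError("unit must be 'word' or 'sentence'")
    n_words = sum(len(sentence) for sentence in sentences)
    n_sentences = len(sentences)
    weights: Dict[str, float] = {}
    word_index = 0
    for sentence_index, sentence in enumerate(sentences, start=1):
        for word in sentence:
            word_index += 1
            if unit == "word":
                score = _inverse_distance(word_index, n_words, bidirectional)
            else:
                score = _inverse_distance(sentence_index, n_sentences, bidirectional)
            weights[word] = max(weights.get(word, 0.0), score)
    return weights


# Toy document: three tokenized sentences, eight tokens in total.
doc = [
    ["graphs", "rank", "words"],
    ["positions", "bias", "rank"],
    ["keywords", "emerge"],
]
wp = position_weights(doc, unit="word")
wpb = position_weights(doc, unit="word", bidirectional=True)
sp = position_weights(doc, unit="sentence")
spb = position_weights(doc, unit="sentence", bidirectional=True)
```

In this toy document the last word, `emerge`, receives a low weight under WP but the maximum weight under WPB, and every word of the last sentence moves from the lowest SP weight to the highest SPB weight. This mirrors the contrast between the one-directional and bidirectional schemes described above.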

Keywords

Keyword Extraction, Sentence Position, Word Position

There are 28 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Osman Kabasakal 0000-0003-1187-5147 Alev Mutlu 0000-0003-0547-0653
Project Number	117E566
Publication Date	November 8, 2021
Published in Issue	Year 2021 Volume: 17 Issue: 2

Cite

APA	Kabasakal, O., & Mutlu, A. (2021). ON THE EFFECT OF WORD POSITIONS IN GRAPH-BASED KEYWORD EXTRACTION. Journal of Naval Sciences and Engineering, 17(2), 217-239.
