Research Article

Keyword Extraction from Kazakh News Dataset with BERT

Volume: 9 Number: 4 December 31, 2022

Abstract

Keywords provide a concise and precise description of a document's content. Because keywords are important and manual annotation is laborious, automatic keyword extraction makes the process faster and easier. This paper presents keyword extraction from a Kazakh news dataset. Model performance was measured with two pre-trained language models, BERT-base-uncased and BERT-base-multilingual-uncased, on the newly compiled Kazakh News Dataset (KND). The dataset consists of 7,060 entries collected from the websites anatili.kazgazeta.kz, Bilimdinews.kz, and zhasalash.kz using the BeautifulSoup and Requests libraries. These websites mostly publish news, history, and literary texts. Each entry includes the publication or news title, the author of the publication or the news subject, and the URL of the Kazakh news site. In evaluating the training results, BERT-base-multilingual-uncased achieved a higher F-score than BERT-base-uncased.
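
The abstract compares the two models by F-score but does not say how predicted keywords are matched against the reference keywords. As an illustration only, the Python sketch below assumes exact matching after lowercasing and whitespace stripping, with each document's keywords treated as a set, and macro-averaging over documents. The function names `keyword_f1` and `macro_f1` and the Kazakh example keywords are hypothetical toy inputs, not taken from the paper or the KND dataset.

```python
from typing import Iterable, List, Tuple


def keyword_f1(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for a single document.

    Assumption (not from the paper): exact match after lowercasing and
    stripping whitespace; duplicate keywords are collapsed into a set.
    """
    pred = {k.strip().lower() for k in predicted if k.strip()}
    ref = {k.strip().lower() for k in gold if k.strip()}
    if not pred or not ref:
        # Nothing predicted or nothing annotated: score the document as 0.
        return 0.0, 0.0, 0.0
    tp = len(pred & ref)  # keywords present in both sets
    precision = tp / len(pred)
    recall = tp / len(ref)
    if tp == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(docs: List[Tuple[List[str], List[str]]]) -> float:
    """Average per-document F1 over (predicted, gold) pairs.

    Macro-averaging is an assumed choice; the abstract does not specify it.
    """
    if not docs:
        return 0.0
    return sum(keyword_f1(p, g)[2] for p, g in docs) / len(docs)


if __name__ == "__main__":
    # Toy example: 2 of 3 predictions are correct, 2 of 4 gold keywords are found.
    p, r, f = keyword_f1(["білім", "мектеп", "ұстаз"], ["Білім", "оқушы", "мектеп", "ҰБТ"])
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
    # prints "precision=0.667 recall=0.500 F1=0.571"
```

Macro-averaging gives every article equal weight; micro-averaging (pooling true positives, predictions, and gold keywords across the whole corpus before dividing) would instead weight articles with many keywords more heavily. Which scheme underlies the reported F-scores is not stated in the abstract.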

Details

Primary Language: English
Subjects: Engineering
Journal Section: Research Article
Publication Date: December 31, 2022
Submission Date: June 16, 2022
Acceptance Date: September 7, 2022
Published in Issue: Year 2022, Volume 9, Number 4

APA
Abibullayeva, A., & Çetin, A. (2022). Keyword Extraction from Kazakh News Dataset with BERT. El-Cezeri, 9(4), 1193-1200. https://doi.org/10.31202/ecjse.1131826
AMA
Abibullayeva A, Çetin A. Keyword Extraction from Kazakh News Dataset with BERT. El-Cezeri Journal of Science and Engineering. 2022;9(4):1193-1200. doi:10.31202/ecjse.1131826
Chicago
Abibullayeva, Aiman, and Aydın Çetin. 2022. “Keyword Extraction from Kazakh News Dataset With BERT”. El-Cezeri 9 (4): 1193-1200. https://doi.org/10.31202/ecjse.1131826.
EndNote
Abibullayeva A, Çetin A (December 1, 2022) Keyword Extraction from Kazakh News Dataset with BERT. El-Cezeri 9 4 1193–1200.
IEEE
A. Abibullayeva and A. Çetin, “Keyword Extraction from Kazakh News Dataset with BERT”, El-Cezeri Journal of Science and Engineering, vol. 9, no. 4, pp. 1193–1200, Dec. 2022, doi: 10.31202/ecjse.1131826.
ISNAD
Abibullayeva, Aiman - Çetin, Aydın. “Keyword Extraction from Kazakh News Dataset With BERT”. El-Cezeri 9/4 (December 1, 2022): 1193-1200. https://doi.org/10.31202/ecjse.1131826.
JAMA
Abibullayeva A, Çetin A. Keyword Extraction from Kazakh News Dataset with BERT. El-Cezeri Journal of Science and Engineering. 2022;9(4):1193–1200.
MLA
Abibullayeva, Aiman, and Aydın Çetin. “Keyword Extraction from Kazakh News Dataset With BERT”. El-Cezeri, vol. 9, no. 4, Dec. 2022, pp. 1193-1200, doi:10.31202/ecjse.1131826.
Vancouver
Aiman Abibullayeva, Aydın Çetin. Keyword Extraction from Kazakh News Dataset with BERT. El-Cezeri Journal of Science and Engineering. 2022 Dec. 1;9(4):1193-200. doi:10.31202/ecjse.1131826

El-Cezeri is licensed to the public under a Creative Commons Attribution 4.0 license.