Research Article

Keyword Extraction from Kazakh News Dataset with BERT

Volume: 9, Issue: 4, 31 December 2022

Abstract

Keywords provide a concise and precise description of a document's content. Because keywords are important and manual annotation is laborious, automatic keyword extraction makes the process fast and easy. This paper presents keyword extraction from a Kazakh news dataset. Model performance was measured with the pre-trained BERT-base-uncased and BERT-base-multilingual-uncased language models on the newly compiled Kazakh News Dataset (KND). The compiled dataset consists of 7,060 entries, collected from the websites anatili.kazgazeta.kz, Bilimdinews.kz, and zhasalash.kz using the BeautifulSoup and Requests libraries. These websites mostly contain news, history, and literary texts. The dataset includes the publication name or news title, the author of the publication or the news subject, and the URL of the Kazakh news site. Evaluation of the training results showed that BERT-base-multilingual-uncased achieved a higher F-score than BERT-base-uncased.
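The abstract does not state how the keyword-extraction task was framed for BERT. One common framing, assumed here rather than taken from the paper, treats it as token classification: each word is tagged as beginning a keyword (`B-KEY`), continuing one (`I-KEY`), or lying outside any keyword (`O`). The sketch below converts word tokens plus gold keyword phrases into such tags and decodes a tag sequence back into phrases; the tag names, the helper names `bio_tags` and `keywords_from_tags`, and the Kazakh example sentence are illustrative only.

```python
from typing import List

# Hypothetical tag set for word-level keyword tagging (BIO scheme).
B, I, O = "B-KEY", "I-KEY", "O"


def bio_tags(tokens: List[str], keywords: List[List[str]]) -> List[str]:
    """Tag each token as B-KEY, I-KEY, or O.

    Each keyword phrase is matched exactly (case-insensitively) at every
    position in the token list; tokens already tagged are not overwritten.
    """
    tags = [O] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in keywords:
        target = [w.lower() for w in phrase]
        n = len(target)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            window = range(start, start + n)
            if lowered[start:start + n] == target and all(tags[i] == O for i in window):
                tags[start] = B
                for i in range(start + 1, start + n):
                    tags[i] = I
    return tags


def keywords_from_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Decode a BIO tag sequence (e.g. model predictions) into keyword phrases."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == B:
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == I and current:
            current.append(token)
        else:
            # "O", or an I-KEY with no open phrase, closes any open phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    # Illustrative sentence, roughly "A new law on the Kazakh language was adopted".
    tokens = "Қазақ тілі туралы жаңа заң қабылданды".split()
    gold = [["қазақ", "тілі"], ["жаңа", "заң"]]
    tags = bio_tags(tokens, gold)
    print(tags)  # ['B-KEY', 'I-KEY', 'O', 'B-KEY', 'I-KEY', 'O']
    print(keywords_from_tags(tokens, tags))  # ['Қазақ тілі', 'жаңа заң']
```

BERT's WordPiece tokenizer splits words into subword pieces, so word-level tags like these would still need to be aligned to subword tokens before fine-tuning; that step is omitted here.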

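The abstract compares the two models by F-score but does not specify how predicted and gold keywords were matched. A minimal sketch, assuming exact, case-insensitive matching of keyword phrases within each document and averaging per-document scores (macro-averaging); both choices are assumptions, and a token-level or micro-averaged score would be equally plausible. Precision P is the share of predicted keywords that are correct, recall R is the share of gold keywords recovered, and F = 2PR / (P + R). The function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def _normalize(phrases: Iterable[str]) -> Set[str]:
    """Lower-case, trim, and de-duplicate keyword phrases; drop empty ones."""
    return {p.strip().lower() for p in phrases if p.strip()}


def keyword_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one document under exact phrase matching."""
    pred, ref = _normalize(predicted), _normalize(gold)
    tp = len(pred & ref)  # predicted phrases that also appear in the gold set
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(docs: Sequence[Tuple[List[str], List[str]]]) -> float:
    """Average the per-document F1 over (predicted, gold) pairs."""
    if not docs:
        return 0.0
    return sum(keyword_prf(p, g)[2] for p, g in docs) / len(docs)


if __name__ == "__main__":
    p, r, f = keyword_prf(["Қазақ тілі", "заң", "білім"], ["қазақ тілі", "жаңа заң"])
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.333 0.5 0.4
```

Macro-averaging weights every news article equally; micro-averaging (pooling true positives, predictions, and gold keywords across the whole corpus) would instead weight articles by how many keywords they carry.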
Details

Primary Language

English

Subjects

Engineering

Section

Research Article

Publication Date

31 December 2022

Submission Date

16 June 2022

Acceptance Date

7 September 2022

Published Issue

Year 2022, Volume: 9, Issue: 4

How to Cite

APA
Abibullayeva, A., & Çetin, A. (2022). Keyword Extraction from Kazakh News Dataset with BERT. El-Cezeri, 9(4), 1193-1200. https://doi.org/10.31202/ecjse.1131826
