Research Article

Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis

Volume: 27 Issue: 79, 23 January 2025

Abstract

Text data is now generated at an exponential rate, and it can be analyzed by transforming it into a large sparse matrix using a weighting method. A complete text weighting scheme consists of three components: term frequency, document frequency, and vector normalization. Multiplying these three components yields a numerical value that indicates how important a word is to a text. In their raw form, however, these values are not suitable for semantic analysis of text. Several techniques address this, and one of them is topic analysis, which seeks to identify the subjects discussed in large text collections. Non-Negative Matrix Factorization (NMF) is commonly used for topic analysis. It approximates an input matrix as the product of two or more matrices, starting from either random or deterministic initial values. This study ran experiments on a dataset of 20,000 articles from Wikipedia, the online encyclopedia, to investigate how the text weighting schemes and initialization approaches commonly used in the literature affect NMF. The number of clusters used in the experiments was determined by an analytical procedure that employed an upper limit. The results show that the “lnc” and “nnc” weighting schemes gave the best NMF performance, which suggests that using either scheme leads to better results in topic analysis.
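The pipeline described in the abstract (weight the term-document matrix, then factorize it) can be illustrated with a minimal pure-Python sketch. Several things here are assumptions rather than details from the paper: the three-letter codes are read in the conventional SMART sense (first letter `n` = raw count or `l` = 1 + ln(tf); second letter `n` = no document-frequency factor or `t` = idf; third letter `c` = cosine normalization), the solver uses the classic multiplicative update rules with random initialization only, and the function names and the four toy documents are hypothetical. A deterministic initialization is not shown.

```python
import math
import random
from collections import Counter


def smart_weight(docs, scheme="lnc"):
    """Weight tokenized docs with a three-letter SMART code: tf, df, normalization."""
    tf_code, df_code, norm_code = scheme
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for term in vocab:
            tf = counts[term]
            if tf == 0:
                row.append(0.0)
                continue
            # 'l' = 1 + ln(tf); any other code is treated as 'n' (raw count).
            weight = (1.0 + math.log(tf)) if tf_code == "l" else float(tf)
            if df_code == "t":  # 't' = idf; 'n' leaves the weight unchanged
                weight *= math.log(len(docs) / df[term])
            row.append(weight)
        if norm_code == "c":  # 'c' = scale the document vector to unit length
            length = math.sqrt(sum(w * w for w in row))
            row = [w / length for w in row] if length > 0 else row
        rows.append(row)
    return vocab, rows


def matmul(a, b):
    """Plain nested-list matrix product."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return math.sqrt(sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Approximate V (docs x terms) by W (docs x k) times H (k x terms)."""
    rng = random.Random(seed)  # random initialization, fixed seed for repeatability
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


# Hypothetical toy corpus: two documents about NMF, two about football.
docs = [
    ["nmf", "topic", "topic", "matrix"],
    ["matrix", "factorization", "nmf"],
    ["football", "goal", "goal", "team"],
    ["team", "football", "match"],
]
vocab, V = smart_weight(docs, "lnc")
W0, H0 = nmf(V, k=2, iters=0)   # the random starting point
W, H = nmf(V, k=2, iters=200)   # after the multiplicative updates
print("error before:", frob_error(V, W0, H0), "after:", frob_error(V, W, H))
```

Each row of `H` can be read as a topic (a non-negative weighting over `vocab`) and each row of `W` as a document's mix of topics. Passing `"nnc"` instead of `"lnc"` to `smart_weight` keeps raw counts before normalization, which makes it easy to compare the two schemes the abstract singles out on the same data.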

Details

Primary Language

English

Subjects

Performance Evaluation

Section

Research Article

Early Pub Date

15 January 2025

Publication Date

23 January 2025

Submission Date

31 January 2024

Acceptance Date

2 April 2024

Published in Issue

Year 2025 Volume: 27 Issue: 79

Cite

APA
Berber, T., & Eriş Büyükkaya, M. (2025). Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi, 27(79), 46-53. https://doi.org/10.21205/deufmd.2025277907
AMA
1. Berber T, Eriş Büyükkaya M. Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis. DEUFMD. 2025;27(79):46-53. doi:10.21205/deufmd.2025277907
Chicago
Berber, Tolga, and Melek Eriş Büyükkaya. 2025. “Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 27 (79): 46-53. https://doi.org/10.21205/deufmd.2025277907.
EndNote
Berber T, Eriş Büyükkaya M (01 January 2025) Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 27 79 46–53.
IEEE
[1] T. Berber and M. Eriş Büyükkaya, “Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis”, DEUFMD, vol. 27, no. 79, pp. 46–53, Jan. 2025, doi: 10.21205/deufmd.2025277907.
ISNAD
Berber, Tolga - Eriş Büyükkaya, Melek. “Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 27/79 (01 January 2025): 46-53. https://doi.org/10.21205/deufmd.2025277907.
JAMA
1. Berber T, Eriş Büyükkaya M. Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis. DEUFMD. 2025;27:46–53.
MLA
Berber, Tolga, and Melek Eriş Büyükkaya. “Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi, vol. 27, no. 79, January 2025, pp. 46-53, doi:10.21205/deufmd.2025277907.
Vancouver
1. Tolga Berber, Melek Eriş Büyükkaya. Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis. DEUFMD. 01 January 2025;27(79):46-53. doi:10.21205/deufmd.2025277907

This journal is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).