Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis
Abstract
Text data is now being generated at an exponential rate, and it can be analyzed by transforming it into a large sparse matrix with a weighting method. A comprehensive text weighting approach consists of three fundamental components: Term Frequency, Document Frequency, and Vector Normalization. Multiplying these three components yields a numerical value that indicates how significant a word is for a given text. However, these raw values are not suitable on their own for the semantic analysis of textual material. Several techniques exist for this purpose; one of them is topic analysis, which seeks to identify the subjects discussed in large text collections. The Non-Negative Matrix Factorization (NMF) approach is commonly employed in topic analysis. It factorizes an input matrix into the product of two or more non-negative matrices, starting from either random or deterministic initial values. In this study, experiments were conducted on a dataset of 20,000 articles from Wikipedia, the online encyclopedia, to investigate how the text weighting methods and initialization approaches commonly used in the literature affect NMF. The number of clusters used in the experiments was determined by an analytical procedure that employs an upper limit. The results indicate that the “lnc” and “nnc” weighting schemes yielded the highest performance with NMF. These findings suggest that employing the “lnc” or “nnc” weighting scheme leads to more favorable outcomes in topic analysis.
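The three-component weighting described in the abstract is commonly written in SMART notation, where each letter of a code such as “lnc” selects the term frequency, document frequency, and normalization component in that order. Assuming the scheme names in this study follow that convention, “lnc” uses logarithmic term frequency (1 + ln tf), no document-frequency factor, and cosine normalization, while “nnc” differs only in using the raw term count. The function below is a minimal sketch covering only the letters n, l, t and c; the name `smart_weights` and the use of the natural logarithm are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def smart_weights(docs, scheme):
    """Weight tokenized documents with a three-letter SMART code, e.g. "lnc" or "nnc".

    Hypothetical helper; only a subset of SMART letters is sketched:
      1st letter (term frequency):     n = raw count, l = 1 + ln(count)
      2nd letter (document frequency): n = no factor, t = ln(N / df)
      3rd letter (normalization):      n = none,      c = cosine (unit length)
    Returns one {term: weight} dict per document.
    """
    tf_code, df_code, norm_code = scheme
    if tf_code not in "nl" or df_code not in "nt" or norm_code not in "nc":
        raise ValueError(f"unsupported SMART code: {scheme!r}")

    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))

    weighted = []
    for doc in docs:
        vec = {}
        for term, count in Counter(doc).items():
            tf = count if tf_code == "n" else 1.0 + math.log(count)
            idf = 1.0 if df_code == "n" else math.log(n_docs / df[term])
            vec[term] = tf * idf  # product of the TF and DF components
        if norm_code == "c":
            # Cosine normalization: divide by the vector's Euclidean length.
            length = math.sqrt(sum(w * w for w in vec.values()))
            if length > 0:
                vec = {term: w / length for term, w in vec.items()}
        weighted.append(vec)
    return weighted
```

For example, `smart_weights([["nmf", "topic", "topic"], ["topic", "weighting"]], "lnc")` gives the first document unit length, with “topic” weighted 1 + ln 2 times as heavily as “nmf”; under “nnc” the same ratio is simply 2. Because neither scheme uses a document-frequency factor, both rely on normalization alone to make long and short documents comparable.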
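To illustrate the factorization step and where the initial values enter, here is a minimal pure-Python sketch of NMF using the Lee and Seung multiplicative update rules for the Frobenius-norm objective. The abstract does not state which NMF solver the study used, so this choice of algorithm is an assumption. Random initialization draws from a seeded generator; a deterministic initialization (for example an SVD-based scheme such as NNDSVD, which is not implemented here) would be supplied through the hypothetical `W0`/`H0` parameters.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(V, k, iterations=200, seed=0, W0=None, H0=None, eps=1e-9):
    """Factor a non-negative m x n matrix V into W (m x k) and H (k x n).

    Hypothetical helper using Lee-Seung multiplicative updates. W0/H0 give a
    deterministic start; otherwise entries are drawn uniformly from [0, 1)
    with a seeded generator, so runs are reproducible for a fixed seed.
    """
    m, n = len(V), len(V[0])
    rng = random.Random(seed)
    W = [row[:] for row in W0] if W0 else [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [row[:] for row in H0] if H0 else [[rng.random() for _ in range(n)] for _ in range(k)]

    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H


def frobenius_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rw in zip(V, WH) for v, x in zip(rv, rw)) ** 0.5
```

With a random start, the factors depend on the seed, which is one reason deterministic initializations are studied. Note that multiplicative updates never change an entry that starts at zero, so a deterministic start containing zeros fixes that sparsity pattern for the whole run.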
Details
Primary Language: English
Subjects: Performance Evaluation
Journal Section: Research Article
Early Publication Date: January 15, 2025
Publication Date: January 23, 2025
Submission Date: January 31, 2024
Acceptance Date: April 2, 2024
Published in Issue: Year 2025, Volume 27, Number 79
APA
Berber, T., & Eriş Büyükkaya, M. (2025). Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, 27(79), 46-53. https://doi.org/10.21205/deufmd.2025277907