Term weighting plays a critical role in text classification tasks. Traditional methods, with a few exceptions, make limited or inadequate use of the distributional characteristics of terms across classes. The core hypothesis of this study is that a term’s weight should be proportional to how unevenly the term is distributed across classes. The proposed methods therefore prioritize terms concentrated in one or a few classes over terms distributed almost evenly across all classes. To implement this idea, we introduce a family of novel term weighting methods based on economic inequality metrics. These metrics are typically used to measure the unfairness of income distribution in a population; we adapt them to characterize term distributions. To quantify distributional unevenness as a measure of term significance, we select one representative method from each of three major categories of inequality indices: Lorenz curve-based (Schultz), entropy-based (Theil, with two variants), and social welfare-based (Atkinson). We conducted experiments on four benchmark datasets (20NG, R8, R52, and WebKB) using two classifiers (Multinomial Naïve Bayes and Support Vector Machines), evaluated with the micro-F1 and macro-F1 metrics. The results show that the proposed term weighting methods, particularly the method based on the Schultz index, consistently achieve superior or highly competitive performance compared to both traditional and state-of-the-art term weighting approaches. These findings confirm the validity of exploiting economic inequality principles to quantify the inter-class distributional characteristics of terms in term weighting. Thus, this work not only validates the effectiveness of the proposed methods but also demonstrates the value of interdisciplinary work in the term weighting literature.
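To make the idea of measuring a term's unevenness across classes more concrete, the following is a minimal sketch that applies the textbook economic definitions of the Schultz (also known as Hoover), Theil T, and Atkinson indices to a vector of per-class counts for a single term. The abstract does not give the exact formulations, normalizations, or the two Theil variants used in the study, so the function names, the choice of Theil T, the Atkinson parameter `epsilon=0.5`, and the example counts are all assumptions made for illustration only.

```python
import math


def schultz_index(counts):
    """Schultz (Hoover) index: half the sum of absolute deviations of each
    class's share from the equal share 1/n. Standard economic definition."""
    total = sum(counts)
    if total == 0:
        return 0.0
    n = len(counts)
    return 0.5 * sum(abs(c / total - 1.0 / n) for c in counts)


def theil_t_index(counts):
    """Theil T index: mean of (x/mu) * ln(x/mu), with 0 * ln(0) taken as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    n = len(counts)
    mean = total / n
    return sum((c / mean) * math.log(c / mean) for c in counts if c > 0) / n


def atkinson_index(counts, epsilon=0.5):
    """Atkinson index for 0 < epsilon < 1 (epsilon is an assumed setting)."""
    if not 0 < epsilon < 1:
        raise ValueError("this sketch only handles 0 < epsilon < 1")
    total = sum(counts)
    if total == 0:
        return 0.0
    n = len(counts)
    mean = total / n
    # Equally distributed equivalent level of the counts.
    ede = (sum(c ** (1 - epsilon) for c in counts) / n) ** (1 / (1 - epsilon))
    return 1.0 - ede / mean


# Hypothetical per-class document counts for two terms over four classes.
even_term = [10, 10, 10, 10]   # spread evenly: carries little class signal
peaked_term = [40, 0, 0, 0]    # concentrated in one class: discriminative

for name, counts in [("even", even_term), ("peaked", peaked_term)]:
    print(name,
          schultz_index(counts),
          theil_t_index(counts),
          atkinson_index(counts))
```

Under these definitions every index is zero (up to floating-point rounding) for the evenly spread term and grows as the term concentrates in fewer classes, which is the behavior the core hypothesis relies on. How such a score is combined with a term frequency component, and whether counts are first normalized by class size, is left open here because the abstract does not specify it.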
| Primary Language | English |
|---|---|
| Subjects | Supervised Learning, Classification Algorithms, Natural Language Processing |
| Journal Section | Research Article |
| Authors | |
| Submission Date | September 15, 2025 |
| Acceptance Date | March 17, 2026 |
| Publication Date | March 27, 2026 |
| DOI | https://doi.org/10.18038/estubtda.1784468 |
| IZ | https://izlik.org/JA88BD99ZT |
| Published in Issue | Year 2026 Volume: 27 Issue: 1 |