<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article  article-type="research-article"        dtd-version="1.4">
            <front>

                <journal-meta>
                                                                <journal-id>estuscience - se</journal-id>
            <journal-title-group>
                                                                                    <journal-title>Eskişehir Technical University Journal of Science and Technology A - Applied Sciences and Engineering</journal-title>
            </journal-title-group>
                            <issn pub-type="ppub">2667-4211</issn>
                                                                                                        <publisher>
                    <publisher-name>Eskişehir Technical University</publisher-name>
                </publisher>
                    </journal-meta>
                <article-meta>
                                        <article-id pub-id-type="doi">10.18038/estubtda.1784468</article-id>
                                                                <article-categories>
                                            <subj-group  xml:lang="en">
                                                            <subject>Supervised Learning</subject>
                                                            <subject>Classification Algorithms</subject>
                                                            <subject>Natural Language Processing</subject>
                                                    </subj-group>
                                            <subj-group  xml:lang="tr">
                                                            <subject>Denetimli Öğrenme</subject>
                                                            <subject>Sınıflandırma algoritmaları</subject>
                                                            <subject>Doğal Dil İşleme</subject>
                                                    </subj-group>
                                    </article-categories>
                                                                                                                                                        <title-group>
                                                                                                                        <article-title>NOVEL TERM WEIGHTING METHODS FOR TEXT CLASSIFICATION BASED ON ECONOMIC INEQUALITY METRICS</article-title>
                                                                                                    </title-group>
            
                                                    <contrib-group content-type="authors">
                                                                        <contrib contrib-type="author">
                                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-2137-5253</contrib-id>
                                                                <name>
                                    <surname>Okkalıoğlu</surname>
                                    <given-names>Murat</given-names>
                                </name>
                                    <aff>Yalova University</aff>
                                                            </contrib>
                                                                                </contrib-group>
                        
                <pub-date pub-type="pub" iso-8601-date="2026-03-27">
                    <day>27</day>
                    <month>03</month>
                    <year>2026</year>
                </pub-date>
                                        <volume>27</volume>
                                        <issue>1</issue>
                                        <fpage>125</fpage>
                                        <lpage>148</lpage>
                        
                        <history>
                    <date date-type="received" iso-8601-date="2025-09-15">
                        <day>15</day>
                        <month>09</month>
                        <year>2025</year>
                    </date>
                    <date date-type="accepted" iso-8601-date="2026-03-17">
                        <day>17</day>
                        <month>03</month>
                        <year>2026</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 2000, Eskişehir Technical University Journal of Science and Technology A - Applied Sciences and Engineering</copyright-statement>
                    <copyright-year>2000</copyright-year>
                    <copyright-holder>Eskişehir Technical University Journal of Science and Technology A - Applied Sciences and Engineering</copyright-holder>
                </permissions>
            
            <abstract><p>Term weighting plays a critical role in text classification. With a few exceptions, traditional methods make limited or inadequate use of how terms are distributed across classes. The core hypothesis of this study is that a term’s weight should be proportional to the unevenness of its distribution across classes. Accordingly, the proposed methods prioritize terms concentrated in one or a few classes over terms distributed almost evenly across all classes. To implement this idea, we introduce a family of novel term weighting methods based on economic inequality metrics. These metrics, typically used to measure the unfairness of income distribution in a population, are adapted here to characterize term distributions. To quantify distributional unevenness and thereby assess term significance, we select one representative index from each of three major categories of inequality measures: Lorenz curve-based (Schutz), entropy-based (Theil, with two variants), and social welfare-based (Atkinson). Experiments were conducted on four benchmark datasets (20NG, R8, R52, and WebKB) with two classifiers (Multinomial Naïve Bayes and Support Vector Machines), using micro-averaged and macro-averaged F1 as evaluation metrics. The results show that the proposed term weighting methods, particularly the one based on the Schutz index, consistently achieve superior or highly competitive performance compared with both traditional and state-of-the-art term weighting approaches. These findings support exploiting economic inequality principles to quantify the inter-class distributional characteristics of terms for term weighting. Thus, this work not only validates the effectiveness of the proposed methods but also demonstrates the value of interdisciplinary approaches in the term weighting literature.</p></abstract>
                                                            
            
                                                            <kwd-group>
                <kwd>Economic inequality</kwd>
                <kwd>Class distribution</kwd>
                <kwd>Term weighting</kwd>
                <kwd>Text classification</kwd>
                                            </kwd-group>
                                                        
                                                                                                            </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="journal">[1]	ITU. Measuring digital development: Facts and Figures 2024. International Telecommunication Union, 2024.</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">[2]	Mao Y, Liu Q, Zhang Y. Sentiment analysis methods, applications, and challenges: A systematic literature review. J King Saud Univ Comput Inf Sci 2024; 36(4).</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">[3]	Ahmed N, Amin R, Aldabbas H, Koundal D, Alouffi B, Shah T. Machine learning techniques for spam detection in email and IoT platforms: analysis and research challenges. Secur Commun Netw 2022.</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">[4]	Wu H, Zhang Z, Wu W. Exploring syntactic and semantic features for authorship attribution. Appl Soft Comput 2021; 111.</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">[5]	Alnabhan MQ, Branco P. Fake news detection using deep learning: a systematic literature review. IEEE Access 2024; 12: 114435-114459.</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">[6]	Sun G, Cheng Y, Zhang Z, Tong X, Chai T. Text classification with improved word embedding and adaptive segmentation. Expert Syst Appl 2024; 238.</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">[7]	Schutz RR. On the measurement of income inequality. Am Econ Rev 1951; 41(1): 107-122.</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">[8]	Hoover EM. The measurement of industrial localization. Rev Econ Stat 1936; 18(4): 162-171.</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">[9]	Theil H. Economics and Information Theory. Amsterdam: North-Holland, 1967.</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">[10]	Atkinson AB. On the measurement of inequality. J Econ Theory 1970; 2(3): 244–263.</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">[11]	Luhn HP. The automatic creation of literature abstracts. IBM J Res Dev 1958; 2(2): 159-165.</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="journal">[12]	Spärck Jones K. A statistical interpretation of term specificity and its application in retrieval. J Doc 1972; 28(1): 11-21.</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="journal">[13]	Salton G, Wong A, Yang CS. A vector space model for automatic indexing. Commun ACM 1975; 18(11): 613-620.</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="journal">[14]	Robertson S. Understanding inverse document frequency: on theoretical arguments for IDF. J Doc 2004; 60(5): 503-520.</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="journal">[15]	Tokunaga T, Iwayama M. Text categorization based on weighted inverse document frequency. Tokyo: Dept of Computer Science, Tokyo Institute of Technology; Technical Report 94-TR00001, 1994.</mixed-citation>
                    </ref>
                                    <ref id="ref16">
                        <label>16</label>
                        <mixed-citation publication-type="journal">[16]	Debole F, Sebastiani F. Supervised term weighting for automated text categorization. In: 2003 ACM Symposium on Applied Computing; 9-12 March 2003; Melbourne, FL, USA. New York, NY, USA: ACM. pp. 784–788.</mixed-citation>
                    </ref>
                                    <ref id="ref17">
                        <label>17</label>
                        <mixed-citation publication-type="journal">[17]	Lan M, Tan CL, Su J, Lu Y. Supervised and traditional term weighting methods for automatic text categorization. IEEE T Pattern Anal 2009; 31(4).</mixed-citation>
                    </ref>
                                    <ref id="ref18">
                        <label>18</label>
                        <mixed-citation publication-type="journal">[18]	Parlak B, Uysal AK. The effects of globalisation techniques on feature selection for text classification. J Inf Sci 2021; 47(6): 727-739.</mixed-citation>
                    </ref>
                                    <ref id="ref19">
                        <label>19</label>
                        <mixed-citation publication-type="journal">[19]	Parlak B. A novel feature and class-based globalization technique for text classification. Multimed Tools Appl 2023; 82(24).</mixed-citation>
                    </ref>
                                    <ref id="ref20">
                        <label>20</label>
                        <mixed-citation publication-type="journal">[20]	Liu Y, Loh HT, Sun A. Imbalanced text classification: A term weighting approach. Expert Syst Appl 2009; 36(1): 690–701.</mixed-citation>
                    </ref>
                                    <ref id="ref21">
                        <label>21</label>
                        <mixed-citation publication-type="journal">[21]	Wang D, Zhang H. Inverse-Category-Frequency based supervised term weighting scheme for text categorization. J Inf Sci Eng 2010; 29(2): 209–225.</mixed-citation>
                    </ref>
                                    <ref id="ref22">
                        <label>22</label>
                        <mixed-citation publication-type="journal">[22]	Ren F, Sohrab MG. Class-indexing-based term weighting for automatic text classification. Inform Sciences 2013; 236: 109–125.</mixed-citation>
                    </ref>
                                    <ref id="ref23">
                        <label>23</label>
                        <mixed-citation publication-type="journal">[23]	Chen K, Zhang Z, Long J, Zhang H. Turning from TF-IDF to TF-IGM for term weighting in text classification. Expert Syst Appl 2016; 66: 245–260.</mixed-citation>
                    </ref>
                                    <ref id="ref24">
                        <label>24</label>
                        <mixed-citation publication-type="journal">[24]	Dogan T, Uysal AK. Improved inverse gravity moment term weighting for text classification. Expert Syst Appl 2019; 130: 45–59.</mixed-citation>
                    </ref>
                                    <ref id="ref25">
                        <label>25</label>
                        <mixed-citation publication-type="journal">[25]	Okkalioglu M. TF-IGM revisited: imbalanced text classification with relative imbalance ratio. Expert Syst Appl 2023; 217.</mixed-citation>
                    </ref>
                                    <ref id="ref26">
                        <label>26</label>
                        <mixed-citation publication-type="journal">[26]	Ko Y. A new term-weighting scheme for text classification using the odds of positive and negative class probabilities. J Assoc Inf Sci Tech 2015; 66(12): 2397-2722.</mixed-citation>
                    </ref>
                                    <ref id="ref27">
                        <label>27</label>
                        <mixed-citation publication-type="journal">[27]	Allison PD. Measures of inequality. Am Sociol Rev 1978; 43(6): 865-880.</mixed-citation>
                    </ref>
                                    <ref id="ref28">
                        <label>28</label>
                        <mixed-citation publication-type="journal">[28]	Atkinson AB, Bourguignon F, editors. Handbook of Income Distribution, Volume 2B. 1st ed. Amsterdam, the Netherlands: North-Holland, 2015.</mixed-citation>
                    </ref>
                                    <ref id="ref29">
                        <label>29</label>
                        <mixed-citation publication-type="journal">[29]	Idrees M, Ahmad E. Measurement of income inequality: A survey. Forman Journal of Economic Studies 2017; 13: 1-32.</mixed-citation>
                    </ref>
                                    <ref id="ref30">
                        <label>30</label>
                        <mixed-citation publication-type="journal">[30]	Lorenz MO. Methods of measuring concentration of wealth. Publ Am Stat Assoc 1905; 9(70): 209-219.</mixed-citation>
                    </ref>
                                    <ref id="ref31">
                        <label>31</label>
                        <mixed-citation publication-type="journal">[31]	Fellman J. Income inequality measures. Theoretical Economics Letters 2018; 8: 557-574.</mixed-citation>
                    </ref>
                                    <ref id="ref32">
                        <label>32</label>
                        <mixed-citation publication-type="journal">[32]	Okkalioglu M. A novel redistribution-based feature selection for text classification. Expert Syst Appl 2024; 246.</mixed-citation>
                    </ref>
                                    <ref id="ref33">
                        <label>33</label>
                        <mixed-citation publication-type="journal">[33]	Wang T, Cai Y, Leung HF, Lau RYK, Xie H, Li Q. On entropy-based term weighting schemes for text categorization. Knowl Inf Syst 2021; 63: 2313–2346.</mixed-citation>
                    </ref>
                                    <ref id="ref34">
                        <label>34</label>
                        <mixed-citation publication-type="journal">[34]	Kullback S, Leibler RA. On information and sufficiency. Ann Math Statist 1951; 22(1): 79-86.</mixed-citation>
                    </ref>
                                    <ref id="ref35">
                        <label>35</label>
                        <mixed-citation publication-type="journal">[35]	Cardoso-Cachopo A. Improving methods for single-label text categorization. PhD Thesis, Instituto Superior Tecnico, Universidade Tecnica de Lisboa, Lisbon, Portugal 2007.</mixed-citation>
                    </ref>
                                    <ref id="ref36">
                        <label>36</label>
                        <mixed-citation publication-type="journal">[36]	Craven M, DiPasquo D, Freitag D, McCallum A, Mitchell T, Nigam K, Slattery S. Learning to extract symbolic knowledge from the World Wide Web. In: AAAI '98/IAAI '98: Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence; Madison, Wisconsin, USA: American Association for Artificial Intelligence, 1998. pp. 509-516.</mixed-citation>
                    </ref>
                                    <ref id="ref37">
                        <label>37</label>
                        <mixed-citation publication-type="journal">[37]	Forman G. An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 2003; 3: 1289-1305.</mixed-citation>
                    </ref>
                                    <ref id="ref38">
                        <label>38</label>
                        <mixed-citation publication-type="journal">[38]	Parlak B, Uysal AK. A novel filter feature selection method for text classification: Extensive Feature Selector. J Inf Sci 2023; 49(1): 59-78.</mixed-citation>
                    </ref>
                                    <ref id="ref39">
                        <label>39</label>
                        <mixed-citation publication-type="journal">[39]	Parlak B. The effects of preprocessing on Turkish and English News Data. Sakarya University Journal of Computer and Information Sciences 2023; 6(1): 59-66.</mixed-citation>
                    </ref>
                                    <ref id="ref40">
                        <label>40</label>
                        <mixed-citation publication-type="journal">[40]	Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 2006; 7: 1-30.</mixed-citation>
                    </ref>
                                    <ref id="ref41">
                        <label>41</label>
                        <mixed-citation publication-type="journal">[41]	Iman RL, Davenport JM. Approximations of the critical region of the Friedman statistics. Commun Stat 1980; 571-595.</mixed-citation>
                    </ref>
                                    <ref id="ref42">
                        <label>42</label>
                        <mixed-citation publication-type="journal">[42]	Nemenyi PB. Distribution-free multiple comparisons. PhD Thesis, Princeton University, 1963.</mixed-citation>
                    </ref>
                                    <ref id="ref43">
                        <label>43</label>
                        <mixed-citation publication-type="journal">[43]	Dunn OJ. Multiple comparisons among means. J Am Stat Assoc 1961; 56: 52-64.</mixed-citation>
                    </ref>
                                    <ref id="ref44">
                        <label>44</label>
                        <mixed-citation publication-type="journal">[44]	Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat 1979; 6: 65-70.</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
