Optimization Based Undersampling for Imbalanced Classes

Fatih Sağlam; Mervenur Sözen; Mehmet Ali Cengiz

doi:10.37094/adyujsci.884120

Research Article

Turkish title: Dengesiz Sınıflamada Optimizasyona Dayalı Azörnekleme

Year 2021, Volume: 11, Issue: 2, 385 - 409, 31.12.2021

Abstract

Classification methods tend to favor the majority class when the classes contain different numbers of observations. The literature offers several remedies for this problem, including resampling methods. Undersampling, one of the resampling methods, creates balance by removing observations from the majority class. This study aims to compare different optimization methods for determining the most suitable observations to retain from the majority class when undersampling. First, a simple simulation study was conducted, and graphs were used to analyze the differences between the resampled data sets. Then, different classifier models were built for different imbalanced data sets. In these models, random undersampling, undersampling with a genetic algorithm, undersampling with a differential evolution algorithm, undersampling with an artificial bee colony, and undersampling with particle swarm optimization were compared. The results were ranked for each classifier and data set, and an overall mean rank was obtained. When undersampling, the artificial bee colony was found to perform better than the other optimization methods.

Keywords

Imbalanced classes, Classification, Undersampling, Optimization
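
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the objective function or the algorithm settings, so the fitness used here (balanced accuracy of a nearest-centroid classifier, evaluated on the full data set), the function names, and the synthetic data are all assumptions made for illustration. A plain best-of-N random search stands in for the genetic algorithm, differential evolution, artificial bee colony, and particle swarm optimizers compared in the study.

```python
import numpy as np

def random_undersample(X, y, majority_label, rng):
    """Keep every minority row plus an equally sized random majority subset."""
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    keep_maj = rng.choice(maj_idx, size=min_idx.size, replace=False)
    keep = np.sort(np.concatenate([min_idx, keep_maj]))
    return X[keep], y[keep]

def fitness(X_train, y_train, X_eval, y_eval):
    """Assumed fitness: balanced accuracy of a nearest-centroid classifier."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(X_eval[:, None, :] - centroids[None, :, :], axis=2)
    pred = labels[dists.argmin(axis=1)]
    recalls = [(pred[y_eval == c] == c).mean() for c in labels]
    return float(np.mean(recalls))

def search_undersample(X, y, majority_label, rng, n_candidates=50):
    """Best-of-N random search over majority subsets (a stand-in optimizer)."""
    best_score, best_subset = -1.0, None
    for _ in range(n_candidates):
        Xs, ys = random_undersample(X, y, majority_label, rng)
        score = fitness(Xs, ys, X, y)
        if score > best_score:
            best_score, best_subset = score, (Xs, ys)
    return best_subset, best_score

# Synthetic imbalanced data: 200 majority (label 0) and 20 minority (label 1) rows.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.array([0] * 200 + [1] * 20)

(Xb, yb), score = search_undersample(X, y, majority_label=0, rng=rng)
print(np.bincount(yb), round(score, 3))
```

Each candidate is a balanced subset, so any optimizer that proposes majority subsets and scores them with the same fitness function could replace the random search loop without changing the rest of the sketch.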

References

[1] Chen, L., Bao, L., Li, J., Cai, S., Cai, C., Chen, Z., An aliasing artifacts reducing approach with random undersampling for spatiotemporally encoded single-shot MRI, Journal of Magnetic Resonance, 237, 115-124, 2013.
[2] Liu, B., Tsoumakas, G., Dealing with class imbalance in classifier chains via random undersampling, Knowledge-Based Systems, 192:105292, 2019.
[3] Tomek, I., Two modifications of CNN, IEEE Transactions on Systems, Man, and Cybernetics, SMC-6 (11), 769-772, 1976.
[4] Elhassan, T., Aljourf, M., Al-Mohanna, F., Shoukri, M., Classification of imbalance data using tomek link (t-link) combined with random under-sampling (rus) as a data reduction method, Global Journal of Technology and Optimization, S1, 2017.
[5] Pereira, R.M., Costa, Y.M., Silla Jr, C.N., MLTL: A multi-label approach for the Tomek Link undersampling algorithm, Neurocomputing, 383, 95-105, 2020.
[6] Devi, D., Purkayastha, B., Redundancy-driven modified Tomek-link based undersampling: A solution to class imbalance, Pattern Recognition Letters, 93, 3-12, 2017.
[7] Wilson, D. L., Asymptotic properties of nearest neighbor rules using edited data, IEEE Transactions on Systems, Man, and Cybernetics, 3, 408-421, 1972. https://doi.org/10.1109/TSMC.1972.4309137.
[8] Laurikkala, J., Improving identification of difficult small classes by balancing class distribution, In Conference on Artificial Intelligence in Medicine in Europe (pp. 63-66), Springer, Berlin, Heidelberg, 2001. https://doi.org/10.1007/3-540-48229-6_9.
[9] Bach, M., Werner, A., Palt, M., The Proposal of Undersampling Method for Learning from Imbalanced Datasets, Procedia Computer Science, 159, 125-134, 2019.
[10] Lu, W., Li, Z., Chu, J., Adaptive ensemble undersampling-boost: a novel learning framework for imbalanced data, Journal of Systems and Software, 132, 272-282, 2017.
[11] Lin, W.C., Tsai, C.F., Hu, Y.H., Jhang, J.S., Clustering-based undersampling in class-imbalanced data, Information Sciences, 409, 17-26, 2017.
[12] Ofek, N., Rokach, L., Stern, R., Shabtai, A., Fast-CBUS: A fast clustering-based undersampling method for addressing the class imbalance problem, Neurocomputing, 243, 88-102, 2017.
[13] Körzdörfer, G., Pfeuffer, J., Kluge, T., Gebhardt, M., Hensel, B., Meyer, C.H., Nittka, M., Effect of spiral undersampling patterns on FISP MRF parameter maps, Magnetic Resonance Imaging, 2019.
[14] Holland, J.H., Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence, MIT press, 1992.
[15] Eberhart, R., Kennedy, J., Particle swarm optimization, In Proceedings of the IEEE International Conference on Neural Networks 4, 1942-1948, 1995.
[16] Storn, R., Price, K., Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization, 11(4), 341-359, 1997.
[17] Karaboga, D., An idea based on honey bee swarm for numerical optimization, Technical Report-tr06, Erciyes University, Engineering Faculty, Computer Engineering Department, 200, 1-10, 2005.
[18] García, S., Herrera, F., Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy, Evolutionary computation, 17(3), 275-306, 2009.
[19] Roshan, S.E., Asadi, S., Improvement of Bagging performance for classification of imbalanced datasets using evolutionary multi-objective optimization, Engineering Applications of Artificial Intelligence, 87, 103319, 2020.
[20] Yu, H., Mu, C., Sun, C., Yang, W., Yang, X., Zuo, X., Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data, Knowledge-Based Systems, 76, 67-78, 2015.
[21] Lavine, B.K., Clustering and classification of analytical data, Encyclopedia of Analytical Chemistry: Instrumentation and Applications, 2000.
[22] He, H., Garcia, E.A., Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263-1284, 2009.
[23] Das, B., Krishnan, N.C., Cook, D.J., RACOG and wRACOG: Two probabilistic oversampling techniques, IEEE Transactions on Knowledge and Data Engineering, 27(1), 222-234, 2014.
[24] Fernàndes, E. R., de Carvalho, A.C., Evolutionary inversion of class distribution in overlapping areas for multi-class imbalanced learning, Information Sciences, 494, 141-154, 2019.
[25] Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F., Learning from imbalanced data sets, Berlin: Springer, 1-377, 2018.
[26] Chawla, N.V., Bowyer, K. W., Hall, L. O., Kegelmeyer, W.P., SMOTE: synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, 321-357, 2002.
[27] Kamalov, F., Kernel density estimation based sampling for imbalanced class distribution, Information Sciences, 2019.
[28] Maldonado, S., Weber, R., Famili, F., Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines, Information Sciences, 286, 228-246, 2014.
[29] Moayedikia, A., Ong, K.L., Boo, Y.L., Yeoh, W.G., Jensen, R., Feature selection for high dimensional imbalanced class data using harmony search, Engineering Applications of Artificial Intelligence, 57, 38-49, 2017.
[30] Wong, M.L., Seng, K., Wong, P.K., Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain, Expert Systems with Applications, 141, 112918, 2020.
[31] Dua, D., Graff, C., UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], Irvine, CA: University of California, School of Information and Computer Science, 2019.

There are 31 references in total.

Details

Primary Language	English
Subjects	Applied Mathematics
Section	Mathematics
Authors	Fatih Sağlam (0000-0002-2084-2008), Mervenur Sözen (0000-0001-5603-5382), Mehmet Ali Cengiz (0000-0002-1271-2588)
Publication Date	December 31, 2021
Submission Date	February 21, 2021
Acceptance Date	October 25, 2021
Published in Issue	Year 2021, Volume: 11, Issue: 2

Cite

APA	Sağlam, F., Sözen, M., & Cengiz, M. A. (2021). Optimization Based Undersampling for Imbalanced Classes. Adıyaman University Journal of Science, 11(2), 385-409. https://doi.org/10.37094/adyujsci.884120
AMA	Sağlam F, Sözen M, Cengiz MA. Optimization Based Undersampling for Imbalanced Classes. ADYU J SCI. December 2021;11(2):385-409. doi:10.37094/adyujsci.884120
Chicago	Sağlam, Fatih, Mervenur Sözen, and Mehmet Ali Cengiz. “Optimization Based Undersampling for Imbalanced Classes”. Adıyaman University Journal of Science 11, no. 2 (December 2021): 385-409. https://doi.org/10.37094/adyujsci.884120.
EndNote	Sağlam F, Sözen M, Cengiz MA (01 December 2021) Optimization Based Undersampling for Imbalanced Classes. Adıyaman University Journal of Science 11 2 385–409.
IEEE	F. Sağlam, M. Sözen, and M. A. Cengiz, “Optimization Based Undersampling for Imbalanced Classes”, ADYU J SCI, vol. 11, no. 2, pp. 385–409, 2021, doi: 10.37094/adyujsci.884120.
ISNAD	Sağlam, Fatih et al. “Optimization Based Undersampling for Imbalanced Classes”. Adıyaman University Journal of Science 11/2 (December 2021), 385-409. https://doi.org/10.37094/adyujsci.884120.
JAMA	Sağlam F, Sözen M, Cengiz MA. Optimization Based Undersampling for Imbalanced Classes. ADYU J SCI. 2021;11:385–409.
MLA	Sağlam, Fatih et al. “Optimization Based Undersampling for Imbalanced Classes”. Adıyaman University Journal of Science, vol. 11, no. 2, 2021, pp. 385-409, doi:10.37094/adyujsci.884120.
Vancouver	Sağlam F, Sözen M, Cengiz MA. Optimization Based Undersampling for Imbalanced Classes. ADYU J SCI. 2021;11(2):385-409.
