COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING

Volume: 2 Number: 2 October 19, 2016

Abstract

In recent years, a huge increase in the number of Internet users has been accompanied by massive amounts of human- and machine-generated data, now commonly called Big Data, which is challenging to handle efficiently. At the same time, the valuable information that can be extracted from this data to support data-driven decision making has attracted increased attention from both industry and academia. One of the important tasks in knowledge extraction is classification. However, in some real-world applications, either the data is inherently skewed or the collected dataset has an imbalanced class distribution. An imbalanced class distribution degrades the performance of many classification algorithms, which generally expect balanced class distributions and assume that the cost of misclassifying an instance is the same for both classes. To tackle this so-called imbalanced learning problem, several sampling algorithms have been proposed in the literature. In this study, we compare sampling algorithms with respect to their running times and the classification accuracies of classifiers trained on the sampled datasets. We find that the over-sampling methods yield higher classification accuracies than the under-sampling methods. Sampling times are similar, whereas classification is more efficient with under-sampling methods. Among the sampling algorithms compared, ADASYN should be the preferred choice when execution time, increase in data size, and classification performance are considered together.

Keywords: Imbalanced Learning, Sampling Methods, Data Mining, Big Data
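
The difference between the two families of methods named in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible baselines, random over-sampling and random under-sampling, which are not necessarily among the algorithms evaluated in the study; the function names and the toy 90/10 dataset are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has
    as many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy dataset: 90 majority rows (label 0), 10 minority rows (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print(len(X_over), Counter(y_over))    # 180 rows, 90 per class
print(len(X_under), Counter(y_under))  # 20 rows, 10 per class
```

Over-sampling grows the training set (here from 100 to 180 rows) while under-sampling shrinks it (to 20 rows), which is consistent with the abstract's observation that classification can be done more efficiently on under-sampled data. Synthetic methods such as ADASYN generate new minority points instead of duplicating existing ones, which this sketch does not attempt.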

Details

Primary Language

Turkish

Publication Date

October 19, 2016

Submission Date

October 18, 2016

Published in Issue

Year 2016 Volume: 2 Number: 2

APA
Durahim, A. O. (2016). COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING. Yönetim Bilişim Sistemleri Dergisi, 2(2), 181-191. https://izlik.org/JA99BE29ST
AMA
1. Durahim AO. COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING. Yönetim Bilişim Sistemleri Dergisi. 2016;2(2):181-191. https://izlik.org/JA99BE29ST
Chicago
Durahim, Ahmet Onur. 2016. “COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING”. Yönetim Bilişim Sistemleri Dergisi 2 (2): 181-91. https://izlik.org/JA99BE29ST.
EndNote
Durahim AO (October 1, 2016) COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING. Yönetim Bilişim Sistemleri Dergisi 2 2 181–191.
IEEE
[1] A. O. Durahim, “COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING”, Yönetim Bilişim Sistemleri Dergisi, vol. 2, no. 2, pp. 181–191, Oct. 2016, [Online]. Available: https://izlik.org/JA99BE29ST
ISNAD
Durahim, Ahmet Onur. “COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING”. Yönetim Bilişim Sistemleri Dergisi 2/2 (October 1, 2016): 181-191. https://izlik.org/JA99BE29ST.
JAMA
1. Durahim AO. COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING. Yönetim Bilişim Sistemleri Dergisi. 2016;2:181–191.
MLA
Durahim, Ahmet Onur. “COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING”. Yönetim Bilişim Sistemleri Dergisi, vol. 2, no. 2, Oct. 2016, pp. 181-91, https://izlik.org/JA99BE29ST.
Vancouver
1. Ahmet Onur Durahim. COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING. Yönetim Bilişim Sistemleri Dergisi [Internet]. 2016 Oct. 1;2(2):181-91. Available from: https://izlik.org/JA99BE29ST