COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING

Volume: 2, Issue: 2, 19 October 2016

Abstract

In recent years, the sharp increase in the number of Internet users has been accompanied by massive amounts of human- and machine-generated data, now commonly called Big Data, and handling this data efficiently is a challenging task. At the same time, the valuable information that can be extracted from this data to support data-driven decision making has attracted growing attention from both industry and academia. One of the important tasks in knowledge extraction is classification. However, in some real-world applications, either the dataset is inherently skewed or the collected dataset has an imbalanced class distribution. An imbalanced class distribution degrades the performance of many classification algorithms, which generally expect balanced class distributions and assume that the cost of misclassifying an instance is the same for both classes. To tackle this so-called imbalanced learning problem, several sampling algorithms have been proposed in the literature. In this study, we compare sampling algorithms with respect to their running times and the classification accuracies of classifiers trained on the sampled datasets. We find that the classification accuracies of the over-sampling methods are superior to those of the under-sampling methods. Sampling times are found to be similar, whereas classification can be done more efficiently with under-sampling methods. Among the sampling algorithms considered, the ADASYN method should be the preferred choice when execution time, increase in data size, and classification performance are all taken into account.

Keywords: Imbalanced Learning, Sampling Methods, Data Mining, Big Data
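
The two families of methods compared in the abstract can be illustrated with a minimal sketch. It uses plain random under-sampling and random over-sampling, which are simple baselines chosen here for illustration; it is not ADASYN and not necessarily any of the algorithms evaluated in the study. The function names and the toy dataset are hypothetical.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Drop rows at random until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_over_sample(X, y, seed=0):
    """Duplicate rows at random until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Toy dataset (hypothetical): 90 majority rows (label 0), 10 minority rows (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)

print(Counter(y_under))  # both classes have 10 rows, 20 in total
print(Counter(y_over))   # both classes have 90 rows, 180 in total
```

On this toy input the under-sampled set shrinks from 100 to 20 rows while the over-sampled set grows to 180. That is the data-size trade-off the abstract refers to: a smaller training set is cheaper to classify, while a larger one retains every original instance.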

Details

Primary Language

Turkish

Subjects

-

Section

-

Publication Date

19 October 2016

Submission Date

18 October 2016

Acceptance Date

-

Published in Issue

Year 2016, Volume: 2, Issue: 2

Cite

APA
Durahim, A. O. (2016). COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING. Yönetim Bilişim Sistemleri Dergisi, 2(2), 181-191. https://izlik.org/JA99BE29ST
AMA
1. Durahim AO. COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING. Yönetim Bilişim Sistemleri Dergisi. 2016;2(2):181-191. https://izlik.org/JA99BE29ST
Chicago
Durahim, Ahmet Onur. 2016. “COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING”. Yönetim Bilişim Sistemleri Dergisi 2 (2): 181-91. https://izlik.org/JA99BE29ST.
EndNote
Durahim AO (01 October 2016) COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING. Yönetim Bilişim Sistemleri Dergisi 2 2 181–191.
IEEE
[1] A. O. Durahim, “COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING”, Yönetim Bilişim Sistemleri Dergisi, vol. 2, no. 2, pp. 181–191, Oct. 2016, [Online]. Available: https://izlik.org/JA99BE29ST
ISNAD
Durahim, Ahmet Onur. “COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING”. Yönetim Bilişim Sistemleri Dergisi 2/2 (01 October 2016): 181-191. https://izlik.org/JA99BE29ST.
JAMA
1. Durahim AO. COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING. Yönetim Bilişim Sistemleri Dergisi. 2016;2:181–191.
MLA
Durahim, Ahmet Onur. “COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING”. Yönetim Bilişim Sistemleri Dergisi, vol. 2, no. 2, Oct. 2016, pp. 181-91, https://izlik.org/JA99BE29ST.
Vancouver
1. Ahmet Onur Durahim. COMPARISON OF SAMPLING TECHNIQUES FOR IMBALANCED LEARNING. Yönetim Bilişim Sistemleri Dergisi [Internet]. 01 October 2016;2(2):181-91. Available from: https://izlik.org/JA99BE29ST