Research Article

Combination of PCA with SMOTE Oversampling for Classification of High-Dimensional Imbalanced Data

Volume: 10 Number: 3 September 17, 2021

Abstract

Imbalanced data classification is a common issue in data mining, in which classifiers are biased towards the larger class. Classification of high-dimensional skewed (imbalanced) data is of great interest to decision-makers because such data are more difficult to classify. Dimension reduction, a process in which the number of variables is reduced, allows high-dimensional datasets to be interpreted more easily at the cost of some information loss. In this study, a method combining SMOTE oversampling with principal component analysis (PCA) is proposed to address the imbalance problem in high-dimensional data. Three classification algorithms (Logistic Regression, K-Nearest Neighbor, and Decision Tree) and two separate datasets were used to evaluate the efficacy of the proposed method and to determine the classifiers' performance. The raw datasets and the datasets transformed by the PCA, SMOTE, and SMOTE+PCA methods were each analyzed with these algorithms. Analyses were performed using WEKA. The results suggest that almost all classification algorithms improve their classification performance with the PCA, SMOTE, and SMOTE+PCA methods. However, the SMOTE method gave more effective results than the PCA and SMOTE+PCA methods for data rebalancing. The experimental results also suggest that the K-Nearest Neighbor classifier provided higher classification performance than the other algorithms.
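The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' WEKA workflow: the toy data, the function names (`smote`, `first_principal_component`, `project`), the choice of `k`, the use of a single principal component, and the order of the steps (oversample first, then project) are all assumptions made for illustration. SMOTE is sketched as interpolation between a minority sample and one of its nearest minority neighbours, and PCA as power iteration on the sample covariance matrix.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def first_principal_component(rows, iterations=200):
    """Return (mean, direction) of the leading principal component,
    estimated by power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # assumed starting vector; must not be orthogonal to PC1
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Project centred rows onto a single direction (one PCA score per row)."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for r in rows]


# Hypothetical toy data: 6 majority and 3 minority samples with 3 features.
majority = [[1.0, 2.0, 0.5], [1.2, 1.9, 0.4], [0.8, 2.1, 0.6],
            [1.1, 2.2, 0.5], [0.9, 1.8, 0.4], [1.0, 2.0, 0.6]]
minority = [[3.0, 0.5, 1.5], [3.2, 0.4, 1.6], [2.9, 0.6, 1.4]]

# Oversample the minority class until both classes have the same size.
synthetic = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced = majority + minority + synthetic

# Reduce the balanced data from 3 features to 1 principal-component score.
mean, direction = first_principal_component(balanced)
projected = project(balanced, mean, direction)
```

The reduced scores in `projected` would then be passed to a classifier such as K-Nearest Neighbor. Because each synthetic point is a convex combination of two real minority samples, it always lies on the segment between them.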

Details

Primary Language

English

Journal Section

Research Article

Publication Date

September 17, 2021

Submission Date

May 20, 2021

Acceptance Date

July 28, 2021

Published in Issue

Year 2021 Volume: 10 Number: 3

APA
Mulla, G. A. A., Demir, Y., & Hassan, M. (2021). Combination of PCA with SMOTE Oversampling for Classification of High-Dimensional Imbalanced Data. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, 10(3), 858-869. https://doi.org/10.17798/bitlisfen.939733
Chicago
Mulla, Guhdar A. A., Yıldırım Demir, and Masoud Hassan. 2021. “Combination of PCA With SMOTE Oversampling for Classification of High-Dimensional Imbalanced Data”. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi 10 (3): 858-69. https://doi.org/10.17798/bitlisfen.939733.
