Improving classification performance for an imbalanced educational dataset example using SMOTE

Yavuz Ünal; Ahmet Sağlam; Osman Kayhan

doi:10.31590/ejosat.638608

Abstract

As technology advances, large amounts of data accumulate in digital environments, and education is one of the most data-intensive areas. By analyzing educational data sets, students' future situations can be predicted. In this way, outcomes such as drop-out due to failure can be anticipated and the affected students can be supported. Educational institutions can take measures to reduce student drop-out, and thus prevent financial losses for both students and institutions. This study uses data on students enrolled in five associate degree programs at the Amasya University Distance Education Center in 2016-2017: child development, medical documentation and secretarial, electricity, mechatronics, and internet and network technologies. Whether a student would graduate at the end of the fourth semester was predicted from their first- and second-semester course grades. The data were analyzed with the k-nearest neighbor (K-NN) and KStar algorithms. Because of the low number of students, some of the data obtained from the distance education center were imbalanced. In educational data mining, researchers usually overlook the class distribution of a dataset, yet imbalanced data can seriously affect classification success. First, the raw data were analyzed with the K-NN and KStar classifiers. The synthetic minority oversampling technique (SMOTE) was then applied to the imbalanced data, and its effect on classification success was examined. The results for the five programs are presented comparatively in tables. The study shows that SMOTE oversampling increases classification success. In areas where imbalanced data may occur, such as educational data mining, higher classification accuracy can be achieved with the help of different oversampling methods.
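
The workflow described in the abstract (oversample the minority class with SMOTE, then classify with K-NN) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote` and `knn_predict`, the parameter values, and the toy grade data are all assumptions made for illustration, and the KStar classifier is not covered. The sketch only shows the basic SMOTE idea of creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbors.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their nearest minority-class neighbours (the basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda xy: math.dist(xy[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Hypothetical toy data, NOT from the study:
# each point is (semester I average, semester II average).
majority = [(70, 75), (80, 78), (65, 72), (90, 85),
            (75, 80), (85, 88), (60, 70), (72, 68)]   # students who graduate
minority = [(30, 35), (40, 38), (35, 45)]              # students who do not

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority))
balanced_x = majority + minority + new_points
balanced_y = (["graduate"] * len(majority)
              + ["no_graduate"] * (len(minority) + len(new_points)))

print(knn_predict(balanced_x, balanced_y, (36, 40)))
```

Because every synthetic point lies on a line segment between two real minority samples, the oversampled class stays inside the region the minority class already occupies. In a real evaluation, oversampling would normally be applied only to the training portion of the data so that synthetic samples do not leak into the test set.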

Details

Primary Language

English

Subjects

Engineering

Section

Research Article

Authors

Yavuz Ünal *

Ahmet Sağlam
0000-0002-2616-8253

Osman Kayhan

Publication Date

October 31, 2019

Submission Date

August 1, 2019

Acceptance Date

October 26, 2019

Published in Issue

Year 2019

DOI

https://doi.org/10.31590/ejosat.638608

IZ

https://izlik.org/JA55BG56XW

Cite

APA

Ünal, Y., Sağlam, A., & Kayhan, O. (2019). Improving classification performance for an imbalanced educational dataset example using SMOTE. Avrupa Bilim ve Teknoloji Dergisi, 485-489. https://doi.org/10.31590/ejosat.638608

Cited By

Usage of Weka Software Based On Machine Learning Algorithms for Prediction of Liver Fibrosis/Cirrhosis

Black Sea Journal of Engineering and Science

https://doi.org/10.34248/bsengineering.1351863

MULTICLASS CLASSIFICATION OF MARKETPLACE PRODUCTS WITH MACHINE LEARNING

MEDIA STATISTIKA

https://doi.org/10.14710/medstat.17.1.25-35