Improving classification performance for an imbalanced educational dataset example using SMOTE
Abstract
As technology advances, large amounts of data accumulate in digital environments, and education is one of the most data-intensive areas. By analyzing educational data sets, students' situations can be predicted in advance. Students at risk of outcomes such as dropping out due to failure can then be supported, and educational institutions can take measures to reduce dropout, which prevents financial losses for both students and institutions. This study used data from students enrolled in five associate degree programs at the Amasya University Distance Education Center in 2016-2017: child development, medical documentation and secretarial, electricity, mechatronics, and internet and network technologies. Whether students would graduate at the end of the fourth semester was predicted from their first- and second-semester course grades. The data were analyzed with the k-nearest neighbor (K-NN) and KStar algorithms. Because of the low number of students, some of the data obtained from the distance education center were imbalanced. In educational data mining, researchers often overlook how balanced the class distribution of a dataset is, yet imbalanced data can seriously reduce classification success. The raw data were first analyzed with the K-NN and KStar classifiers; the synthetic minority oversampling technique (SMOTE) was then applied to the imbalanced data, and its effect on classification success was examined. The results for the five programs are presented comparatively in tables. The study found that SMOTE oversampling increases classification success. In areas where imbalanced data may occur, such as educational data mining, higher classification accuracy can be achieved with the help of oversampling methods.
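As a rough illustration of the oversampling idea mentioned in the abstract, the sketch below generates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors, which is the core step of SMOTE. The function name `smote`, its parameters, and the grade values are hypothetical and chosen only for illustration; they are not the study's data or its implementation.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical semester I and II grade averages (assumed, not from the study).
graduated = [[70, 75], [82, 80], [65, 72], [90, 88], [77, 81], [68, 70]]
not_graduated = [[35, 40], [42, 38], [30, 45]]

# Add just enough synthetic minority samples to match the majority class size.
extra = smote(not_graduated, n_new=len(graduated) - len(not_graduated), k=2)
balanced_minority = not_graduated + extra
print(len(graduated), len(balanced_minority))
```

Because each synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already occupied by the minority class. A classifier such as K-NN would then be trained on the balanced set rather than the raw one.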
Details
Primary Language
English
Subjects
Engineering
Section
Research Article
Publication Date
October 31, 2019
Submission Date
August 1, 2019
Acceptance Date
October 26, 2019
Published in Issue
Year 2019
Cited By
Usage of Weka Software Based On Machine Learning Algorithms for Prediction of Liver Fibrosis/Cirrhosis
Black Sea Journal of Engineering and Science
https://doi.org/10.34248/bsengineering.1351863