Research Article

Comparison of Data Mining Classification Algorithms on Educational Data under Different Conditions

Volume: 11 Number: 4 December 30, 2020

Abstract

The purpose of this study was to examine the performance of Naive Bayes, k-nearest neighbors, neural networks, and logistic regression in classifying students according to their mathematics performance, under varying sample sizes and test data ratios. The target population was the 62,728 15-year-old students from Organisation for Economic Co-operation and Development (OECD) countries who participated in the Programme for International Student Assessment (PISA) in 2012. The performance of each algorithm was tested by using 11%, 22%, 33%, 44%, and 55% of each dataset for small (500 students), medium (1000 students), and large (5000 students) sample sizes, with 100 replications for each analysis. Accuracy rates, RMSE values, and total elapsed time were used as evaluation criteria. The RMSE values of the algorithms were compared statistically using Friedman and Wilcoxon tests. The results revealed that while the classification performance of the methods increased as the sample size increased, increasing the training data ratio affected the algorithms differently. Naive Bayes performed well even in small samples, ran the analyses very quickly, and was not affected by changes in the training data ratio. Logistic regression was the most effective method in large samples but performed poorly in small samples. Neural networks showed a similar tendency, but their overall performance was lower than that of Naive Bayes and logistic regression. The k-nearest neighbors algorithm had the lowest performance in all conditions.
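The experimental design summarized in the abstract (sample size crossed with data ratio, repeated over replications, scored by accuracy, RMSE, and elapsed time) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the data are synthetic, a plain k-nearest-neighbor vote stands in for all four algorithms, each replication draws a fresh sample, and RMSE is computed on hard 0/1 predictions, all of which are assumptions. The function names (`make_synthetic`, `knn_predict`, `evaluate`, `run_grid`) are hypothetical, and the Friedman and Wilcoxon comparisons are not covered.

```python
import math
import random
import time

# Conditions named in the abstract.
SAMPLE_SIZES = {"small": 500, "medium": 1000, "large": 5000}
DATA_RATIOS = [0.11, 0.22, 0.33, 0.44, 0.55]


def make_synthetic(n, rng):
    """Hypothetical stand-in for student records: two features, binary label."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        features = (rng.gauss(label, 1.0), rng.gauss(label, 1.0))
        rows.append((features, label))
    return rows


def knn_predict(train, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def evaluate(rows, test_ratio, rng):
    """Hold out test_ratio of the rows; return (accuracy, rmse, seconds)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    test, train = shuffled[:n_test], shuffled[n_test:]
    start = time.perf_counter()
    preds = [knn_predict(train, x) for x, _ in test]
    elapsed = time.perf_counter() - start
    # Assumption: squared error on hard 0/1 labels, so rmse**2 == error rate.
    sq_errors = [(p - y) ** 2 for p, (_, y) in zip(preds, test)]
    mse = sum(sq_errors) / len(sq_errors)
    return 1.0 - mse, math.sqrt(mse), elapsed


def run_grid(sample_sizes, ratios, replications, seed=0):
    """Average the three criteria over replications for every condition."""
    rng = random.Random(seed)
    results = {}
    for name, n in sample_sizes.items():
        for ratio in ratios:
            runs = [evaluate(make_synthetic(n, rng), ratio, rng)
                    for _ in range(replications)]
            results[(name, ratio)] = tuple(
                sum(col) / len(col) for col in zip(*runs))
    return results


if __name__ == "__main__":
    # Reduced grid so the pure-Python k-NN finishes quickly.
    demo = run_grid({"demo": 200}, [0.11, 0.55], replications=3)
    for condition, (acc, rmse, secs) in demo.items():
        print(condition, round(acc, 3), round(rmse, 3), round(secs, 4))
```

Passing `SAMPLE_SIZES`, `DATA_RATIOS`, and `replications=100` to `run_grid` would mirror the full grid of conditions, although the unoptimized neighbor search here becomes slow at 5000 rows.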

Details

Primary Language

English

Journal Section

Research Article

Publication Date

December 30, 2020

Submission Date

March 1, 2020

Acceptance Date

November 12, 2020

Published in Issue

Year 2020 Volume: 11 Number: 4

APA
Koyuncu, İ., & Gelbal, S. (2020). Comparison of Data Mining Classification Algorithms on Educational Data under Different Conditions. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 325-345. https://doi.org/10.21031/epod.696664
