Comparison of Data Mining Classification Algorithms on Educational Data under Different Conditions
Abstract
The purpose of this study was to examine the performance of Naive Bayes, k-nearest neighbors (k-NN), neural networks, and logistic regression analysis in classifying students according to their mathematics performance, under varying sample sizes and test data rates. The target population was the 62,728 fifteen-year-old students from Organisation for Economic Co-operation and Development (OECD) countries who participated in the Programme for International Student Assessment (PISA) in 2012. The performance of each algorithm was tested using 11%, 22%, 33%, 44%, and 55% of each dataset as test data, for small (500 students), medium (1,000 students), and large (5,000 students) sample sizes, with 100 replications per analysis. Accuracy, root mean square error (RMSE), and total elapsed time were used as evaluation criteria, and the RMSE values of the algorithms were compared statistically with the Friedman and Wilcoxon tests. The results revealed that the classification performance of the methods increased with sample size, whereas increasing the training data ratio affected the algorithms differently. Naive Bayes performed well even in small samples, ran the analyses very quickly, and was not affected by changes in the training data ratio. Logistic regression was the most effective method in large samples but performed poorly in small samples. Neural networks showed a similar tendency, but their overall performance was lower than that of Naive Bayes and logistic regression. The k-nearest neighbors algorithm had the lowest performance in all conditions.
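The design crosses three sample sizes with five test data rates, giving 15 conditions; with 100 replications each and four algorithms, that amounts to 1,500 runs per algorithm and 6,000 fitted models in total. A minimal standard-library sketch of such a replication loop follows. It assumes that each replication draws a fresh random sample without replacement from the population and splits it at random; the abstract does not state the sampling or splitting procedure. The `fit`/`predict` callable interface, `run_design`, and the `majority_class` stand-in are hypothetical illustrations, not the study's models; the four algorithms would be plugged in through the same interface.

```python
import random
import time

# Factorial design from the abstract: 3 sample sizes x 5 test data rates, 100 replications.
SAMPLE_SIZES = {"small": 500, "medium": 1000, "large": 5000}
TEST_RATIOS = [0.11, 0.22, 0.33, 0.44, 0.55]
REPLICATIONS = 100


def majority_class(train):
    """Hypothetical stand-in classifier: always predicts the most frequent training label."""
    labels = [label for _, label in train]
    top = max(set(labels), key=labels.count)
    return lambda features: top


def run_design(population, classifiers, replications=REPLICATIONS, seed=0):
    """Run every (sample size, test ratio, replication, algorithm) cell once.

    population:  list of (features, label) pairs.
    classifiers: dict mapping a name to a hypothetical fit(train) -> predict(features) callable.
    """
    rng = random.Random(seed)
    rows = []
    for size_name, n in SAMPLE_SIZES.items():
        for ratio in TEST_RATIOS:
            for rep in range(replications):
                # Assumption: a fresh random sample without replacement per replication;
                # rng.sample returns items in random order, so slicing gives a random split.
                sample = rng.sample(population, n)
                n_test = round(ratio * n)
                test, train = sample[:n_test], sample[n_test:]
                for name, fit in classifiers.items():
                    start = time.perf_counter()  # times training plus prediction
                    predict = fit(train)
                    hits = sum(predict(x) == y for x, y in test)
                    rows.append({
                        "size": size_name,
                        "test_ratio": ratio,
                        "rep": rep,
                        "algorithm": name,
                        "accuracy": hits / n_test,
                        "seconds": time.perf_counter() - start,
                    })
    return rows


if __name__ == "__main__":
    # Synthetic stand-in population (not PISA data): 6,000 (features, label) pairs.
    population = [((i % 7,), "high" if i % 3 == 0 else "low") for i in range(6000)]
    rows = run_design(population, {"majority": majority_class}, replications=2)
    print(len(rows))  # 3 sizes x 5 ratios x 2 replications x 1 algorithm = 30
```

The timer wraps both training and prediction, which is one possible reading of "total elapsed time"; RMSE would be recorded per row in the same way as accuracy.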
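Accuracy is the share of test students assigned to their observed class. For classifiers that output class probabilities, RMSE is often computed as the square root of the mean, over N test students and C classes, of (p_ic - y_ic)^2, where p_ic is the predicted probability of class c for student i and y_ic is 1 if c is the observed class and 0 otherwise. The abstract does not give the exact RMSE formula used, so this convention is an assumption here, as are the made-up class labels "high" and "low" in the example.

```python
import math


def predicted_label(probs):
    """Class with the highest predicted probability (probs maps class -> probability)."""
    return max(probs, key=probs.get)


def accuracy(y_true, y_pred):
    """Share of test students whose predicted class equals the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def probabilistic_rmse(y_true, probs, classes):
    """Assumed convention: RMSE between predicted probabilities and 0/1 class
    indicators, averaged over all test instances and all classes."""
    squared = 0.0
    for label, p in zip(y_true, probs):
        for c in classes:
            indicator = 1.0 if label == c else 0.0
            squared += (p[c] - indicator) ** 2
    return math.sqrt(squared / (len(y_true) * len(classes)))


# Hypothetical two-student example with made-up class labels "high" and "low".
y_true = ["high", "low"]
probs = [{"high": 0.8, "low": 0.2}, {"high": 0.4, "low": 0.6}]
print(accuracy(y_true, [predicted_label(p) for p in probs]))  # 1.0
print(probabilistic_rmse(y_true, probs, ["high", "low"]))      # sqrt(0.40 / 4), about 0.316
```

Both students are classified correctly, so accuracy is 1.0, yet RMSE stays above zero because the predicted probabilities are not fully confident. This is why RMSE can separate algorithms that reach the same accuracy.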
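The Friedman test compares the algorithms' RMSE values as a block design, ranking the algorithms within each replication. Pairwise Wilcoxon signed-rank tests then locate which pairs differ; with four algorithms there are six such pairs. The standard-library sketch below makes several simplifying assumptions. Ties receive average ranks, but neither statistic applies a tie correction. Zero differences are dropped in the Wilcoxon test, and its p-value uses a normal approximation. Treating each replication as a block is itself an assumption about how the study paired its results. For real analyses, `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon` implement these tests with more options.

```python
import math


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def chi2_sf(stat, df):
    """Upper-tail chi-square probability via the series for the lower regularized gamma.
    Accurate for moderate statistics (at most 300 in this design with n = 100, k = 4)."""
    if stat <= 0:
        return 1.0
    s, x = df / 2.0, stat / 2.0
    term = total = 1.0 / s
    k = 1
    while term > 1e-15 * total:
        term *= x / (s + k)
        total += term
        k += 1
    lower = total * math.exp(s * math.log(x) - x - math.lgamma(s))
    return max(0.0, 1.0 - lower)


def friedman(blocks):
    """blocks[r][a]: RMSE of algorithm a in replication r. Returns (statistic, p-value)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for a, r in enumerate(average_ranks(row)):
            rank_sums[a] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1)


def wilcoxon_signed_rank(x, y):
    """Two-sided paired test on x - y; returns (T = min(W+, W-), normal-approximation p)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = n * (n + 1) / 2.0 - w_plus  # all ranks sum to n(n+1)/2
    t = min(w_plus, w_minus)
    z = (t - n * (n + 1) / 4.0) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return t, math.erfc(abs(z) / math.sqrt(2.0))
```

If, for example, one algorithm had the lowest RMSE in every one of 10 replications and the other three kept a fixed order, the rank sums would be 10, 20, 30, and 40. The Friedman statistic would then be 12/(10·4·5)·3000 - 150 = 30 on 3 degrees of freedom.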
Details
Primary Language: English
Journal Section: Research Article
Publication Date: December 30, 2020
Submission Date: March 1, 2020
Acceptance Date: November 12, 2020
Published in Issue: Year 2020, Volume 11, Number 4
Citation (APA)
Koyuncu, İ., & Gelbal, S. (2020). Comparison of Data Mining Classification Algorithms on Educational Data under Different Conditions. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 325-345. https://doi.org/10.21031/epod.696664
Cited By
- Classification of Scale Items with Exploratory Graph Analysis and Machine Learning Methods. International Journal of Assessment Tools in Education. https://doi.org/10.21449/ijate.880914
- Classification of Students’ Mathematical Literacy Score Using Educational Data Mining: PISA 2015 Turkey Application. Cumhuriyet Science Journal. https://doi.org/10.17776/csj.1136733
- Stacking: An ensemble learning approach to predict student performance in PISA 2022. Education and Information Technologies. https://doi.org/10.1007/s10639-024-13110-2
- Yapay Zekâ Teknikleriyle Yükseköğretim Kurumları Sınavı (YKS) Puanlarının Tahmini [Prediction of Higher Education Institutions Examination (YKS) Scores with Artificial Intelligence Techniques]. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji. https://doi.org/10.29109/gujsc.1509217