Research Article

Educational Data Mining: Construction of a Tree-Based Model to Predict Students’ Performance

Year 2025, Volume 14, Issue 1, pp. 181–195, 29.01.2025

Abstract

Educational data mining is a research field that discovers hidden patterns in educational data. In this study, machine learning algorithms were applied to a dataset consisting of the most basic features in order to predict students’ final grade performance. In this way, the study also sought to identify the most important features and the best-performing machine learning algorithm. For this purpose, univariate feature selection, tree-based feature selection, and L1-based feature selection methods were used in the feature selection process. Classification and regression trees, k-nearest neighbors, naive Bayes, random forest, and support vector machines were used to build the learning models. L1-based feature selection and classification and regression trees provided the best performance in the feature selection and model building processes, respectively. The experimental results show that the proposed model reached an average classification accuracy of 0.7700 and an F1 score of 0.7888. The L1-based feature selection method selected only 4 features: scholarship type, total salary, transportation to the university, and cumulative grade point average in the last semester. In conclusion, there are many indicators that affect students’ academic success, and the success or failure that emerges after the measurement process can be predicted in advance by taking these features into account. Such a task will make it possible to understand the relationship mechanism between educational inputs and outputs and will eliminate shortcomings in the education process.

Educational Data Mining: Construction of a Tree-based Model to Predict Students’ Performance

Abstract

Educational data mining is a research field that uncovers hidden patterns in educational data. In this paper, machine learning algorithms were applied to a dataset consisting of major features in order to predict students’ final grade performance. The study also sought to identify the most significant features and the best-performing machine learning algorithm. To this end, univariate, tree-based, and L1-based feature selection methods were used in the feature selection process. Classification and regression trees, k-nearest neighbors, naive Bayes, random forest, and support vector machines were employed to build the learning models. L1-based feature selection and classification and regression trees delivered the best performance in the feature selection and model building processes, respectively. The experimental results show that the proposed model reached, on average, a classification accuracy of 0.7700 and an F1-score of 0.7888. The L1-based feature selection method selected only 4 features: scholarship type, total salary, transportation to the university, and cumulative grade point average in the last semester. In conclusion, many indicators affect students’ academic success, and the success or failure that emerges after the measurement process can be predicted in advance by taking these features into account. Such a task will make the relationship between educational inputs and outputs understandable and eliminate shortcomings in the education process.
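
The two metrics reported in the abstract, classification accuracy and F1-score, can be illustrated with a minimal sketch. It assumes a binary pass/fail target and a single positive class; the abstract does not state the number of classes or the averaging scheme used, so this is only one possible reading. The labels below are hypothetical and are not taken from the study's dataset.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the chosen positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive prediction: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (assumption): 1 = student passes, 0 = student fails.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"accuracy = {acc:.4f}")  # accuracy = 0.7000
print(f"F1-score = {f1:.4f}")   # F1-score = 0.7273
```

Accuracy counts every correct prediction equally, whereas F1 ignores true negatives and balances precision against recall, which is why the two values can differ on the same predictions.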

There are 39 citations in total.

Details

Primary Language: English
Subjects: Higher Education Studies (Other)
Journal Section: Articles
Authors: Furkan Aydın (ORCID 0000-0003-0610-8744)
Publication Date: January 29, 2025
Submission Date: November 13, 2023
Acceptance Date: October 13, 2024
Published in Issue: Year 2025, Volume 14, Issue 1

Cite

APA Aydın, F. (2025). Educational Data Mining: Construction of a Tree-based Model to Predict Students’ Performance. Bartın University Journal of Faculty of Education, 14(1), 181-195.

All articles published in the journal are open access and distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.

Bartın University Journal of Faculty of Education