Research Article

Predicting Student Achievement via Machine Learning: Evidence from Turkish Subset of PISA

Year 2024, Volume 10, Issue 1, 7 - 27, 28.06.2024
https://doi.org/10.51803/yssr.1461030

Abstract

This study seeks to identify the determinants of academic performance in mathematics, science, and reading among Turkish secondary school students. Using data from the OECD's PISA 2018 survey, which includes several student- and school-level variables as well as test scores, we employed a range of supervised machine learning methods, specifically ensemble decision trees, and assessed their predictive performance. Our results indicate that the boosted regression tree (BRT) method outperforms the other methods considered, namely bagging and random forest regression trees. Notably, the BRT highlights the importance of enrollment in general secondary education programs rather than vocational and technical (VAT) education in predicting academic achievement. Moreover, both student-specific characteristics and characteristics of the school environment are shown to be significant predictors of academic performance in all subject areas. These findings contribute to the development of evidence-based educational policies in Turkey.
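To make the comparison between bagging and boosting concrete, the following is a minimal sketch on synthetic data, not the authors' implementation or the PISA data. The variable names (`general`, `escs`), the coefficients, and the hyperparameters are illustrative assumptions, and depth-1 trees (stumps) stand in for full regression trees to keep the example short. Random forests are omitted because, with only two predictors, random feature subsampling is not informative here.

```python
import math
import random

def avg(values):
    return sum(values) / len(values)

def fit_stump(X, y):
    """Depth-1 regression tree: choose the split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = avg(left), avg(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split: fall back to a constant prediction
        m = avg(y)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr

def fit_bagged(X, y, n_trees, rng):
    """Bagging: average stumps fitted on bootstrap resamples of the training data."""
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: avg([predict_stump(s, row) for s in stumps])

def fit_boosted(X, y, rounds, lr):
    """Boosting: fit each stump to the residuals left by the previous stumps."""
    base = avg(y)
    resid = [yi - base for yi in y]
    stumps = []
    for _ in range(rounds):
        s = fit_stump(X, resid)
        stumps.append(s)
        resid = [r - lr * predict_stump(s, row) for r, row in zip(resid, X)]
    return lambda row: base + lr * sum(predict_stump(s, row) for s in stumps)

def make_data(n, rng):
    """Synthetic scores; both predictors and all coefficients are hypothetical."""
    X, y = [], []
    for _ in range(n):
        general = rng.randint(0, 1)       # 1 = general program, 0 = vocational
        escs = rng.randint(-10, 10) / 10  # stand-in for a socioeconomic index
        X.append([general, escs])
        y.append(450 + 50 * general + 20 * escs + rng.gauss(0, 5))
    return X, y

def rmse(model, X, y):
    return math.sqrt(avg([(model(row) - yi) ** 2 for row, yi in zip(X, y)]))

rng = random.Random(0)
X_train, y_train = make_data(120, rng)
X_test, y_test = make_data(120, rng)

train_mean = avg(y_train)
rmse_base = rmse(lambda row: train_mean, X_test, y_test)
rmse_bag = rmse(fit_bagged(X_train, y_train, 30, rng), X_test, y_test)
rmse_brt = rmse(fit_boosted(X_train, y_train, 80, 0.3), X_test, y_test)
print(f"mean baseline: {rmse_base:.2f}  bagging: {rmse_bag:.2f}  boosting: {rmse_brt:.2f}")
```

In this toy setup, boosting does better because successive stumps can pick up the second predictor after the first has been accounted for, whereas averaged stumps mostly repeat the same dominant split. This illustrates the mechanism only and says nothing about the magnitudes reported in the study.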

References

  • Aksu, G., & Güzeller, C. O. (2016). Classification of PISA 2012 mathematical literacy scores using decision-tree method: Turkey sampling. TED Eğitim ve Bilim 41(185). [CrossRef]
  • Baker, R. S., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1(1), 3–17.
  • Breiman, L. (2017). Classification and regression trees. Routledge. [CrossRef]
  • Chen, J., Zhang, Y., Wei, Y., & Hu, J. (2021). Discrimination of the contextual features of top performers in scientific literacy using a machine learning approach. Research in Science Education, 51(1), 129–158. [CrossRef]
  • Dong, X., & Hu, J. (2019). An exploration of impact factors influencing students’ reading literacy in Singapore with machine learning approaches. International Journal of English Linguistics, 9(5), 52–65. [CrossRef]
  • Filiz, E., & Öz, E. (2019). Finding the Best Algorithms and Effective Factors in Classification of Turkish Science Student Success. Journal of Baltic Science Education, 18(2), 239–253. [CrossRef]
  • Gabriel, F., Signolet, J., & Westwell, M. (2018). A machine learning approach to investigating the effects of mathematics dispositions on mathematical literacy. International Journal of Research & Method in Education, 41(3), 306–327. [CrossRef]
  • Gorostiaga, A., & Rojo-Álvarez, J. L. (2016). On the use of conventional and statistical-learning techniques for the analysis of PISA results in Spain. Neurocomputing, 171, 625–637. [CrossRef]
  • Hanushek, E. A. (1979). Conceptual and empirical issues in the estimation of educational production functions. Journal of Human Resources, 351–388. [CrossRef]
  • Hanushek, E. A., & Kimko, D. D. (2000). Schooling, labor-force quality, and the growth of nations. American Economic Review, 90(5), 1184–1208. [CrossRef]
  • Hu, J., Peng, Y., & Ma, H. (2022). Examining the contextual factors of science effectiveness: a machine learning-based approach. School Effectiveness and School Improvement, 33(1), 21–50.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer. [CrossRef]
  • Kasap, Y., Doğan, N., & Koçak, C. (2021). PISA 2018’de Okuduğunu anlama başarısını yordayan değişkenlerin veri madenciliği ile belirlenmesi. Manisa Celal Bayar Üniversitesi Sosyal Bilimler Dergisi, 19(4), 241–258. [CrossRef]
  • Kılıç Depren, S. (2018). Prediction of students’ science achievement: an application of multivariate adaptive regression splines and regression trees. Journal of Baltic Science Education, 17(5), 887–903. [CrossRef]
  • Kıray, S. A., Gök, B., & Bozkır, A. S. (2015). Identifying the factors affecting science and mathematics achievement using data mining methods. Journal of Education in Science Environment and Health, 1(1), 28–48. [CrossRef]
  • Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction policy problems. American Economic Review, Papers and Proceedings, 105(5), 491–495. [CrossRef]
  • Lee, J. W., & Barro, R. J. (2001). Schooling quality in a cross–section of countries. Economica, 68(272), 465–488. [CrossRef]
  • Lee, H., & Lee, J. W. (2021). Why East Asian students perform better in mathematics than their peers: An investigation using a machine learning approach. CAMA Working Paper No. 66/2021. [CrossRef]
  • Martínez-Abad, F., Gamazo, A., & Rodríguez-Conde, M. J. (2020). Educational Data Mining: Identification of factors associated with school effectiveness in PISA assessment. Studies in Educational Evaluation, 66, Article 100875. [CrossRef]
  • Masci, C., Johnes, G., & Agasisti, T. (2018). Student and school performance across countries: A machine learning approach. European Journal of Operational Research, 269(3), 1072–1085. [CrossRef]
  • MEB (2019). PISA 2018 ulusal ön raporu. Ankara: http://pisa.meb.gov.tr/eski%20dosyalar/wpcontent/uploads/2020/01/PISA_2018_Turkiye_On_Raporu.pdf
  • OECD. (2009). PISA Data Analysis Manual. https://www.oecd-ilibrary.org/docserver/9789264056275-en.pdf?expires=1680205505&id=id&accname=guest&checksum=11DAE831D022F23D8FF8E094F9E7AB8C
  • OECD (2019), PISA 2018, https://www.oecd.org/pisa/data/2018database/ accessed on 25 October 2021.
  • OECD. (2019). PISA 2018 Technical Report. https://www.oecd.org/pisa/data/pisa2018technicalreport/
  • OECD. (2019). Turkey - Country Note - PISA 2018 Results. https://www.oecd.org/pisa/publications/PISA2018_CN_TUR.pdf
  • Prasad, A. M., Iverson, L. R., & Liaw, A. (2006). Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems, 9, 181–199. [CrossRef]
  • Puah, S. (2021). Predicting Students’ Academic Performance: A Comparison between Traditional MLR and Machine Learning Methods with PISA 2015. Preprint. doi: 10.31234/osf.io/2yshm [CrossRef]
  • Rebai, S., Yahia, F. B., & Essid, H. (2020). A graphically based machine learning approach to predict secondary schools performance in Tunisia. Socio-Economic Planning Sciences, 70, Article 100724. [CrossRef]
  • Romero, C., & Ventura, S. (2007). Educational data mining: A survey from 1995 to 2005. Expert Systems with Applications, 33(1), 135–146. [CrossRef]
  • She, H. C., Lin, H. S., & Huang, L. Y. (2019). Reflections on and implications of the Programme for International Student Assessment 2015 (PISA 2015) performance of students in Taiwan: The role of epistemic beliefs about science in scientific literacy. Journal of Research in Science Teaching, 56(10), 1309–1340. [CrossRef]
  • Sirin, S. R. (2005). Socioeconomic status and academic achievement: A meta-analytic review of research. Review of Educational Research, 75(3), 417–453. [CrossRef]
  • Uğuz, E., Şahin, S., & Yılmaz, R. (2021). PİSA 2018 fen bilimleri puanlarının değerlendirilmesinde eğitsel veri madenciliğinin kullanımı. Bilgi ve İletişim Teknolojileri Dergisi, 3(2), 212–227. [CrossRef]
  • Walberg, H. J. (1981). A psychological theory of educational productivity. In F. H. Farley & N. Gordon (Eds.), Psychology and education (pp. 81–110). Berkeley, CA: McCutchan.
  • Woessmann, L. (2008). How equal are educational opportunities? Family background and student achievement in Europe and the United States. Zeitschrift für Betriebswirtschaft, 78(1), 45–70.
  • Yoo, J. E. (2018). TIMSS 2011 student and teacher predictors for mathematics achievement explored and identified via elastic net. Frontiers in Psychology, 9, Article 317. [CrossRef]
  • Yu, C. H., Kaprolet, C., Jannasch-Pennell, A., & DiGangi, S. (2012). A data mining approach to comparing American and Canadian grade 10 students’ PISA science test performance. Journal of Data Science, 10(24), 441–464. [CrossRef]

Predicting Student Achievement with Machine Learning Methods: Findings for Turkey from PISA

Abstract

The aim of this study is to identify the variables that can be used to predict the achievement of secondary school students in Turkey in mathematics, science, and reading. To this end, we used the student and school questionnaires of the PISA study conducted by the OECD in 2018, together with the PISA test results, and applied supervised regression-based machine learning methods to find the model that best predicts the academic achievement of Turkish secondary school students. Compared with multiple linear regression, ridge regression, LASSO, elastic net regression, bagging, and random forest methods, the boosted regression tree (BRT) method was found to have the best predictive performance. According to the BRT findings, the most important variable in predicting the academic achievement of Turkish secondary school students is the program type of the school in which the student is enrolled (General Secondary Education rather than Vocational and Technical Secondary Education). In addition, both student-level and school-level variables stand out in predicting the academic achievement of Turkish secondary school students. These findings hold for every subject.
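For readers unfamiliar with the penalized regressions named above, they can be written as one objective. The display below is a standard textbook form, not taken from the article; the symbols $\lambda$ (penalty strength) and $\alpha$ (mixing weight) are notation assumed here, and software packages differ in how they scale the two penalty terms.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \bigl( y_i - \beta_0 - x_i^{\top}\beta \bigr)^2
  + \lambda \Bigl[ \alpha \lVert \beta \rVert_1
  + (1 - \alpha) \lVert \beta \rVert_2^2 \Bigr],
  \qquad \lambda \ge 0,\; 0 \le \alpha \le 1 .
```

Setting $\alpha = 1$ gives LASSO, $\alpha = 0$ gives ridge regression, intermediate values give the elastic net, and $\lambda = 0$ reduces to ordinary multiple linear regression.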

There are 35 citations in total.

Details

Primary Language English
Subjects Econometrics (Other)
Journal Section Articles
Authors

Selin Erdoğan 0000-0001-8903-3068

Hüseyin Taştan 0000-0002-2701-1039

Publication Date June 28, 2024
Submission Date March 29, 2024
Acceptance Date May 21, 2024
Published in Issue Year 2024, Volume 10, Issue 1

Cite

APA Erdoğan, S., & Taştan, H. (2024). Predicting Student Achievement via Machine Learning: Evidence from Turkish Subset of PISA. Yildiz Social Science Review, 10(1), 7-27. https://doi.org/10.51803/yssr.1461030