The Effect of the Weight Variable on Predicting Reading Comprehension Achievement in PISA 2018: A Data Mining Approach

Yusuf Kasap; Mustafa Köroğlu

doi:10.26466/opusjsr.1730221

Research Article

Year 2025, Volume: 22, Issue: 5, 1146-1159, 30.09.2025

Abstract

This study examines how student-level sample weights affect model performance in predicting achievement scores. Classification and Regression Tree (CART) and Random Forest (RF) models were fitted using 34 independent variables from the PISA 2018 student questionnaire. Because earlier data mining studies in Turkey did not take sample weights into account, the study makes an original contribution to the field. When sample weights were used, only one of the ten most important variables identified by CART changed, although the importance ranking of the variables also shifted. In the RF models, only five variables remained in common and the rest differed. With both methods, including sample weights produced a slight but statistically non-significant decrease in predictive performance. These results indicate that sample weights influence variable selection but do not significantly affect overall model accuracy. Overall, the findings underline the need to use sample weights in order to obtain valid and reliable results in large-scale educational data mining.

Keywords

Classification, Sample weight, Data mining
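
As a rough illustration of why sample weights can change which variables and cut points a tree-based model prefers, the sketch below applies a weighted sum-of-squared-errors split criterion, the kind of criterion a regression tree could use, to a single predictor. It is not the authors' implementation: the function names (`weighted_sse`, `best_split`, `top_k_overlap`), the toy scores, the weights, and the variable labels are all invented for illustration and are not PISA values.

```python
from typing import List, Optional, Sequence, Tuple


def weighted_sse(y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted sum of squared errors around the weighted mean of y.

    Assumes at least one observation and strictly positive weights.
    """
    total = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    return sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))


def best_split(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> Tuple[Optional[float], float]:
    """Return (threshold, impurity) for the split x <= threshold that
    minimises the summed weighted SSE of the two child nodes."""
    best_t: Optional[float] = None
    best_sse = float("inf")
    values = sorted(set(x))
    # Candidate thresholds are midpoints between adjacent distinct values,
    # so both child nodes are always non-empty.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [i for i, xi in enumerate(x) if xi <= t]
        right = [i for i, xi in enumerate(x) if xi > t]
        sse = weighted_sse([y[i] for i in left], [w[i] for i in left]) + \
            weighted_sse([y[i] for i in right], [w[i] for i in right])
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t, best_sse


def top_k_overlap(a: List[str], b: List[str], k: int) -> int:
    """Number of variables shared by the top-k entries of two rankings."""
    return len(set(a[:k]) & set(b[:k]))


# Toy data (invented, not PISA values): one predictor, four students.
x = [1, 2, 3, 4]
y = [400.0, 400.0, 410.0, 420.0]   # hypothetical reading scores
w = [1.0, 1.0, 1.0, 5.0]           # hypothetical student sample weights

unweighted = best_split(x, y, [1.0] * len(y))  # every student counts once
weighted = best_split(x, y, w)                 # last student represents five

print(unweighted[0], weighted[0])  # prints: 2.5 3.5

# Hypothetical top-3 importance rankings from two models.
overlap = top_k_overlap(["A", "B", "C"], ["B", "C", "D"], k=3)
```

In this toy case the preferred cut point moves from 2.5 to 3.5 once the heavily weighted student is counted as five, which is the same mechanism by which weighting can reorder variable importance while leaving overall fit almost unchanged. Libraries such as scikit-learn expose the same idea through the `sample_weight` argument of `fit` on their tree and forest regressors; whether the study used such a tool is not stated in the abstract.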

Details

Primary Language	English
Subjects	Psychological Methodology, Design and Analysis
Section	Research Articles
Authors	Yusuf Kasap (ORCID 0000-0002-5114-1175), Mustafa Köroğlu (ORCID 0000-0001-9610-8523)
Early View Date	September 28, 2025
Publication Date	September 30, 2025
Submission Date	June 29, 2025
Acceptance Date	September 27, 2025
Published in Issue	Year 2025, Volume: 22, Issue: 5

Cite

APA	Kasap, Y., & Köroğlu, M. (2025). The Effect of the Weight Variable on Predicting Reading Comprehension Achievement in PISA 2018: A Data Mining Approach. OPUS Journal of Society Research, 22(5), 1146-1159. https://doi.org/10.26466/opusjsr.1730221
