Research Article

The Effect of the Weight Variable on Predicting Reading Comprehension Achievement in PISA 2018: A Data Mining Approach

Volume: 22 Number: 5 September 30, 2025

Abstract

This study investigates how student-level sample weights affect model performance in predicting reading comprehension achievement scores. The analyses employed Classification and Regression Tree (CART) and Random Forest (RF) methods with 34 independent variables from the PISA 2018 student questionnaire. Because no prior data mining studies in Turkey have considered sample weights, this research makes an original contribution to the field. When sample weights were used, only one of the ten significant variables identified by the CART method changed, although the order of variable importance also shifted. In the RF models, only five variables were common to the weighted and unweighted models; the others differed. Including sample weights in both methods produced a slight, statistically non-significant decrease in prediction performance. These results indicate that sample weights influence variable selection but do not significantly affect overall model accuracy. Overall, the findings highlight the need to incorporate sample weights to obtain valid and reliable results in large-scale educational data mining.
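The abstract does not spell out how sample weights enter the tree models. As a minimal sketch, assuming a regression tree on a numeric score and the common convention that each student's weight multiplies that student's contribution to the split criterion (similar in spirit to the `sample_weight` argument accepted by scikit-learn's tree and forest `fit` methods), the pure-Python code below finds the best split on one predictor with and without weights. The functions `weighted_sse` and `best_split`, the four-student data, and the weights are hypothetical illustrations, not the authors' code or PISA data. A Random Forest repeats this kind of search inside each of its trees, so in implementations that pass weights to every tree, weights can reshuffle variable importance there as well.

```python
from typing import List, Optional, Sequence, Tuple


def weighted_sse(y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted sum of squared deviations of y around its weighted mean."""
    total_w = sum(w)
    if total_w == 0:
        return 0.0
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    return sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))


def best_split(
    x: Sequence[float],
    y: Sequence[float],
    w: Optional[Sequence[float]] = None,
) -> Tuple[Optional[float], float]:
    """Find the threshold on one numeric feature that minimises the
    weighted SSE of the two child nodes (x <= t goes left, x > t right).

    With w=None every student counts once, i.e. an unweighted CART split.
    Returns (threshold, cost); threshold is None if no split beats the parent.
    """
    if w is None:
        w = [1.0] * len(y)
    order: List[int] = sorted(range(len(x)), key=lambda i: x[i])
    best_t: Optional[float] = None
    best_cost = weighted_sse(y, w)  # cost of not splitting at all
    for k in range(1, len(order)):
        prev_i, next_i = order[k - 1], order[k]
        if x[prev_i] == x[next_i]:
            continue  # no threshold separates tied feature values
        left, right = order[:k], order[k:]
        cost = (weighted_sse([y[i] for i in left], [w[i] for i in left])
                + weighted_sse([y[i] for i in right], [w[i] for i in right]))
        if cost < best_cost:
            best_t, best_cost = (x[prev_i] + x[next_i]) / 2, cost
    return best_t, best_cost


# Hypothetical toy data: one predictor, four students, made-up weights.
x = [1, 2, 3, 4]
y = [0, 4, 6, 10]
print(best_split(x, y))                 # unweighted: threshold 2.5
print(best_split(x, y, [1, 1, 10, 1]))  # weighted: threshold moves to 1.5
```

In this toy example the unweighted split falls at 2.5, while giving the third student a weight of 10 moves the chosen split to 1.5. This is only a small-scale analogue of how weights can change what a tree selects; it makes no claim about the size of the effect in the PISA 2018 models.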

Keywords

Classification, Sample weight, Data mining

Citation (APA)
Kasap, Y., & Köroğlu, M. (2025). The Effect of the Weight Variable on Predicting Reading Comprehension Achievement in PISA 2018: A Data Mining Approach. OPUS Journal of Society Research, 22(5), 1146-1159. https://doi.org/10.26466/opusjsr.1730221