This study examines the effect of three missing data imputation methods, namely regression imputation (RI), multiple imputation (MI) and k-nearest neighbor (kNN), on differential item functioning (DIF). The complete datasets came from 600 students in Türkiye, the United Kingdom, the USA, New Zealand and Australia who answered booklets 14 and 15 of the PISA 2018 science literacy test. Datasets with missing data were created by deleting some of these responses under the missing completely at random mechanism, and the missing values were then imputed using the RI, MI and kNN methods. DIF analysis was performed on all datasets for the language and gender variables using Lord’s χ² method, Raju’s area measurement method and the item response theory likelihood ratio method. The DIF results from the complete datasets were taken as the reference and compared with the results from the other datasets. With the RI method, the results closest to those of the complete datasets were obtained at a missing data rate of about 10% for both the language and gender variables. With the MI and kNN methods, the results closest to the complete datasets for the language variable were obtained at a rate of 5%. For the gender variable, the MI method gave inaccurate results at all missing data rates, whereas the kNN method gave accurate results at rates of 5% and 10%.
Differential item functioning, Missing data, Item response theory, Raju’s area measurement, Likelihood ratio
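The first two steps described in the abstract, deleting responses completely at random and then imputing them, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names `delete_mcar`, `knn_impute` and `agreement`, the 0/1 item scoring, the mismatch-proportion distance, the tie-breaking rule and the simulated responses are all assumptions made for illustration, and the DIF analysis itself is not included.

```python
import random


def delete_mcar(data, rate, seed=0):
    """Return a copy of `data` with a share `rate` of all cells set to None.

    Cells are drawn uniformly at random, so missingness does not depend on
    any observed or unobserved value (missing completely at random).
    """
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(data) for j in range(len(row))]
    masked = [list(row) for row in data]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        masked[i][j] = None
    return masked


def knn_impute(data, k=3):
    """Fill each missing 0/1 response with the majority answer of the k
    nearest respondents who answered that item.

    Distance is the share of mismatches on items both respondents answered
    (an assumed metric). Ties go to 1 (also an assumption). A cell stays
    None if no usable neighbour exists.
    """
    filled = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = sum(a != b for a, b in shared) / len(shared)
                    candidates.append((dist, m))
            candidates.sort()
            votes = [data[m][j] for _, m in candidates[:k]]
            if votes:
                filled[i][j] = int(2 * sum(votes) >= len(votes))
    return filled


def agreement(complete, masked, imputed):
    """Share of imputed cells whose value matches the complete data."""
    hits = total = 0
    for c_row, m_row, f_row in zip(complete, masked, imputed):
        for c, m, f in zip(c_row, m_row, f_row):
            if m is None and f is not None:
                total += 1
                hits += int(c == f)
    return hits / total if total else None


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated 0/1 responses: 60 hypothetical respondents, 10 items.
    complete = [[rng.randint(0, 1) for _ in range(10)] for _ in range(60)]
    for rate in (0.05, 0.10):  # the two rates named in the abstract
        masked = delete_mcar(complete, rate, seed=1)
        imputed = knn_impute(masked, k=5)
        print(rate, agreement(complete, masked, imputed))
```

In the study, the comparison is made on the DIF results rather than on cell-level agreement; the `agreement` function here only gives a quick check of how closely the imputed values match the deleted ones.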
Primary Language | English |
---|---|
Subjects | Measurement Theories and Applications in Education and Psychology |
Journal Section | Articles |
Authors | |
Early Pub Date | August 27, 2024 |
Publication Date | September 9, 2024 |
Submission Date | January 12, 2024 |
Acceptance Date | April 28, 2024 |
Published in Issue | Year 2024 Volume: 11 Issue: 3 |