TY - JOUR
TT - The Effect of Background Variables on Gender Related Differential Item Functioning
AU - Kıbrıslıoğlu Uysal, Nermin
AU - Atalay Kabasakal, Kübra
PY - 2017
DA - December
Y2 - 2017
DO - 10.21031/epod.333451
JF - Journal of Measurement and Evaluation in Education and Psychology
JO - JMEEP
PB - Association for Measurement and Evaluation in Education and Psychology
WT - DergiPark
SN - 1309-6575
SP - 373
EP - 390
VL - 8
IS - 4
KW - socioeconomic status
KW - reading achievement
KW - differential item functioning
KW - MIMIC
N2 - In this study, the effects of socioeconomic status and reading ability on the presence of gender-related DIF were examined. For this purpose, the presence of differential item functioning (DIF) between gender groups in PISA 2015 science items was investigated in nine selected countries. One cluster of science items from the computer-based assessment (CBA) was taken into consideration. The countries were selected from among those that implemented CBA, on the basis of their rank in science achievement. The Multiple Indicators Multiple Causes (MIMIC) method was used for the DIF analyses. DIF analysis with MIMIC involves fit comparisons of full and reduced models to determine whether the items measure the latent trait equally across the specified groups. The MIMIC analysis was conducted in two steps. First, the items were tested for DIF between gender groups. Then socioeconomic status and reading ability were added to the model to test the gender-related DIF items and the effects of these variables, respectively. According to the results of the study, gender-related DIF appeared in all of the selected countries, with between two and six items showing DIF. In four of the countries, none of the selected variables significantly affected the presence of gender-related DIF. In the remaining countries, however, the number of gender-related DIF items decreased when the selected variables were added to the model. The effects of the variables that reduced the number of gender-related DIF items are discussed for each country.
CR - Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67-91. doi:10.1111/j.1745-3984.1992.tb00368.x
CR - Allalouf, A., Hambleton, R. K., & Sireci, S. G. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36(3), 185-198.
CR - Asil, M., & Gelbal, S. (2012). PISA öğrenci anketinin kültürler arası eşdeğerliği [Cross-cultural equivalence of the PISA student questionnaire]. Eğitim ve Bilim, 37(166), 236-249.
CR - Atalay Kabasakal, K., & Kelecioğlu, H. (2012). PISA 2006 öğrenci anketinde yer alan maddelerin değişen madde fonksiyonu açısından incelenmesi [Examination of the items in the PISA 2006 student questionnaire in terms of differential item functioning]. Ankara Üniversitesi Eğitim Bilimleri Fakültesi Dergisi, 45(2), 77-96.
CR - Barr, A. B. (2015). Family socioeconomic status, family health, and changes in students' math achievement across high school: A mediational model. Social Science & Medicine, 140, 27-34.
CR - Budgell, G. R., Raju, N. S., & Quartetti, D. A. (1995). Analysis of differential item functioning in translated assessment instruments. Applied Psychological Measurement, 19(4), 309-321.
CR - Camilli, G. (1993). The case against item bias techniques based on internal criteria: Do item bias procedures obscure test fairness issues? In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 397-413). Hillsdale, NJ: Lawrence Erlbaum.
CR - Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. London: Sage Publications.
CR - Candell, G. L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253-260.
CR - Chun, S. (2014). Using MIMIC methods to detect and identify sources of DIF among multiple groups. Unpublished master's thesis, University of South Florida, USA.
CR - Clauser, B., Mazor, K., & Hambleton, R. K. (1993). The effects of purification of the matching criterion on the identification of DIF using the Mantel-Haenszel procedure. Applied Measurement in Education, 6, 269-279.
CR - Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. (1966). Equality of educational opportunity. Washington, DC: U.S. Government Printing Office.
CR - Cromley, G. J. (2009). Reading achievement and science proficiency: International comparisons from the Programme on International Student Assessment. Reading Psychology, 30(2), 89-118.
CR - Demps, D. L., & Onwuegbuzie, A. J. (2001). The relationship between eighth-grade reading scores and achievement on the Georgia High School Graduation Test. Research in the Schools, 8(2), 1-9.
CR - Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum Associates.
CR - Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355-368.
CR - Ercikan, K., & Koh, K. (2005). Examining the construct comparability of the English and French versions of TIMSS. International Journal of Testing, 5(1), 23-35.
CR - Fleishman, J. A., Spector, W. D., & Altman, B. M. (2002). Impact of differential item functioning on age and gender differences in functional disability. Journal of Gerontology: Social Sciences, 57(5), 275-284.
CR - Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278-295.
CR - French, B. F., & Maller, S. J. (2007). Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement, 67, 373-393.
CR - Gallo, J. J., Anthony, J. C., & Muthén, B. O. (1994). Age differences in the symptoms of depression: A latent trait analysis. Journal of Gerontology: Psychological Sciences, 49, 251-264.
CR - Glöckner-Rist, A., & Hoijtink, H. (2003). The best of both worlds: Factor analysis of dichotomous data using item response theory and structural equation modeling. Structural Equation Modeling, 10, 544-565.
CR - Grisay, A., & Monseur, C. (2007). Measuring the equivalence of item difficulty in the various versions of an international test. Studies in Educational Evaluation, 33(1), 69-86.
CR - Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
CR - Hecht, S. A., Burgess, S. R., Torgesen, J. K., Wagner, R. K., & Rashotte, C. A. (2000). Explaining social class differences in growth of reading skills from beginning kindergarten through fourth-grade: The role of phonological awareness, rate of access, and print knowledge. Reading and Writing, 12(1), 99-128.
CR - Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum.
CR - Huang, X., Wilson, M., & Wang, L. (2016). Exploring plausible causes of differential item functioning in the PISA science assessment: Language, curriculum or culture. Educational Psychology, 36(2), 378-390. http://dx.doi.org/10.1080/01443410.2014.946890
CR - Husin, M. (2014). Assessing mathematical competence in a second language: Exploring DIF evidence from PISA Malaysian data. Unpublished master's thesis, University of Wisconsin-Milwaukee.
CR - Hyde, J. S., & Linn, M. C. (1988). Gender differences in verbal ability: A meta-analysis. Psychological Bulletin, 104(1), 53-69. doi:10.1037/0033-2909.104.1.53
CR - Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70, 631-639.
CR - Lan, M. C. (2014). Exploring gender differential item functioning (DIF) in eighth-grade mathematics items for the United States and Taiwan. Unpublished doctoral dissertation, University of Washington.
CR - Le, L. T. (2009). Investigating gender differential item functioning across countries and test languages for PISA science items. International Journal of Testing, 9(2), 122-133. http://dx.doi.org/10.1080/15305050902880769
CR - Levine, D. W., Bowen, D. J., Kaplan, R. M., Kripke, D. F., Naughton, M. J., & Shumaker, S. A. (2003). Factor structure and measurement invariance of the Women's Health Initiative Insomnia Rating Scale. Psychological Assessment, 15, 123-136.
CR - Logan, S., & Johnstone, R. (2009). Gender differences in reading ability and attitudes: Examining where these differences lie. Journal of Research in Reading, 32(2), 199-214. doi:10.1111/j.1467-9817.2008.01389.x
CR - Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
CR - Lyons-Thomas, J., Sandilands, D., & Ercikan, K. (2014). Gender differential item functioning in mathematics in four international jurisdictions. Education and Science, 39(172), 20-32.
CR - MacIntosh, R., & Hashim, S. (2003). Variance estimation for converting MIMIC model parameters to IRT parameters in DIF analysis. Applied Psychological Measurement, 27, 372-379.
CR - Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13(2), 127-143.
CR - Meredith, W., & Millsap, R. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57(2), 289-311.
CR - Millsap, R., & Meredith, W. (1992). Inferential conditions in the statistical detection of measurement bias. Applied Psychological Measurement, 16(4), 389-402.
CR - Muthén, B. O. (1985). A method for studying the homogeneity of test items with respect to other relevant variables. Journal of Educational Statistics, 10, 121-132.
CR - Muthén, B. O., Kao, C. F., & Burstein, L. (1991). Instructionally sensitive psychometrics: Application of a new IRT-based detection technique to mathematics achievement test items. Journal of Educational Measurement, 28(1), 1-22.
CR - Muthén, L. K., & Muthén, B. O. (2010). Mplus statistical analysis with latent variables: User's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
CR - Nolen, S. B. (2003). Learning environment, motivation, and achievement in high school science. Journal of Research in Science Teaching, 40(4), 347-368.
CR - OECD. (2015). PISA 2015 technical report. Retrieved from http://www.oecd.org/pisa/data/2015-technical-report/
CR - Oort, F. J. (1998). Simulation study of item bias detection with restricted factor analysis. Structural Equation Modeling, 5, 107-124.
CR - O'Reilly, T., & McNamara, D. S. (2007). The impact of science knowledge, reading skill, and reading strategy knowledge on more traditional "high-stakes" measures of high school students' science achievement. American Educational Research Journal, 44(1), 161-196.
CR - NCES. (2003). NAEP validity studies: An agenda for NAEP validity studies (Report No. 2003-07). Retrieved from https://nces.ed.gov/pubs2003/200307.pdf
CR - Schmidt, W. H., Cogan, L. S., & McKnight, C. C. (2011). Equality of educational opportunity: Myth or reality in U.S. schooling? American Educator, 34(4), 12-19.
CR - Shealy, R. T., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194.
CR - Shih, C. L., & Wang, W. C. (2009). Differential item functioning detection using multiple indicators, multiple causes method with a pure short anchor. Applied Psychological Measurement, 33(3), 184-199.
CR - Sireci, S. G., & Swaminathan, H. (1996). Evaluating translation equivalence: So what's the big DIF? Paper presented at the AERA, Ellenville, NY.
CR - Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370.
CR - Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. Braun (Eds.), Test validity (pp. 147-169). Hillsdale, NJ: Lawrence Erlbaum.
CR - Wainer, H., Sireci, S., & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28, 197-219.
CR - Walker, C. M., Zhang, B., & Surber, J. (2008). Using a multidimensional differential item functioning framework to determine if reading ability affects student performance in mathematics. Applied Measurement in Education, 21(2), 162-181. doi:10.1080/08957340801926201
CR - Wang, W. C., Shih, C. L., & Yang, C. C. (2009). The MIMIC method with scale purification procedure for detecting differential item functioning. Educational and Psychological Measurement, 69(5), 713-731.
CR - Wang, W. C., & Su, Y. H. (2004a). Effects of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17, 113-144.
CR - Wang, W. C., & Su, Y. H. (2004b). Factors influencing the Mantel and generalized Mantel-Haenszel methods for the assessment of differential item functioning in polytomous items. Applied Psychological Measurement, 28, 450-480.
CR - Wang, W. C., & Yeh, Y. L. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27, 479-498.
CR - Welch, C. J., & Miller, T. R. (1995). Assessing differential item functioning in direct writing assessments: Problems and an example. Journal of Educational Measurement, 32, 163-178.
CR - White, K. R. (1982). The relation between socioeconomic status and academic achievement. Psychological Bulletin, 91(3), 461-481.
CR - Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago: MESA Press.
CR - Woods, C. M. (2009). Evaluation of MIMIC-model methods for DIF testing with comparison to two-group analysis. Multivariate Behavioral Research, 44(1), 1-27.
CR - Woods, C. M., & Grimm, K. J. (2011). Testing for nonuniform differential item functioning with multiple indicator multiple cause models. Applied Psychological Measurement, 35(5), 339-361. http://dx.doi.org/10.1177/0146621611405984
CR - Wu, A. D., & Ercikan, K. (2006). Using multiple-variable matching to identify cultural sources of differential item functioning. International Journal of Testing, 6(3), 287-300.
CR - Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
UR - https://doi.org/10.21031/epod.333451
L1 - https://dergipark.org.tr/en/download/article-file/361011
ER -
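Note (not part of the RIS record above): the abstract describes a two-step MIMIC DIF analysis based on full versus reduced model comparisons. As a reading aid, a minimal LaTeX sketch of the uniform-DIF MIMIC specification in the spirit of the cited Muthén (1985) and Finch (2005) is given below; the symbols and covariate labels are illustrative assumptions, not taken from the article.

% Minimal sketch of a MIMIC model for uniform DIF (notation assumed for illustration):
% y*_i is the latent response to item i, \eta the latent science proficiency,
% and z_gender, z_SES, z_reading the observed covariates.
\begin{align}
  y^{*}_{i} &= \lambda_{i}\,\eta + \beta_{i}\,z_{\text{gender}} + \varepsilon_{i}
    && \text{(measurement part; } \beta_{i}\neq 0 \text{ signals uniform DIF in item } i\text{)}\\
  \eta &= \gamma_{1}\,z_{\text{gender}} + \gamma_{2}\,z_{\text{SES}} + \gamma_{3}\,z_{\text{reading}} + \zeta
    && \text{(structural part; the } \gamma \text{ terms capture group impact on } \eta\text{)}
\end{align}

In this kind of setup, the reduced model constrains the direct effects \(\beta_{i}\) to zero while the full model estimates them freely; a significant improvement in fit (or a significant \(\beta_{i}\)) flags item \(i\) as showing DIF. In the study's first step only \(z_{\text{gender}}\) enters the model, and in the second step the socioeconomic-status and reading covariates are added, as described in the abstract.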