TY  - JOUR
T1  - Performances Based on Ability Estimation of the Methods of Detecting Differential Item Functioning: A Simulation Study
AU  - Ertuna, Levent
AU  - Uysal, İbrahim
AU  - Ertaş, F. Güneş
AU  - Kelecioğlu, Hülya
PY  - 2019
DA  - June
Y2  - 2019
DO  - 10.21031/epod.534312
JF  - Journal of Measurement and Evaluation in Education and Psychology
JO  - JMEEP
PB  - Association for Measurement and Evaluation in Education and Psychology
WT  - DergiPark
SN  - 1309-6575
SP  - 133
EP  - 148
VL  - 10
IS  - 2
LA  - en
AB  - The aim of the study is to examine four differential item functioning (DIF) detection methods, namely the simultaneous item bias test (SIBTEST), the Item Response Theory likelihood ratio test (IRT-LR), Lord’s chi-square (χ2), and Raju’s area measures, based on the ability estimates obtained when items with DIF are purified from the test, under varying conditions of the ratio of items with DIF, the effect size of DIF, and the type of DIF. This is a simulation study, and 50 replications were conducted for each condition. To compare the DIF detection methods, the error (RMSD) and a coefficient of concordance (Pearson’s correlation coefficient) were calculated between the estimated and the initial abilities of the reference group. Considering all other conditions, the lowest error and the highest concordance were observed when 10% of the test items had uniform DIF and the IRT-LR method was used. Moreover, for SIBTEST and IRT-LR, under all conditions, the error obtained by purifying only items with C-level DIF was lower than the error obtained by purifying items with both B- and C-level DIF. Similarly, for SIBTEST and IRT-LR, under all conditions, the concordance coefficient obtained by purifying only items with C-level DIF was higher than that obtained by purifying items with both B- and C-level DIF.
KW  - Differential item functioning
KW  - simulation
KW  - ratio of the items with DIF
KW  - type of DIF
CR  - Atalay Kabasakal, K., Arsan, N., Gök, B., & Kelecioğlu, H. (2014). Comparing performances (type I error and power) of IRT Likelihood Ratio SIBTEST and Mantel-Haenszel methods in the determination of differential item functioning. Educational Sciences: Theory & Practice, 14(6), 2175–2193. https://doi.org/10.12738/estp.2014.6.2165
CR  - Camilli, G. (1993). The case against item bias detection techniques based on internal criteria: Do item bias procedures obscure test fairness issues? In P. W. Holland & H. Wainer (Eds.), Differential Item Functioning (pp. 397–418). New York: Routledge.
CR  - Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. California: Sage.
CR  - Cheng, C. M. (2005). A study on Differential Item Functioning of the basic mathematical competence test for junior high schools in Taiwan (Doctoral dissertation). Available from ProQuest Dissertations and Theses database (UMI No. 3189625).
CR  - Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31–47. https://doi.org/10.1111/j.1745-3992.1998.tb00619.x
CR  - Cohen, A. S., & Kim, S. (1993). A comparison of Lord’s Chi Square and Raju’s Area Measures in detection of DIF. Applied Psychological Measurement, 17(1), 39–52. https://doi.org/10.1177/014662169301700109
CR  - Cromwell, S. (2002, February). A primer on ways to explore item bias. Paper presented at the Annual Meeting of the Southwest Educational Research Association, Austin, TX.
CR  - Downing, S. M., & Haladyna, T. M. (2004). Validity threats: Overcoming interference with proposed interpretations of assessment data. Medical Education, 38(3), 327–333. https://doi.org/10.1046/j.1365-2923.2004.01777.x
CR  - Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT Likelihood Ratio. Applied Psychological Measurement, 29(4), 278–295. https://doi.org/10.1177/0146621605275728
CR  - Finch, W. H., & French, B. F. (2007). Detection of crossing differential item functioning: A comparison of four methods. Educational and Psychological Measurement, 67(4), 565–582. https://doi.org/10.1177/0013164406296975
CR  - Glas, C. A. W., & Meijer, R. R. (2003). A Bayesian approach to person fit analysis in Item Response Theory models. Applied Psychological Measurement, 27(3), 217–233. https://doi.org/10.1177/0146621603252216
CR  - Golia, S. (2010). The assessment of DIF on Rasch measures with an application to job satisfaction. Electronic Journal of Applied Statistical Analysis: Decision Support Systems and Services Evaluation, 1(1), 16–25. https://doi.org/10.1285/i2037-3627v1n1p16
CR  - Golia, S. (2015). Assessing the impact of uniform and nonuniform differential item functioning items on Rasch measure: The polytomous case. Computational Statistics, 30, 441–461. https://doi.org/10.1007/s00180-014-0542-x
CR  - Green, S., & Salkind, N. (2005). Using SPSS for Windows and Macintosh: Analyzing and understanding data (4th ed.). Upper Saddle River, NJ: Pearson.
CR  - Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. California: Sage.
CR  - Han, K. T. (2007). WinGen: Windows software that generates IRT parameters and item responses. Applied Psychological Measurement, 31(5), 457–459. https://doi.org/10.1177/0146621607299271
CR  - Harwell, M., Stone, C. A., Hsu, T. C., & Kirisci, L. (1996). Monte Carlo studies in Item Response Theory. Applied Psychological Measurement, 20(2), 101–125. https://doi.org/10.1177/014662169602000201
CR  - Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with the Logistic Regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329–349. https://doi.org/10.1207/S15324818AME1404_2
CR  - Kelecioğlu, H., Karabay, B., & Karabay, E. (2014). Investigation of placement test in terms of item biasness. Elementary Education Online, 13(3), 934–953.
CR  - Kim, S., & Lee, W. (2004). IRT scale linking methods for mixed-format tests (ACT Research Report 2004-5). Iowa City, IA: ACT, Inc.
CR  - Kristjansson, E., Aylesworth, R., McDowell, I., & Zumbo, B. D. (2005). A comparison of four methods for detecting differential item functioning in ordered response items. Educational and Psychological Measurement, 65(6), 935–953. https://doi.org/10.1177/0013164405275668
CR  - Lee, Y. H., & Zhang, J. (2017). Effects of differential item functioning on examinees' test performance and reliability of test. International Journal of Testing, 17(1), 23–54. https://doi.org/10.1080/15305058.2016.1224888
CR  - Lei, P.-W., Chen, S.-Y., & Yu, L. (2006). Comparing methods of assessing differential item functioning in a computerized adaptive testing environment. Journal of Educational Measurement, 43(3), 245–264. https://doi.org/10.1111/j.1745-3984.2006.00015.x
CR  - Li, H. H., & Stout, W. (1994). SIBTEST: A FORTRAN V program for computing the simultaneous item bias DIF statistics. Department of Statistics, University of Illinois, Urbana-Champaign.
CR  - Li, Z., & Zumbo, B. D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30, 343–370.
CR  - Lopez, G. E. (2012). Detection and classification of DIF types using parametric and nonparametric methods: A comparison of the IRT-Likelihood Ratio test, Crossing-SIBTEST, and Logistic Regression procedures (Doctoral dissertation). Retrieved from http://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=5327&context=etd
CR  - Magis, D., Béland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous Differential Item Functioning. Behavior Research Methods, 42(3), 847–862. https://doi.org/10.3758/BRM.42.3.847
CR  - Magis, D., Béland, S., & Raîche, G. (2013). difR: Collection of methods to detect dichotomous Differential Item Functioning (DIF) in psychometrics. R package version 5.0. Retrieved from http://www.CRAN.R-project.org/package=difR
CR  - Mazor, K. M., Clauser, B. E., & Hambleton, R. K. (1993, March). Identification of nonuniform Differential Item Functioning using a variation of the Mantel-Haenszel procedure. Paper presented at the Annual Meeting of the National Council on Measurement in Education, San Francisco, CA.
CR  - Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105–118. https://doi.org/10.3102/10769986007002105
CR  - Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/10.1037/0003-066X.50.9.741
CR  - Muraki, E., & Bock, R. D. (2003). PARSCALE 4 for Windows: IRT based test scoring and item analysis for graded items and rating scales [Computer software]. Skokie, IL: Scientific Software International, Inc.
CR  - Narayanan, P., & Swaminathan, H. (1994). Performance of the Mantel-Haenszel and simultaneous item bias procedures for detecting differential item functioning. Applied Psychological Measurement, 18(4), 315–328. https://doi.org/10.1177/014662169401800403
CR  - Oshima, T. C., & Morris, S. (2008). Raju’s differential functioning of items and tests (DFIT). Educational Measurement: Issues and Practice, 27(3), 43–50. https://doi.org/10.1111/j.1745-3992.2008.00127.x
CR  - Osterlind, S. J. (1983). Test item bias. Newbury Park, California: Sage.
CR  - Penfield, R. D., & Lam, T. C. M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19(3), 5–15. https://doi.org/10.1111/j.1745-3992.2000.tb00033.x
CR  - Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197–207. https://doi.org/10.1177/014662169001400208
CR  - Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel type I error performance. Journal of Educational Measurement, 33(2), 215–230. Retrieved from http://www.jstor.org/stable/1435184
CR  - Roznowski, M., & Reith, J. (1999). Examining the measurement quality of tests containing differentially functioning items: Do biased items result in poor measurement? Educational and Psychological Measurement, 59(2), 248–269. https://doi.org/10.1177/00131649921969839
CR  - Rupp, A. A., & Zumbo, B. D. (2003). Which model is best? Robustness properties to justify model choice among unidimensional IRT models under item parameter drift. Alberta Journal of Educational Research, 49, 264–276.
CR  - Rupp, A. A., & Zumbo, B. D. (2006). Understanding parameter invariance in unidimensional IRT models. Educational and Psychological Measurement, 66(1), 63–84. https://doi.org/10.1177/0013164404273942
CR  - Shepard, L., Camilli, G., & Averill, M. (1981). Comparison of procedures for detecting test-item bias with both internal and external ability criteria. Journal of Educational Statistics, 6(4), 317–375. https://doi.org/10.2307/1164616
CR  - Suh, Y. (2016). Effect size measures for differential item functioning in a multidimensional IRT model. Journal of Educational Measurement, 53(4), 403–430. https://doi.org/10.1111/jedm.12123
CR  - Tennant, A., & Pallant, J. F. (2007). DIF matters: A practical approach to test if Differential Item Functioning makes a difference. Rasch Measurement Transactions, 20(4), 1082–1084.
CR  - Thissen, D. (2001). IRTLRDIF v.2.0b: Software for the computation of the statistics involved in Item Response Theory Likelihood-Ratio tests for Differential Item Functioning. L.L. Thurstone Psychometric Laboratory, University of North Carolina, Chapel Hill, NC.
CR  - Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Lawrence Erlbaum Associates.
CR  - Wells, C. S., Subkoviak, M. J., & Serlin, R. C. (2002). The effect of item parameter drift on examinee ability estimates. Applied Psychological Measurement, 26(1), 77–87. https://doi.org/10.1177/0146621602261005
CR  - Zumbo, B. D. (1999). Handbook on the theory and methods of differential item functioning: Logistic regression modeling as a unitary framework for binary and Likert-type item scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.
UR  - https://doi.org/10.21031/epod.534312
L1  - https://dergipark.org.tr/en/download/article-file/740361
ER  - 