Research Article
Year 2019, Volume 10, Issue 2, 133 - 148, 28.06.2019
https://doi.org/10.21031/epod.534312

Performances Based on Ability Estimation of the Methods of Detecting Differential Item Functioning: A Simulation Study

Abstract

This study examines four methods for detecting differential item functioning (DIF): the simultaneous item bias test (SIBTEST), the Item Response Theory likelihood ratio test (IRT-LR), Lord's chi-square (χ2), and Raju's area measures. The methods are compared on the ability estimates obtained after items with DIF are purified from the test, under conditions that vary the ratio of items with DIF, the effect size of DIF, and the type of DIF. The study is a simulation, and 50 replications were conducted for each condition. To compare the methods, error (RMSD) and concordance (Pearson's correlation coefficient) were calculated between the estimated and initial abilities of the reference group. Across all conditions, the lowest error and the highest concordance were observed with the IRT-LR method when the test contained 10% uniform DIF. Moreover, for SIBTEST and IRT-LR in all conditions, the error obtained by purifying only items with C-level DIF was lower than the error obtained by purifying items with both B- and C-level DIF. Similarly, for SIBTEST and IRT-LR in all conditions, the concordance coefficient obtained by purifying only items with C-level DIF was higher than the coefficient obtained by purifying items with both B- and C-level DIF.
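
The two comparison criteria named in the abstract can be illustrated with a minimal sketch. It assumes RMSD is the square root of the mean squared difference between estimated and initial abilities, which the abstract does not spell out, and it uses hypothetical ability values rather than data from the study. The function names `rmsd` and `pearson_r` are illustrative only.

```python
import math


def rmsd(estimated, initial):
    """Root mean square difference between paired ability values (assumed definition)."""
    if len(estimated) != len(initial) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimated, initial)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical reference-group abilities: the generating (initial) values and
# the values re-estimated after removing flagged items. Not data from the study.
initial_theta = [-1.0, -0.5, 0.0, 0.5, 1.0]
estimated_theta = [-0.8, -0.6, 0.1, 0.4, 1.2]

error = rmsd(estimated_theta, initial_theta)
concordance = pearson_r(estimated_theta, initial_theta)
print(f"RMSD = {error:.3f}, Pearson r = {concordance:.3f}")
```

Under this reading, a purification strategy performs better when it yields a lower RMSD and a higher correlation; in a simulation both values would typically be averaged over the replications of each condition.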

References

  • Atalay Kabasakal, K., Arsan, N., Gök, B., & Kelecioğlu, H. (2014). Comparing performances (type I error and power) of IRT Likelihood Ratio SIBTEST and Mantel-Haenszel methods in the determination of differential item functioning. Educational Sciences: Theory & Practice, 14(6), 2175–2193. doi: 10.12738/estp.2014.6.2165
  • Camilli, G. (1993). The case against item bias detection techniques based on internal criteria: Do item bias procedures obscure test fairness issues? In P. W. Holland & H. Wainer (Eds.), Differential Item Functioning (pp. 397–418). New York: Routledge.
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. California: Sage.
  • Cheng, C. M. (2005). A study on Differential Item Functioning of the basic mathematical competence test for junior high schools in Taiwan (Doctoral Dissertation). Available from ProQuest Dissertations and Theses database (UMI No. 3189625).
  • Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-47. doi: 10.1111/j.1745-3992.1998.tb00619.x
  • Cohen, A. S., & Kim, S. (1993). A comparison of Lord’s Chi Square and Raju’s Area Measures in detection of DIF. Applied Psychological Measurement, 17(1), 39–52. doi: 10.1177/014662169301700109
  • Cromwell, S. (2002, February). A primer on ways to explore item bias. Paper presented at the Annual Meeting of the Southwest Educational Research Association, Austin, TX.
  • Downing, S. M., & Haladyna, T. M. (2004). Validity threats: Overcoming interference with proposed interpretations of assessment data. Medical Education, 38(3), 327-333. https://doi.org/10.1046/j.1365-2923.2004.01777.x
  • Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT Likelihood Ratio. Applied Psychological Measurement, 29(4), 278-295. doi: 10.1177/0146621605275728
  • Finch, W. H., & French, B. F. (2007). Detection of crossing differential item functioning: A comparison of four methods. Educational and Psychological Measurement, 67(4), 565-582. doi: 10.1177/0013164406296975
  • Glas, C. A. W., & Meijer, R. R. (2003). A Bayesian approach to person fit analysis in Item Response Theory models. Applied Psychological Measurement, 27(3), 217-233. doi: 10.1177/0146621603252216
  • Golia, S. (2010). The assessment of DIF on Rasch measures with an application to job satisfaction. Electronic Journal of Applied Statistical Analysis: Decision Support Systems and Services Evaluation, 1(1), 16–25. doi: 10.1285/i2037-3627v1n1p16
  • Golia, S. (2015). Assessing the impact of uniform and nonuniform differential item functioning items on Rasch measure: The polytomous case. Computational Statistics, 30, 441–461. doi: 10.1007/s00180-014-0542-x
  • Green, S., & Salkind, N. (2005). Using SPSS for Windows and Macintosh: Analyzing and understanding data (4th Ed). Upper Saddle River, NJ: Pearson.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. California: Sage.
  • Han, K. T. (2007). WinGen: Windows software that generates IRT parameters and item responses. Applied Psychological Measurement, 31(5), 457-459. doi: 10.1177/0146621607299271
  • Harwell, M., Stone, C. A., Hsu, T. C., & Kirisci, L. (1996). Monte Carlo studies in Item Response Theory. Applied Psychological Measurement, 20(2), 101-125. doi: 10.1177/014662169602000201
  • Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with the Logistic Regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329-349. doi: 10.1207/S15324818AME1404_2
  • Kelecioğlu, H., Karabay, B., & Karabay, E. (2014). Investigation of placement test in terms of item biasness. Elementary Education Online, 13(3), 934–953.
  • Kim, S., & Lee, W. (2004). IRT scale linking methods for mixed-format tests (ACT Research Report 2004-5). Iowa City, IA: ACT, Inc.
  • Kristjansson, E., Aylesworth, R., McDowell, I., & Zumbo, B. D. (2005). A comparison of four methods for detecting differential item functioning in ordered response model. Educational and Psychological Measurement, 65(6), 935-953. doi: 10.1177/0013164405275668
  • Lee, Y. H., & Zhang, J. (2017). Effects of differential item functioning on examinees' test performance and reliability of test. International Journal of Testing, 17(1), 23-54. https://doi.org/10.1080/15305058.2016.1224888
  • Lei, P-W., Chen, S-Y., & Yu, L. (2006). Comparing methods of assessing differential item functioning in a computerized adaptive testing environment. Journal of Educational Measurement, 43(3), 245-264. https://doi.org/10.1111/j.1745-3984.2006.00015.x
  • Li, H. H., & Stout, W. (1994). SIBTEST: A fortran V program for computing the simultaneous item bias DIF statistics. Department of Statistics, University of Illinois, Urbana Champaign.
  • Li, Z., & Zumbo, B. D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30, 343-370.
  • Lopez, G. E. (2012). Detection and classification of DIF types using parametric and nonparametric methods: A comparison of the IRT-Likelihood Ratio test, Crossing-SIBTEST, and Logistic Regression procedures (Doctoral dissertation). Retrieved from http://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=5327&context=etd
  • Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous Differential Item Functioning. Behavior Research Methods, 42(3), 847–862. doi: 10.3758/BRM.42.3.847
  • Magis, D., Beland, S., & Raiche, G. (2013). difR: Collection of methods to detect dichotomous Differential Item Functioning (DIF) in psychometrics. R package version 5.0. http://www.CRAN.R-project.org/package=difR
  • Mazor, K. M., Clauser, B. E., & Hambleton, R. K. (1993, March). Identification of nonuniform Differential Item Functioning using a variation of the Mantel-Haenszel procedure. Paper presented at the Annual Meeting of the National Council on Measurement in Education, San Francisco, CA.
  • Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105–118. doi: 10.3102/10769986007002105
  • Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741-749. doi: 10.1037/0003-066X.50.9.741
  • Muraki, E., & Bock, R. D. (2003). PARSCALE 4 for Windows: IRT based test scoring and item analysis for graded items and rating scales [Computer software]. Skokie, IL: Scientific Software International, Inc.
  • Narayanan, P., & Swaminathan, H. (1994). Performance of the Mantel-Haenszel and simultaneous item bias procedures for detecting differential item functioning. Applied Psychological Measurement, 18(4), 315-328. doi: 10.1177/014662169401800403
  • Oshima, T. C., & Morris, S. (2008). Raju’s differential functioning of items and tests (DFIT). Educational Measurement: Issues and Practice, 27(3), 43-50. doi: 10.1111/j.1745-3992.2008.00127.x
  • Osterlind, S. J. (1983). Test item bias. Newbury Park, California: Sage.
  • Penfield, R. D., & Lam, T. C. M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19(3), 5–15. doi: 10.1111/j.1745-3992.2000.tb00033.x
  • Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197-207. doi: 10.1177/014662169001400208
  • Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel type I error performance. Journal of Educational Measurement, 33(2), 215-230. Retrieved from http://www.jstor.org/stable/1435184
  • Roznowski, M., & Reith, J. (1999). Examining the measurement quality of tests containing differentially functioning items: Do biased items result in poor measurement? Educational and Psychological Measurement, 59(2), 248-269. doi: 10.1177/00131649921969839
  • Rupp, A. A., & Zumbo, B. D. (2003). Which model is best? Robustness properties to justify model choice among unidimensional IRT models under item parameter drift. Alberta Journal of Educational Research, 49, 264-276.
  • Rupp, A. A., & Zumbo, B. D. (2006). Understanding parameter invariance in unidimensional IRT models. Educational and Psychological Measurement, 66(1), 63-84. doi: 10.1177/0013164404273942
  • Shepard, L., Camilli, G., & Averill, M. (1981). Comparison of procedures for detecting test-item bias with both internal and external ability criteria. Journal of Educational Statistics, 6(4), 317–375. doi: 10.2307/1164616
  • Suh, Y. (2016). Effect size measures for differential item functioning in a multidimensional IRT model. Journal of Educational Measurement, 53(4), 403-430. https://doi.org/10.1111/jedm.12123
  • Tennant, A., & Pallant, J. F. (2007). DIF matters: A practical approach to test if Differential Item Functioning makes a difference. Rasch Measurement Transactions, 20(4), 1082-1084.
  • Thissen, D. (2001). IRTLRDIF v.2.0b: Software for the computation of the statistics involved in Item Response Theory Likelihood-Ratio tests for Differential Item Functioning. L.L. Thurstone Psychometric Laboratory, University of North Carolina, Chapel Hill, NC.
  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Wells, C. S., Subkoviak, M. J., & Serlin, R. C. (2002). The effect of item parameter drift on examinee ability estimates. Applied Psychological Measurement, 26(1), 77–87. doi: 10.1177/0146621602261005
  • Zumbo, B. D. (1999). Handbook on the theory and methods of differential item functioning: Logistic regression modeling as a unitary framework for binary and likert-type item scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.
There are 48 citations in total.

Details

Primary Language English
Journal Section Articles
Authors

İbrahim Uysal 0000-0002-6767-0362

Levent Ertuna 0000-0001-7810-1168

F. Güneş Ertaş 0000-0001-8785-7768

Hülya Kelecioğlu 0000-0002-0741-9934

Publication Date June 28, 2019
Acceptance Date May 26, 2019
Published in Issue Year 2019, Volume 10, Issue 2

Cite

APA Uysal, İ., Ertuna, L., Ertaş, F. G., & Kelecioğlu, H. (2019). Performances Based on Ability Estimation of the Methods of Detecting Differential Item Functioning: A Simulation Study. Journal of Measurement and Evaluation in Education and Psychology, 10(2), 133-148. https://doi.org/10.21031/epod.534312