TY  - JOUR
T1  - Investigation of differential item and step functioning procedures in polytomus items
AU  - Kuzu, Yasemin
AU  - Gelbal, Selahattin
PY  - 2023
DA  - September
Y2  - 2023
DO  - 10.21031/epod.1221823
JF  - Journal of Measurement and Evaluation in Education and Psychology
JO  - JMEEP
PB  - Association for Measurement and Evaluation in Education and Psychology
WT  - DergiPark
SN  - 1309-6575
SP  - 200
EP  - 221
VL  - 14
IS  - 3
LA  - en
AB  - This study aimed to compare differential item functioning (DIF) and differential step functioning (DSF) detection methods for polytomously scored items under various conditions. To this end, it examined data from Kazakhstan, Turkey and the USA on the items about the frequency of digital device use at school in the PISA 2018 student “ICT Familiarity Questionnaire”. The Mantel test, the Liu-Agresti statistic, Cox's β and poly-SIBTEST were used for polytomous DIF analysis, while the Adjacent Category Logistic Regression Model and the Cumulative Category Log Odds Ratio methods were used for DSF analysis. The study followed a correlational survey model with the conditions “differential category combining, focal group sample size, focal group : reference group sample ratio and DIF/DSF detection method”. SAS and R were used to create the conditions; the SIBTEST program was used for the poly-SIBTEST analysis and the DIFAS program for the other methods. The analyses showed that the number of items/steps exhibiting a high level of DIF/DSF was higher in the small sample according to the polytomous DIF methods and in the large sample compared with the DSF methods. DIF values were lower for items containing DSF effects of opposite sign across steps; therefore, omitting DSF analysis for an item that shows no DIF may yield erroneous conclusions. Although the differential category combining conditions created in this study did not have a systematic effect on the results, the frequency with which the combined categories were marked did differentiate the results, so this factor should be examined in future studies.
KW  - multi-category differential item function
KW  - differential step function
KW  - adjacent approach
KW  - cumulative approach
KW  - AC-LOR
KW  - CU-LOR
CR  - Akour, M., Sabah, S., & Hammouri, H. (2015). Net and global differential item functioning in PISA polytomously scored science items. Journal of Psychoeducational Assessment, 33(2), 166–176. https://doi.org/10.1177/0734282914541337
CR  - American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (AERA, APA, & NCME) (2014). Standards for educational and psychological testing. American Educational Research Association.
CR  - Ankenmann, R. D., Witt, E. A., & Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness‐of‐fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36(4), 277–300. https://doi.org/10.1111/j.1745-3984.1999.tb00558.x
CR  - Ayodele, A. N. (2017). Examining power and type 1 error for step and item level tests of invariance: Investigating the effect of the number of item score levels (Doctoral dissertation). University of Minnesota, USA.
CR  - Benítez, I., Padilla, J. L., Hidalgo Montesinos, M. D., & Sireci, S. G. (2015). Using mixed methods to interpret differential item functioning. Applied Measurement in Education, 29(1), 1–16. https://doi.org/10.1080/08957347.2015.1102915
CR  - Bolt, D. M. (2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods. Applied Measurement in Education, 15(2), 113–141. https://doi.org/10.1207/S15324818AME1502_01
CR  - Camilli, G., & Congdon, P. (1999). Application of a method of estimating DIF for polytomous test items. Journal of Educational and Behavioral Statistics, 24(4), 323–341. https://www.jstor.org/stable/pdf/1165366.pdf
CR  - Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Sage Publications.
CR  - Chang, H. H., Mazzeo, J., & Roussos, L. (1996). Detecting DIF for polytomous items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement, 33(3), 333–353. https://doi.org/10.1111/j.1745-3984.1996.tb00496.x
CR  - Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. An NCME instructional module. Educational Measurement: Issues and Practice, 17(1), 31–44. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1745-3992.1998.tb00619.x
CR  - Cohen, A. S., Kim, S.-H., & Baker, F. B. (1993). Detection of differential item functioning in the graded response model. Applied Psychological Measurement, 17(4), 335–350. https://doi.org/10.1177/014662169301700402
CR  - Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B (Methodological), 20(2), 215–232. https://doi.org/10.1111/j.2517-6161.1958.tb00292.x
CR  - Ellis, B. B., & Raju, N. S. (2003). Test and Item Bias: What they are, what they aren't, and how to detect them. In J. E. Wall & G. R. Walz (Eds.), Measuring up: Assessment issues for teachers, counselors, and administrators. CAPS Press.
CR  - Elosua, P., & Wells, C. S. (2013). Detecting DIF in polytomous items using MACS, IRT and ordinal logistic regression. Psicológica, 34(2), 327–342. https://www.redalyc.org/pdf/169/16929535011.pdf
CR  - Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.
CR  - French, A. W., & Miller, T. R. (1996). Logistic regression and its use in detecting differential item functioning in polytomous items. Journal of Educational Measurement, 33(3), 315–332. https://doi.org/10.1111/j.1745-3984.1996.tb00495.x
CR  - Gattamorta, K. A. (2009). A comparison of adjacent categories and cumulative DSF effect estimators [Doctoral dissertation]. University of Miami, Florida.
CR  - Gattamorta, K. A., & Penfield, R. D. (2012). A comparison of adjacent categories and cumulative differential step functioning effect estimators. Applied Measurement in Education, 25(2), 142–161. https://doi.org/10.1080/08957347.2012.660387
CR  - Gelin, M. N., & Zumbo, B. D. (2003). Differential item functioning results may change depending on how an item is scored: An illustration with the Center for Epidemiologic Studies Depression Scale. Educational and Psychological Measurement, 63(1), 65–74. https://doi.org/10.1177/0013164402239317
CR  - Gonzalez-Roma, V., Hernandez, A., & Gomez-Benito, J. (2006). Power and Type I error of the mean and covariance structure analysis model for detecting differential item functioning in graded response items. Multivariate Behavioral Research, 41(1), 29–53. https://doi.org/10.1207/s15327906mbr4101_3
CR  - Göçer-Şahin, S., Gelbal, S., & Walker, C. M. (2016, October). Impact of decreasing category number of polytomous items on DIF [Conference presentation]. 15th International Mineral Processing Symposium (IMPS 2016), USA.
CR  - Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage Publications.
CR  - Henderson, D. L. (2001, April 10-14). Prevalence of gender DIF in mixed format high school exit examinations. American Educational Research Association 2001 Annual Meeting, USA. https://files.eric.ed.gov/fulltext/ED458284.pdf
CR  - Kaiser, H. F. (1970). A second generation little jiffy. Psychometrika, 35(4), 401–415. https://doi.org/10.1007/BF02291817
CR  - Kristjansson, E., Aylesworth, R., McDowell, I., & Zumbo, B. D. (2005). A comparison of four methods for detecting differential item functioning in ordered response items. Educational and Psychological Measurement, 65(6), 935–953. https://doi.org/10.1177/0013164405275668
CR  - Kuzu, Y. (2021). Investigation of Differential Item and Step Functioning Procedures in Polytomously Scored Items [Doctoral dissertation]. Hacettepe University, Ankara.
CR  - Mantel, N. (1963). Chi-square tests with one degree of freedom; extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association, 58(303), 690–700. https://www.jstor.org/stable/pdf/2282717.pdf
CR  - Meade, A. W., & Lautenschlager, G. J. (2004). A comparison of item response theory and confirmatory factor analytic methodologies for establishing measurement equivalence/invariance. Organizational Research Methods, 7(4), 361–388. https://tarjomefa.com/wp-content/uploads/2019/10/F1430-TarjomeFa-English.pdf
CR  - Mellor, T. L. (1995). A comparison of four differential item functioning methods for polytomously scored items [Unpublished doctoral dissertation]. The University of Texas, Austin.
CR  - Miller, T., Chahine, S., & Childs, R. A. (2010). Detecting differential item functioning and differential step functioning due to differences that should matter. Practical Assessment, Research, and Evaluation, 15(10), 1–13. https://doi.org/10.7275/dzm4-q558
CR  - Penfield, R. D. (2007). Assessing differential step functioning in polytomous items using a common odds ratio estimator. Journal of Educational Measurement, 44(3), 187–210. https://doi.org/10.1111/j.1745-3984.2007.00034.x
CR  - Penfield, R. D. (2008). Three classes of nonparametric differential step functioning effect estimators. Applied Psychological Measurement, 32(6), 480–501. https://doi.org/10.1177/0146621607305399
CR  - Penfield, R. D. (2010). Distinguishing between net and global DIF in polytomous items. Journal of Educational Measurement, 47(2), 129–149. https://doi.org/10.1111/j.1745-3984.2010.00105.x
CR  - Penfield, R. D. (2013). DIFAS 5.0 differential item functioning analysis system user’s manual. https://soe.uncg.edu/wp-content/uploads/2015/12/DIFASManual_V5.pdf
CR  - Penfield, R. D., & Algina, J. (2003). Applying the Liu‐Agresti estimator of the cumulative common odds ratio to DIF detection in polytomous items. Journal of Educational Measurement, 40(4), 353–370. https://doi.org/10.1111/j.1745-3984.2003.tb01151.x
CR  - Penfield, R. D., Alvarez, K., & Lee, O. (2008). Using a taxonomy of differential step functioning to improve the interpretation of DIF in polytomous items: An illustration. Applied Measurement in Education, 22(1), 61–78. https://doi.org/10.1080/08957340802558367
CR  - Penfield, R. D., & Lam, T. C. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19(3), 5–15. https://onlinelibrary.wiley.com/doi/pdfdirect/10.1111/j.1745-3992.2000.tb00033.x
CR  - Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel‐Haenszel Type I error performance. Journal of Educational Measurement, 33(2), 215–230. https://www.jstor.org/stable/pdf/1435184.pdf
CR  - Sireci, S. G., & Rios, J. A. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation, 19(2-3), 170–187. https://doi.org/10.1080/13803611.2013.767621
CR  - Somes, G. W. (1986). The generalized Mantel–Haenszel statistic. The American Statistician, 40(2), 106–108. https://www.jstor.org/stable/pdf/2684866.pdf
CR  - Wang, W. C., & Su, Y. H. (2004). Factors influencing the Mantel and generalized Mantel-Haenszel methods for the assessment of differential item functioning in polytomous items. Applied Psychological Measurement, 28(6), 450–480. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=96cc44755a12838b2cde4401a0635aaa6b075768
CR  - Wood, S. W. (2011). Differential item functioning procedures for polytomous items when examinee sample sizes are small [Unpublished doctoral thesis]. The University of Iowa, USA.
CR  - Yandı, A. (2017). Comparison of the methods of examining measurement equivalence under different conditions in terms of statistical power ratios [Unpublished doctoral thesis]. Ankara University, Ankara.
CR  - Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223–233. https://doi.org/10.1080/15434300701375832
CR  - Zwick, R. (2012). A review of ETS differential item functioning assessment procedures: Flagging rules, minimum sample size requirements, and criterion refinement. ETS Research Report Series, 2012(1), i–30. https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/j.2333-8504.2012.tb02290.x
CR  - Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessment of differential item functioning for performance tasks. Journal of Educational Measurement, 30(3), 233–251. https://doi.org/10.1111/j.1745-3984.1993.tb00425.x
UR  - https://doi.org/10.21031/epod.1221823
L1  - https://dergipark.org.tr/en/download/article-file/2843049
ER  - 