Research Article

Investigation of differential item and step functioning procedures in polytomus items

Year 2023, Volume: 14 Issue: 3, 200 - 221, 30.09.2023
https://doi.org/10.21031/epod.1221823

Abstract

This study compared differential item functioning (DIF) and differential step functioning (DSF) detection methods for polytomously scored items under various conditions. The data came from Kazakhstan, Turkey and the USA and consisted of responses to the items on the frequency of digital device use at school in the PISA 2018 student “ICT Familiarity Questionnaire”. The Mantel test, the Liu-Agresti statistic, Cox's β and poly-SIBTEST were used for polytomous DIF analysis, while the Adjacent Category Logistic Regression Model and the Cumulative Category Log Odds Ratio were used for DSF analysis. The study followed a correlational survey model, and its conditions were differential category combining, focal group sample size, focal-to-reference group sample size ratio and DIF/DSF detection method. SAS and R were used to create the conditions; the SIBTEST program was used for the poly-SIBTEST analyses and DIFAS for the other methods. The analyses showed that more items/steps exhibited a high level of DIF/DSF in the small sample according to the polytomous DIF methods, and in the large sample according to the DSF methods. At the step level, DIF values were lower in items whose steps showed DSF with opposite signs; skipping DSF analysis for an item that shows no DIF may therefore yield erroneous results. Although the differential category combining conditions created for this research had no systematic effect on the results, the frequency with which the combined categories were endorsed did change the results, so further study of this issue is suggested.
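As an illustration of one of the polytomous DIF methods named above, the following is a minimal sketch of the Mantel (1963) chi-square statistic for a single item. It is not the study's code: the analyses were run in DIFAS and SIBTEST. The function name `mantel_chi_square`, the group labels "R" and "F", and the toy data are assumptions made for this example only.

```python
from collections import defaultdict


def mantel_chi_square(scores, groups, strata):
    """Mantel chi-square (1 df) for one polytomously scored item.

    scores : ordered integer item scores (e.g. 0..3), one per respondent
    groups : "R" (reference) or "F" (focal), one per respondent
    strata : level of the matching variable (e.g. rest score) per respondent
    All three names and the "R"/"F" coding are illustrative assumptions.
    """
    by_stratum = defaultdict(list)
    for y, g, k in zip(scores, groups, strata):
        by_stratum[k].append((y, g))

    observed = expected = variance = 0.0
    for rows in by_stratum.values():
        n = len(rows)
        n_focal = sum(1 for _, g in rows if g == "F")
        n_ref = n - n_focal
        if n < 2 or n_focal == 0 or n_ref == 0:
            continue  # a one-group stratum says nothing about group differences
        sum_y = sum(y for y, _ in rows)
        sum_y2 = sum(y * y for y, _ in rows)
        # Observed focal-group score sum, its expectation and its variance
        # under the hypothesis of no DIF within this stratum.
        observed += sum(y for y, g in rows if g == "F")
        expected += n_focal * sum_y / n
        variance += n_ref * n_focal * (n * sum_y2 - sum_y ** 2) / (n ** 2 * (n - 1))

    if variance == 0:
        return 0.0
    return (observed - expected) ** 2 / variance


# Toy data (invented): one stratum in which the focal group scores lower.
toy_scores = [2, 2, 3, 3, 0, 0, 1, 1]
toy_groups = ["R", "R", "R", "R", "F", "F", "F", "F"]
toy_strata = ["a"] * 8
print(round(mantel_chi_square(toy_scores, toy_groups, toy_strata), 2))  # prints 5.6
```

The statistic is compared with a chi-square distribution with one degree of freedom (critical value 3.84 at the .05 level). Because it summarises the whole item in a single number, DSF effects of opposite sign across steps can cancel in it, which is the situation the abstract warns about.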

References

  • Akour, M., Sabah, S., & Hammouri, H. (2015). Net and global differential item functioning in PISA polytomously scored science items. Journal of Psychoeducational Assessment, 33(2), 166–176. https://doi.org/10.1177/0734282914541337
  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (AERA, APA, & NCME) (2014). Standards for educational and psychological testing. American Educational Research Association.
  • Ankenmann, R. D., Witt, E. A., & Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness‐of‐fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36(4), 277–300. https://doi.org/10.1111/j.1745-3984.1999.tb00558.x
  • Ayodele, A. N. (2017). Examining power and type 1 error for step and item level tests of invariance: Investigating the effect of the number of item score levels [Doctoral dissertation]. University of Minnesota, USA.
  • Benítez, I., Padilla, J. L., Hidalgo Montesinos, M. D., & Sireci, S. G. (2015). Using mixed methods to interpret differential item functioning. Applied Measurement in Education, 29(1), 1–16. https://doi.org/10.1080/08957347.2015.1102915
  • Bolt, D. M. (2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods. Applied Measurement in Education, 15(2), 113–141. https://doi.org/10.1207/S15324818AME1502_01
  • Camilli, G., & Congdon, P. (1999). Application of a method of estimating DIF for polytomous test items. Journal of Educational and Behavioral Statistics, 24(4), 323–341. https://www.jstor.org/stable/pdf/1165366.pdf
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Sage Publications.
  • Chang, H. H., Mazzeo, J., & Roussos, L. (1996). Detecting DIF for polytomous items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement, 33(3), 333–353. https://doi.org/10.1111/j.1745-3984.1996.tb00496.x
  • Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. An NCME instructional module. Educational Measurement: Issues and Practice, 17(1), 31–44. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1745-3992.1998.tb00619.x
  • Cohen, A. S., Kim, S.-H., & Baker, F. B. (1993). Detection of differential item functioning in the graded response model. Applied Psychological Measurement, 17(4), 335–350. https://doi.org/10.1177/014662169301700402
  • Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B (Methodological), 20(2), 215–232. https://doi.org/10.1111/j.2517-6161.1958.tb00292.x
  • Ellis, B. B., & Raju, N. S. (2003). Test and Item Bias: What they are, what they aren't, and how to detect them. In J. E. Wall & G. R. Walz (Eds.), Measuring up: Assessment issues for teachers, counselors, and administrators. CAPS Press.
  • Elosua, P., & Wells, C. S. (2013). Detecting DIF in polytomous items using MACS, IRT and ordinal logistic regression. Psicológica, 34(2), 327–342. https://www.redalyc.org/pdf/169/16929535011.pdf
  • Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.
  • French, A. W., & Miller, T. R. (1996). Logistic regression and its use in detecting differential item functioning in polytomous items. Journal of Educational Measurement, 33(3), 315–332. https://doi.org/10.1111/j.1745-3984.1996.tb00495.x
  • Gattamorta, K. A. (2009). A comparison of adjacent categories and cumulative DSF effect estimators [Doctoral dissertation]. University of Miami, Florida.
  • Gattamorta, K. A., & Penfield, R. D. (2012). A comparison of adjacent categories and cumulative differential step functioning effect estimators. Applied Measurement in Education, 25(2), 142–161. https://doi.org/10.1080/08957347.2012.660387
  • Gelin, M. N., & Zumbo, B. D. (2003). Differential item functioning results may change depending on how an item is scored: An illustration with the center for epidemiologic studies depression scale. Educational and Psychological Measurement, 63(1), 65–74. https://doi.org/10.1177/0013164402239317
  • Gonzalez-Roma, V., Hernandez, A., & Gomez-Benito, J. (2006). Power and Type I error of the mean and covariance structure analysis model for detecting differential item functioning in graded response items. Multivariate Behavioral Research, 41(1), 29–53. https://doi.org/10.1207/s15327906mbr4101_3
  • Göçer-Şahin, S., Gelbal, S., & Walker, C. M. (2016, October). Impact of decreasing category number of polytomous items on DIF [Conference presentation]. 15th International Mineral Processing Symposium (IMPS 2016), USA.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage Publications.
  • Henderson, D. L. (2001, April 10-14). Prevalence of gender DIF in mixed format high school exit examinations. American Educational Research Association 2001 Annual Meeting, USA. https://files.eric.ed.gov/fulltext/ED458284.pdf
  • Kaiser, H. F. (1970). A second generation little jiffy. Psychometrika, 35(4), 401–415. https://doi.org/10.1007/BF02291817
  • Kristjansson, E., Aylesworth, R., Mcdowell, I., & Zumbo, B. D. (2005). A comparison of four methods for detecting differential item functioning in ordered response items. Educational and Psychological Measurement, 65(6), 935–953. https://doi.org/10.1177/0013164405275668
  • Kuzu, Y. (2021). Investigation of Differential Item and Step Functioning Procedures in Polytomously Scored Items [Doctoral dissertation]. Hacettepe University, Ankara.
  • Mantel, N. (1963). Chi-square tests with one degree of freedom; extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association, 58(303), 690–700. https://www.jstor.org/stable/pdf/2282717.pdf
  • Meade, A. W., & Lautenschlager, G. J. (2004). A comparison of item response theory and confirmatory factor analytic methodologies for establishing measurement equivalence/invariance. Organizational Research Methods, 7(4), 361–388. https://tarjomefa.com/wp-content/uploads/2019/10/F1430-TarjomeFa-English.pdf
  • Mellor, T. L. (1995). A comparison of four differential item functioning methods for polytomously scored items [Unpublished doctoral dissertation]. The University of Texas, Austin.
  • Miller, T., Chahine, S., & Childs, R. A. (2010). Detecting differential item functioning and differential step functioning due to differences that should matter. Practical Assessment, Research, and Evaluation, 15(10), 1–13. https://doi.org/10.7275/dzm4-q558
  • Penfield, R. D. (2007). Assessing differential step functioning in polytomous items using a common odds ratio estimator. Journal of Educational Measurement, 44(3), 187–210. https://doi.org/10.1111/j.1745-3984.2007.00034.x
  • Penfield, R. D. (2008). Three classes of nonparametric differential step functioning effect estimators. Applied Psychological Measurement, 32(6), 480–501. https://doi.org/10.1177/0146621607305399
  • Penfield, R. D. (2010). Distinguishing between net and global DIF in polytomous items. Journal of Educational Measurement, 47(2), 129–149. https://doi.org/10.1111/j.1745-3984.2010.00105.x
  • Penfield, R. D. (2013). DIFAS 5.0 differential item functioning analysis system user’s manual. https://soe.uncg.edu/wp-content/uploads/2015/12/DIFASManual_V5.pdf
  • Penfield, R. D., & Algina, J. (2003). Applying the Liu‐Agresti estimator of the cumulative common odds ratio to DIF detection in polytomous items. Journal of Educational Measurement, 40(4), 353–370. https://doi.org/10.1111/j.1745-3984.2003.tb01151.x
  • Penfield, R. D., Alvarez, K., & Lee, O. (2008). Using a taxonomy of differential step functioning to improve the interpretation of DIF in polytomous items: An illustration. Applied Measurement in Education, 22(1), 61–78. https://doi.org/10.1080/08957340802558367
  • Penfield, R. D., & Lam, T. C. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19(3), 5–15. https://onlinelibrary.wiley.com/doi/pdfdirect/10.1111/j.1745-3992.2000.tb00033.x
  • Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel‐Haenszel Type I error performance. Journal of Educational Measurement, 33(2), 215–230. https://www.jstor.org/stable/pdf/1435184.pdf
  • Sireci, S. G., & Rios, J. A. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation, 19(2-3), 170–187. https://doi.org/10.1080/13803611.2013.767621
  • Somes, G. W. (1986). The generalized Mantel–Haenszel statistic. The American Statistician, 40(2), 106–108. https://www.jstor.org/stable/pdf/2684866.pdf
  • Wang, W. C., & Su, Y. H. (2004). Factors influencing the Mantel and generalized Mantel-Haenszel methods for the assessment of differential item functioning in polytomous items. Applied Psychological Measurement, 28(6), 450–480. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=96cc44755a12838b2cde4401a0635aaa6b075768
  • Wood, S. W. (2011). Differential item functioning procedures for polytomous items when examinee sample sizes are small [Unpublished doctoral thesis]. The University of Iowa, USA.
  • Yandı, A. (2017). Comparison of the methods of examining measurement equivalence under different conditions in terms of statistical power ratios [Unpublished doctoral thesis]. Ankara University, Ankara.
  • Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223–233. https://doi.org/10.1080/15434300701375832
  • Zwick, R. (2012). A review of ETS differential item functioning assessment procedures: Flagging rules, minimum sample size requirements, and criterion refinement. ETS Research Report Series, 2012(1), i–30. https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/j.2333-8504.2012.tb02290.x
  • Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessment of differential item functioning for performance tasks. Journal of Educational Measurement, 30(3), 233–251. https://doi.org/10.1111/j.1745-3984.1993.tb00425.x
There are 44 citations in total.

Details

Primary Language English
Subjects Testing, Assessment and Psychometrics (Other)
Journal Section Articles
Authors

Yasemin Kuzu 0000-0003-4301-2645

Selahattin Gelbal 0000-0001-5181-7262

Publication Date September 30, 2023
Acceptance Date September 9, 2023
Published in Issue Year 2023 Volume: 14 Issue: 3

Cite

APA Kuzu, Y., & Gelbal, S. (2023). Investigation of differential item and step functioning procedures in polytomus items. Journal of Measurement and Evaluation in Education and Psychology, 14(3), 200-221. https://doi.org/10.21031/epod.1221823