Research Article
Year 2022, Volume 9, Issue 5, 143 - 160, 01.09.2022
https://doi.org/10.17275/per.22.108.9.5

Effect of Item Parameter Drift in Mixed Format Common Items on Test Equating

Abstract

The aim of the study was to examine how parameter drift in mixed-format common items (e.g., multiple-choice and essay items) affects test equating performed with the common-item nonequivalent groups design. The study used a Monte Carlo simulation with a fully crossed design. The factors were test length (30 and 50 items), sample size (1000 and 3000), common item ratio (30% and 40%), ratio of items with item parameter drift (IPD) among the common items (20% and 30%), location of common items in the tests (at the beginning, randomly distributed, and at the end), and IPD size in multiple-choice items (low [0.2] and high [1.0]). Four test forms were created, two of which contained no parameter drift. Drift was introduced into the first of the other two forms and then introduced again into the second. Test equating results were compared using the root mean squared error (RMSE). The results showed that the ratio of items with IPD among the common items, IPD size in multiple-choice items, common item ratio, sample size, and test length had significant effects on equating error.
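As an illustration only, the fully crossed design and the RMSE criterion described in the abstract can be sketched in Python. The factor levels are taken from the abstract; the names `FACTORS`, `crossed_conditions`, and `rmse` are hypothetical and are not the authors' code, which was written in R. The abstract does not report the number of replications or how the criterion equated scores were defined, so those are left out.

```python
from itertools import product
from math import sqrt

# Factor levels as listed in the abstract; the key names are illustrative.
FACTORS = {
    "test_length": [30, 50],
    "sample_size": [1000, 3000],
    "common_item_ratio": [0.30, 0.40],
    "ipd_item_ratio": [0.20, 0.30],
    "common_item_location": ["beginning", "random", "end"],
    "ipd_size_mc": [0.2, 1.0],
}


def crossed_conditions(factors):
    """Return every combination of factor levels (a fully crossed design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def rmse(estimates, criterion):
    """Root mean squared error between equated scores and criterion scores."""
    if not estimates or len(estimates) != len(criterion):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - c) ** 2 for e, c in zip(estimates, criterion)]
    return sqrt(sum(squared) / len(squared))


conditions = crossed_conditions(FACTORS)
print(len(conditions))  # 2 * 2 * 2 * 2 * 3 * 2 = 96 cells
print(conditions[0])
```

Crossing the six listed factors gives 96 simulation cells. In each cell, `rmse` would be applied to the equated scores from a drifted form against the corresponding scores from a drift-free form, under whatever criterion the full article specifies.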

References

  • Angoff, W. H. (1971). Scales, norms and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (pp. 508-600). Washington: American Council on Education.
  • Arce-Ferrer, A. J., & Bulut, O. (2017). Investigating separate and concurrent approaches for item parameter drift in 3PL item response theory equating. International Journal of Testing, 17(1), 1-22. doi:10.1080/15305058.2016.1227825
  • Babcock, B., & Albano, A. D. (2012). Rasch scale stability in the presence of item parameter and trait drift. Applied Psychological Measurement, 36(7), 565–580. doi:10.1177/0146621612455090
  • Bulut, O., & Sunbul, O. (2017). Monte Carlo simulation studies in Item Response Theory with the R programming language. Journal of Measurement and Evaluation in Education and Psychology, 8(3), 266-287. doi:10.21031/epod.305821
  • Bock, D. B., Muraki, E., & Pfeiffenberger, W. (1988). Item pool maintenance in the presence of item parameter drift. Journal of Educational Measurement, 25(4), 275-285. http://www.jstor.org/stable/1434961
  • Brown, A., & Croudace, T. J. (2015). Scoring and estimating score precision using multidimensional IRT. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 307-333). New York: Routledge/Taylor & Francis.
  • Cao, Y. (2008). Mixed-format test equating: Effects of test dimensionality and common-item sets (Doctoral Dissertation, University of Maryland). Retrieved from https://drum.lib.umd.edu/handle/1903/8843
  • Carsey, T. M., & Harden, J. J. (2014). Monte Carlo simulation and resampling methods for social sciences. doi:10.4135/9781483319605
  • Chen, Q. (2013). Remove or keep: Linking items showing item parameter drift (Unpublished doctoral dissertation). Michigan State University, Michigan.
  • Chon, K. H., Lee, W.-C., & Ansley, T. N. (2007). Assessing IRT model-data fit for mixed format tests (Research Report 26). Iowa: Center for Advanced Studies in Measurement and Assessment.
  • Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
  • DeMars, C. (2010). Item response theory: Understanding statistics, measurement. New York: Oxford University.
  • Deng, W., & Monfils, R. (2017). Long-term impact of valid case criterion on capturing population-level growth under item response theory equating (Research Report 17-17). Retrieved from https://doi.org/10.1002/ets2.12144
  • Felan, G. D. (2002, February). Test equating: Mean, linear, equipercentile and item response theory. Paper presented at the Annual Meeting of the Southwest Educational Research Association, Austin.
  • Gaertner, M. N., & Briggs, D. C. (2009). Detecting and addressing item parameter drift in IRT test equating contexts. Boulder, CO: Center for Assessment.
  • Guo, R., Zheng, Y., & Chang, H. (2015). A stepwise test characteristic curve method to detect item parameter drift. Journal of Educational Measurement, 52(3), 280-300. doi:10.1111/jedm.12077
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. California: Sage.
  • Han, K. T. (2008). Impact of item parameter drift on test equating and proficiency estimates (Doctoral dissertation, University of Massachusetts Amherst). Retrieved from http://scholarworks.umass.edu/dissertations/AAI3325324
  • Han, K. T., & Guo, F. (2011). Potential impact of item parameter drift due to practice and curriculum change on item calibration in computerized adaptive testing (Research Report 11-02). Reston, Virginia: Graduate Management Admission Council.
  • Han, K., & Wells, C. S. (2007, April). Impact of differential item functioning (DIF) on test equating and proficiency estimates. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago.
  • Han, K. T., Wells, C. S., & Hambleton, R. K. (2015). Effect of adjusting pseudo-guessing parameter estimates on test scaling when item parameter drift is present. Practical Assessment, Research, and Evaluation, 20, 16. doi:10.7275/jyyy-wp74
  • Han, K. T., Wells, C. S., & Sireci, S. G. (2012). The impact of multidirectional item parameter drift on IRT scaling coefficients and proficiency estimates. Applied Measurement in Education, 25(2), 97-117. doi:10.1080/08957347.2012.660000
  • Hanson, B. A., & Béguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26(1), 3–24. doi:10.1177/0146621602026001001
  • Hu, H., Rogers, W. T., & Vukmirovic, Z. (2008). Investigation of IRT-based equating methods in the presence of outlier common items. Applied Psychological Measurement, 32(4), 311–333. doi:10.1177/0146621606292215
  • Huang, C. Y., & Shyu, C. Y. (2003, April). The impact of item parameter drift on equating. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago.
  • Jimenez, F. A. (2011). Effects of outlier item parameters on IRT characteristic curve linking methods under the common-item nonequivalent groups design (Unpublished master’s thesis). University of Florida, Florida.
  • Keller, L. A., & Keller, R. R. (2011). The long-term sustainability of different item response theory scaling methods. Educational and Psychological Measurement, 71(2), 362–379. doi:10.1177/0013164410375111
  • Kieftenbeld, V., & Natesan, P. (2012). Recovery of graded response model parameters: A comparison of marginal maximum likelihood and Markov chain Monte Carlo estimation. Applied Psychological Measurement, 36(5), 399–419. doi:10.1177/0146621612446170
  • Kilmen, S. (2010). Comparison of equating errors estimated from test equation methods based on item response theory according to the sample size and ability distribution (Unpublished doctoral dissertation). Ankara University, Ankara.
  • Kim, S., Harris, D. J., & Kolen, M. J. (2010). Equating with polytomous item response models. In M. L. Nering & R. Ostini (Eds.), Handbook of polytomous item response theory models (pp. 257-291). New York: Routledge.
  • Kim, S., & Lee, W. (2004). IRT scale linking methods for mixed-format tests (Research Report 2004-5). Iowa: ACT.
  • Kim, S., & Lee, W.-C. (2006). An extension of four IRT linking methods for mixed‐format tests. Journal of Educational Measurement, 43(1), 53-76. doi:10.1111/j.1745-3984.2006.00004.x
  • Kim, S., Walker, M. E., & McHale, F. (2010). Comparisons among designs for equating mixed-format tests in large-scale assessments. Journal of Educational Measurement, 47(1), 36–53. http://www.jstor.org/stable/25651535
  • Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling and linking. New York: Springer.
  • Lee, H., & Geisinger, K. F. (2019). Item parameter drift in context questionnaires from international large-scale assessments. International Journal of Testing, 19(1), 23-51. doi:10.1080/15305058.2018.1481852
  • Li, X. (2008). An investigation of the item parameter drift in the examination for the certificate of proficiency in English (ECPE). Spaan Fellow Working Papers in Second or Foreign Language Assessment, 6, 1-28.
  • Li, Y. (2012). Examining the impact of drifted polytomous anchor items on test characteristic curve (TCC) linking and IRT true score equating (Research Report 12-09). New Jersey: Educational Testing Service.
  • Lim, H. (2020). irtplay: Unidimensional item response theory modeling (Version 1.6.2) (Computer software). Retrieved from https://CRAN.R-project.org/package=irtplay
  • Marengo, D., Miceli, R., Rosato, R., & Settanni, M. (2018). Placing multiple tests on a common scale using a post-test anchor design: Effects of item position and order on the stability of parameter estimates. Frontiers in Applied Mathematics and Statistics, 4, 50. doi:10.3389/fams.2018.00050
  • McCoy, K. M. (2009). The impact of item parameter drift on examinee ability measures in a computer adaptive environment (Unpublished doctoral dissertation). Champaign, IL: University of Illinois.
  • Meng, Y. (2012). Comparison of Kernel equating and item response theory equating methods (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3518262)
  • Meng, H., Steinkamp, S., & Matthews-Lopez, J. (2010, May). An investigation of item parameter drift in computer adaptive testing. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Denver.
  • Messick, S. (1993). Trait equivalence as construct validity of score interpretation across multiple methods of measurement. In R. E. Bennett & W. C. Ward (Eds.), Construction versus choice in cognitive measurement: Issues in constructed response, performance testing, and portfolio assessment (pp. 61-73). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
  • Meyers, J. L., Miller, G. E., & Way, W. D. (2008). Item position and item difficulty change in an IRT-based common item equating design. Applied Measurement in Education, 22(1), 38–60. doi:10.1080/08957340802558342
  • Meyers, J. L., Murphy, S., Goodman, J., & Turhan, A. (2012, April). The impact of item position change on item parameters and common equating results under the 3PL model. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Vancouver, B.C.
  • Michaelides, M. P. (2010). A review of the effects on IRT item parameter estimates with a focus on misbehaving common items in test equating. Frontiers in Psychology, 1, 167. doi:10.3389/fpsyg.2010.00167
  • Miller, G. E., & Fitzpatrick, S. J. (2009). Expected equating error resulting from incorrect handling of item parameter drift among the common items. Educational and Psychological Measurement, 69(3), 357-368.
  • Mooney, C. Z. (1997). Monte Carlo simulation (Series No. 07-116). doi:10.4135/9781412985116
  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176. doi:10.1177/014662169201600206
  • Penfield, R. D., Alvarez, K., & Lee, O. (2008). Using a taxonomy of differential step functioning to improve the interpretation of DIF in polytomous items: An illustration. Applied Measurement in Education, 22(1), 61-78. doi:10.1080/08957340802558367
  • R Development Core Team. (2021). R: A language and environment for statistical computing (Version 4.0.5) (Computer software). R Foundation for Statistical Computing.
  • Rupp, A. A., & Zumbo, B. D. (2003a, April). Bias coefficients for lack of invariance in unidimensional IRT models. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago.
  • Rupp, A. A., & Zumbo, B. D. (2003b). Which model is best? Robustness properties to justify model choice among unidimensional IRT models under item parameter drift. Alberta Journal of Educational Research, 49(3), 264-276. doi:10.11575/ajer.v49i3.54984
  • Sass, D. A., Schmitt, T. A., & Walker, C. M. (2008). Estimating non-normal latent trait distributions within item response theory using true and estimated item parameters. Applied Measurement in Education, 21(1), 65-88. doi:10.1080/08957340701796415
  • Skaggs, G., & Lissitz, R. (1986). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56(4), 495-529. doi:10.2307/1170343
  • Stahl, J. A., & Muckle, T. (2007, April). Investigating displacement in the Winsteps Rasch calibration application. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago.
  • Stocking, M. L., & Lord, F. M. (1982). Developing a common metric in item response theory (Research Report 82-25-ONR). Retrieved from https://doi.org/10.1002/j.2333-8504.1982.tb01311.x
  • Sukin, T. M. (2010). Item parameter drift as an indication of differential opportunity to learn: An exploration of item flagging methods & accurate classification of examinees (Doctoral dissertation, University of Massachusetts Amherst). Retrieved from https://scholarworks.umass.edu/open_access_dissertations/301
  • Tate, R. (2000). Performance of a proposed method for the linking of mixed format tests with constructed response and multiple choice items. Journal of Educational Measurement, 37(4), 329-346. doi:10.1111/j.1745-3984.2000.tb01090.x
  • Tian, F. (2011). A comparison of equating/linking using the Stocking-Lord method and concurrent calibration with mixed-format tests in the non-equivalent groups common-item design under IRT (Doctoral dissertation, Boston College). Retrieved from http://hdl.handle.net/2345/2370
  • Wang, W., Drasgow, F., & Liu, L. (2016). Classification accuracy of mixed format tests: A bi-factor item response theory approach. Frontiers in Psychology, 7, 270. doi:10.3389/fpsyg.2016.00270
  • Weeks, J. P. (2010). plink: An R package for linking mixed-format tests using IRT-based methods (Version 1.5-1) (Computer software). Journal of Statistical Software, 35(12), 1-33. doi:10.18637/jss.v035.i12
  • Wells, C. S., Subkoviak, M. J., & Serlin, R. C. (2002). The effect of item parameter drift on examinee ability estimates. Applied Psychological Measurement, 26(1), 77–87. doi:10.1177/0146621602261005
  • Wollack, J. A., Sung, H. J., & Kang, T. (2005, April). Longitudinal effects of item parameter drift. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Montreal, Canada.
  • Wollack, J. A., Sung, H. J., & Kang, T. (2006, April). The impact of compounding item parameter drift on ability estimation. Paper presented at the Annual Meeting of the National Council on Measurement in Education, San Francisco, CA.
There are 65 citations in total.

Details

Primary Language English
Subjects Other Fields of Education
Journal Section Research Articles
Authors

İbrahim Uysal 0000-0002-6767-0362

Merve Şahin Kürşad 0000-0002-6591-0705

Abdullah Faruk Kılıç 0000-0003-3129-1763

Publication Date September 1, 2022
Acceptance Date May 20, 2022
Published in Issue Year 2022, Volume 9, Issue 5

Cite

APA Uysal, İ., Şahin Kürşad, M., & Kılıç, A. F. (2022). Effect of Item Parameter Drift in Mixed Format Common Items on Test Equating. Participatory Educational Research, 9(5), 143-160. https://doi.org/10.17275/per.22.108.9.5