Research Article

Examining the impact of violations of local item independence assumption on test equating methods

Year 2025, Volume: 12 Issue: 3, 629 - 661, 04.09.2025
https://doi.org/10.21449/ijate.1562627

Abstract

This study investigates how violating the local item independence assumption, by loading certain items onto a second dimension, affects test equating errors in unidimensional, dichotomously scored tests. The research was designed as a simulation study using data generated from the PISA 2018 mathematics exam. Analyses were conducted under 36 conditions, varying sample size (250, 1000, and 5000), test length (20, 40, and 60 items), and the proportion of items loaded onto the second dimension (0%, 15%, 30%, and 50%). A random groups design was used, and 100 replications per condition produced 3600 datasets. The results revealed that equating methods based on classical test theory (CTT) showed varying levels of error depending on the error type and condition. Among the item response theory (IRT) scale transformation methods, the Stocking-Lord method produced the lowest error and was least affected by violations of the local independence assumption. Additionally, the observed score equating method yielded lower root mean square error (RMSE) values than the true score equating method and was less affected by local independence violations. The SS-MIRT observed score equating method yielded lower RMSE values than the other methods and was more robust against violation of the local independence assumption.
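The replication-based error summary described above can be sketched as follows. This is a minimal illustration of aggregating equating error (RMSE and bias) over replications against a criterion equating function; the identity criterion, the 21-point score scale, and the simulated errors below are illustrative assumptions, not the study's actual data or conditions.

```python
import numpy as np

def rmse_and_bias(criterion, replications):
    """Aggregate equating error across replications.

    criterion    : 1-D array of criterion equated scores, one per raw-score point
    replications : 2-D array, shape (n_replications, n_score_points),
                   each row holding one replication's estimated equated scores
    """
    reps = np.asarray(replications, dtype=float)
    diff = reps - criterion                     # error at each score point, each replication
    bias = diff.mean(axis=0)                    # signed mean error per score point
    rmse = np.sqrt((diff ** 2).mean(axis=0))    # RMSE per score point
    # Summarize over the score scale with simple (unweighted) averages
    return rmse.mean(), bias.mean()

# Toy illustration: a hypothetical 21-point raw-score scale, identity equating
# as the criterion, and 100 replications with random estimation error.
rng = np.random.default_rng(0)
criterion = np.linspace(0, 20, 21)
reps = criterion + rng.normal(0.0, 0.5, size=(100, 21))
rmse, bias = rmse_and_bias(criterion, reps)
```

In practice the per-score-point summaries are often weighted by the score distribution before averaging; the unweighted mean above is just the simplest choice for a sketch.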

References

  • Aiken, L.R. (2000). Psychological testing and assessment (10th ed.). Allyn and Bacon.
  • Aksekioğlu, B. (2017). Madde tepki kuramına dayalı test eşitleme yöntemlerinin karşılaştırılması: PISA 2012 fen testi örneği [Comparison of test equating methods based on item response theory: PISA 2012 science test sample]. [Master's Thesis, Akdeniz University]. Higher Education Institution National Thesis Center.
  • Akour, M.M.M. (2006). A comparison of various equipercentile and kernel equating methods under the random groups design. [Doctoral Dissertation, Graduate College of The University of Iowa]. ProQuest Dissertations and Theses Global.
  • Albano, A.D. (2016). equate: An R package for observed-score linking and equating. Journal of Statistical Software, 74, 1-36. https://doi.org/10.18637/jss.v074.i08
  • Angoff, W.H. (1984). Scales, norms, and equivalent scores. Educational Testing Service.
  • Aşiret, S. (2014). Küçük örneklemlerde test eşitleme yöntemlerinin çeşitli faktörlere göre incelenmesi [Factors affecting the test equating method using small samples]. [Master's Thesis, Mersin University]. Higher Education Institution National Thesis Center.
  • Atar, B., & Yeşiltaş, G. (2017). Çok boyutlu eşitleme yöntemlerinin eşdeğer olmayan gruplarda ortak madde deseni için performanslarının incelenmesi [Investigation of the performance of multidimensional equating procedures for common-item nonequivalent groups design]. Journal of Measurement and Evaluation in Education and Psychology, 8(4), 421-434. https://doi.org/10.21031/epod.335284
  • Baker, F.B., & Al‐Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28(2), 147-162.
  • Battauz, M. (2015). equateIRT: An R package for IRT test equating. Journal of Statistical Software, 68, 1-22. https://doi.org/10.18637/jss.v068.i07
  • Brossman, B.G., & Lee, W.C. (2013). Observed score and true score equating procedures for multidimensional item response theory. Applied Psychological Measurement, 37(6), 460-481. https://doi.org/10.1177/0146621613484083
  • Chalmers, R.P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48, 1-29. https://doi.org/10.18637/jss.v048.i06
  • Chalmers, R.P. (2016). mirtCAT: Computerized adaptive testing with multidimensional item response theory. Journal of Statistical Software, 71(5), 1-38. https://doi.org/10.18637/jss.v071.i05
  • Chen, J. (2014). Model selection for IRT equating of testlet-based tests in the random groups design. [Doctoral Dissertation, Graduate College of The University of Iowa]. ProQuest Dissertations and Theses Global.
  • Chen, W.H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265-289. https://doi.org/10.3102/10769986022003265
  • Choi, J. (2019). Comparison of MIRT observed score equating methods under the common-item nonequivalent groups design. [Doctoral Dissertation, Graduate College of The University of Iowa]. ProQuest Dissertations and Theses Global.
  • Cook, L.L., & Eignor, D.R. (1991). An NCME module on IRT equating methods. Educational Measurement: Issues and Practice, 10(3), 191-199. https://doi.org/10.1111/j.1745-3992.1991.tb00207.x
  • Çörtük, M. (2022). Çok kategorili puanlanan maddelerden oluşan testlerde klasik test kuramı ve madde tepki kuramına dayalı test eşitleme yöntemlerinin karşılaştırılması [Comparison of test equating methods based on classical test theory and item response theory in polytomously scored tests]. [Master's Thesis, Akdeniz University]. Higher Education Institution National Thesis Center.
  • Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Harcourt Brace Jovanovich College.
  • Cui, Z. (2006). Two new alternative smoothing methods in equating: The cubic B-spline presmoothing method and the direct presmoothing method. [Doctoral dissertation, Graduate College of The University of Iowa]. ProQuest Dissertations & Theses Global.
  • De Gruijter, D.N., & van der Kamp, L.J.T. (2007). Statistical test theory for the behavioral sciences. Chapman and Hall/CRC.
  • DeMars, C. (2010). Item response theory. Oxford University Press.
  • Demir, S., & Güler, N. (2014). Ortak maddeli denk olmayan gruplar desenine ilişkin test eşitleme çalışması [Study of test equating on the common item nonequivalent group design]. International Journal of Human Sciences, 11(2), 190-208.
  • Donlon, T.F. (1984). The College Board technical handbook for the scholastic aptitude test and achievement tests. College Entrance Examination Board.
  • Finch, H., French, B.F., & Immekus, J.C. (2014). Applied psychometrics using SAS. IAP.
  • Gök, B., & Kelecioğlu, H. (2014). Denk olmayan gruplarda ortak madde deseni kullanılarak madde tepki kuramına dayalı eşitleme yöntemlerinin karşılaştırılması [Comparison of IRT equating methods using the common-item nonequivalent groups design]. Mersin University Journal of the Faculty of Education, 10(1), 120-136.
  • Gübeş, N.Ö. (2019). Test eşitlemede çok boyutluluğun eş zamanlı ve ayrı kalibrasyona etkisi [The effect of multidimensionality on concurrent and separate calibration in test equating]. Hacettepe University Journal of Education, 34(4), 1061-1074. https://doi.org/10.16986/HUJE.2019049186
  • Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22(3), 144-149.
  • Hagge, S.L. (2010). The impact of equating method and format representation of common items on the adequacy of mixed-format test equating using nonequivalent groups. [Doctoral Dissertation, Graduate College of The University of Iowa]. ProQuest Dissertations and Theses Global.
  • Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. Sage.
  • Han, T., Kolen, M., & Pohlmann, J. (1997). A comparison among IRT true- and observed-score equatings and traditional equipercentile equating. Applied Measurement in Education, 10(2), 105-121.
  • Hanson, B.A., & Béguin, A.A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26(1), 3-24.
  • Hanson, B.A., Zeng, L., & Colton, D.A. (1994). A comparison of presmoothing and postsmoothing methods in equipercentile equating (No. 94). American College Testing Program.
  • Harris, D.J., & Crouse, J.D. (1993). A study of criteria used in equating. Applied Measurement in Education, 6(3), 195-240. https://doi.org/10.1207/s15324818ame0603_3
  • Kahraman, N. (2013). Unidimensional interpretations for multidimensional test items. Journal of Educational Measurement, 50(2), 227-246. https://doi.org/10.1111/jedm.12012
  • Kahraman, N., & Kamata, A. (2004). Increasing the precision of subscale scores by using out-of-scale information. Applied Psychological Measurement, 28, 407-426. https://doi.org/10.1177/0146621604268736
  • Kahraman, N., & Thompson, T. (2011). Relating unidimensional IRT parameters to a multidimensional response space: A review of two alternative projection IRT models for subscale scores. Journal of Educational Measurement, 48, 146-164. https://doi.org/10.1111/j.1745-3984.2011.00138.x
  • Karagül, A.E. (2020). Küçük örneklemlerde çok kategorili puanlanan maddelerden oluşan testlerde klasik test eşitleme yöntemlerinin karşılaştırılması [Comparison of classical test equating methods with polytomously scored tests and small samples]. [Master’s thesis, Ankara University]. Higher Education Institution National Thesis Center.
  • Karkee, T.B., & Wright, K.R. (2004, April). Evaluation of linking methods for placing three-parameter logistic item parameter estimates onto a one-parameter scale. Paper presented at the Annual Meeting of the American Educational Research Association, San Diego, CA.
  • Kilmen, S. (2010). Comparison of equating errors estimated from test equating methods based on item response theory according to the sample size and ability distribution. [Doctoral Dissertation, Ankara University]. Higher Education Institution National Thesis Center.
  • Kim, S.Y. (2018). Simple structure MIRT equating for multidimensional tests. [Doctoral Dissertation, Graduate College of The University of Iowa]. ProQuest Dissertations & Theses Global.
  • Kim, S.H., & Cohen, A.S. (1998). Detection of differential item functioning under the graded response model with the likelihood ratio test. Applied Psychological Measurement, 22(4), 345-355.
  • Kim, S.Y., Lee, W.C., & Kolen, M.J. (2020). Simple-structure multidimensional item response theory equating for multidimensional tests. Educational and Psychological Measurement, 80(1), 91-125. https://doi.org/10.1177/0013164419854208
  • Kline, R.B. (2015). Principles and practice of structural equation modeling. Guilford.
  • Kolen, M.J., & Brennan, R.L. (2014). Test equating, scaling, and linking: Methods and practices. Springer. https://doi.org/10.1007/978-1-4939-0317-7
  • Kolen, M.J., & Hendrickson, A.B. (2013). Scaling, norming, and equating. In K.F. Geisinger et al. (Eds.), APA handbook of testing and assessment in psychology, Vol. 1: Test theory and testing and assessment in industrial and organizational psychology (pp. 201-222). American Psychological Association.
  • Kumlu, G. (2019). Test ve alt testlerde eşitlemenin farklı koşullar açısından incelenmesi [An investigation of test and sub-tests equating in terms of different conditions]. [Doctoral Dissertation, Hacettepe University]. Higher Education Institution National Thesis Center.
  • Lee, W.C., & Ban, J.C. (2009). A comparison of IRT linking procedures. Applied Measurement in Education, 23(1), 23-48. https://doi.org/10.1080/08957340903423537
  • Lee, W.C., & Brossman, B.G. (2012). Observed score equating for mixed-format tests using a simple-structure multidimensional IRT framework. In M. J. Kolen & W. C. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (Volume 2) (CASMA Monograph No. 2.2.) Center for Advanced Studies in Measurement and Assessment, The University of Iowa. http://www.education.uiowa.edu/casma
  • Lee, G., & Lee, W.C. (2016). Bi-factor MIRT observed-score equating for mixed-format tests. Applied Measurement in Education, 29(3), 224 241. https://doi.org/10.1080/08957347.2016.1171770
  • Lee, E., Lee, W., & Brennan, R.L. (2014). Equating multidimensional tests under a random groups design: A comparison of various equating procedures. (CASMA Research Report No. 40). Iowa City, IA: Center for Advanced Studies in Measurement and Assessment, The University of Iowa. http://www.education.uiowa.edu/casma
  • Lee, G., Lee, W., Kolen, M.J., Park, I.-Y., Kim, D.I., & Yang, J.S. (2015). Bi-factor MIRT true-score equating for testlet-based tests. Journal of Educational Evaluation, 28, 681-700.
  • Li, Y.H., & Lissitz, R.W. (2000). An evaluation of the accuracy of multidimensional IRT linking. Applied Psychological Measurement, 24(2), 115-138.
  • Lim, E. (2016). Subscore equating with the random groups design. [Doctoral dissertation, Graduate College of The University of Iowa]. ProQuest Dissertations & Theses Global.
  • Liu, C., & Kolen, M.J. (2011). A comparison among IRT equating methods and traditional equating methods for mixed-format tests. In M.J. Kolen & W.C. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (Volume 1, pp. 75-94). Center for Advanced Studies in Measurement and Assessment, The University of Iowa.
  • Livingston, S.A. (1993). Small-sample equating with log-linear smoothing. Journal of Educational Measurement, 30(1), 23-29.
  • Livingston, S.A. (2014). Equating test scores (without IRT) (2nd ed.). Educational Testing Service.
  • Livingston, S.A., & Kim, S. (2009). The circle-arc method for equating in small samples. Journal of Educational Measurement, 46(3), 330-343. https://doi.org/10.1111/j.1745-3984.2009.00084.x
  • Lord, F.M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates.
  • Lord, F.M., & Novick M.R. (1968). Statistical theories of mental test scores. Addison-Wesley.
  • Loyd, B.H., & Hoover, H.D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17(3), 179-193.
  • Marco, G.L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14(2), 139-160.
  • McDonald R.P. (2000). A basis for multidimensional item response theory. Applied Psychological Measurement, 24, 99-114.
  • Mutluer, C. (2021). Klasik test kuramına ve madde tepki kuramına dayalı test eşitleme yöntemlerinin karşılaştırması: Uluslararası öğrenci değerlendirme programı (PISA) 2012 matematik testi örneği [Comparison of test equating methods based on Classical Test Theory and Item Response Theory: International Student Assessment Program (PISA) 2012 mathematics test case]. [Doctoral dissertation, Gazi University]. Higher Education Institution National Thesis Center.
  • Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51(1), 1-23.
  • Öztürk, N., & Anıl, D. (2012). Akademik personel ve lisansüstü eğitimi giriş sınavı puanlarının eşitlenmesi üzerine bir çalışma [A study on equating academic staff and graduate education entrance examination scores]. Eğitim ve Bilim, 37(165), 180-193.
  • Petersen, N.S., Cook, L.L., & Stocking, M.L. (1983). IRT versus conventional equating methods: A comparative study of scale stability. Journal of Educational Statistics, 8(2), 137-156.
  • Peterson, J. (2014). Multidimensional item response theory observed score equating methods for mixed-format tests. [Doctoral dissertation, Graduate College of The University of Iowa].University of Iowa’s Institutional Repository. https://ir.uiowa.edu/cgi/viewcontent.cgi?article=5418&context=etd
  • Peterson, J., & Lee, W. (2014). Multidimensional item response theory observed score equating methods for mixed-format tests. In M. J. Kolen & W. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (Volume 2) (CASMA Monograph No. 2.3). Center for Advanced Studies in Measurement and Assessment, The University of Iowa. http://www.education.uiowa.edu/casma
  • Powers, S.J., Hagge, S.L., Wang, W., He, Y., Liu, C., & Kolen, M.J. (2011). Effects of group differences on mixed-format equating. In M. J. Kolen & W. C. Lee (Eds.), Mixed-format tests: Psychometric properties with a primary focus on equating (Volume 1, pp. 51-73). Center for Advanced Studies in Measurement and Assessment, The University of Iowa. http://www.education.uiowa.edu/casma
  • R Core Team (2019). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
  • Salmaner Doğan, R., & Tan, Ş. (2022). Madde tepki kuramında eşitleme hatalarının belirlenmesinde kullanılan delta ve bootstrap yöntemlerinin çeşitli değişkenlere göre incelenmesi [Investigation of delta and bootstrap methods for calculating error of test equation in IRT in terms of some variables]. Gazi University Journal of Gazi Educational Faculty (GUJGEF), 42(2), 1053-1081. https://doi.org/10.17152/gefad.913241
  • Sansivieri, V., Wiberg, M., & Matteucci, M. (2017). A review of test equating methods with a special focus on IRT based approaches. Statistica, 77(4), 329-352. https://doi.org/10.6092/issn.1973-2201/7066
  • Sass, D.A., & Schmitt, T.A. (2010). A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research, 45, 73-103. https://doi.org/10.1080/00273170903504810
  • Sireci, S.G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet‐based tests. Journal of Educational Measurement, 28(3), 237-247.
  • Skaggs, G. (2005). Accuracy of random groups equating with very small samples. Journal of Educational Measurement, 42(4), 309-330.
  • Stocking, M.L., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201-210.
  • Tabachnick, B.G., & Fidell, L.S. (2013). Using multivariate statistics ( 6th ed.). Pearson.
  • Tan, Ş. (2015). Küçük örneklemlerde beta4 ve polynomial loglineer öndüzgünleştirme ve kübik eğri sondüzgünleştirme metotlarının uygunluğu [Accuracy of beta4 presmoothing, polynomial loglinear presmoothing, and cubic spline postsmoothing methods for small samples]. Gazi University Journal of Gazi Educational Faculty (GUJGEF), 35(1), 123-151.
  • Tanberkan Suna, H. (2018). Grup değişmezliği özelliğinin farklı eşitleme yöntemlerinde eşitleme fonksiyonları üzerindeki etkisi [The effect of group invariance property on equating functions obtained through various equating methods]. [Doctoral dissertation, Gazi University]. Higher Education Institution National Thesis Center.
  • Tao, W., & Cao, Y. (2016). An extension of IRT-based equating to the dichotomous testlet response theory model. Applied Measurement in Education, 29(2), 108-121. https://doi.org/10.1080/08957347.2016.1138956
  • Tsai, T.H. (1997, March). Estimating minimum sample sizes in random groups equating. Annual Meeting of the National Council on Measurement in Education, Chicago, IL.
  • Tuerlinckx, F., & De Boeck, P. (2001). The effect of ignoring item interactions on the estimated discrimination parameters in item response theory. Psychological Methods, 6(2), 181.
  • Uğurlu, S. (2020). Comparison of equating methods for multidimensional tests which contain items with differential item functioning. [Doctoral dissertation, Hacettepe University]. Higher Education Institution National Thesis Center.
  • Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education, 8(2), 157-186.
  • Wainer, H., & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15(1), 22-29.
  • Wainer, H., & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37(3), 203-220.
  • Wang, S., Zhang, M., & You, S. (2020). A comparison of IRT observed score kernel equating and several equating methods. Frontiers in Psychology, 11, 308.
  • Wang, X. (2012). Effect of sample size on IRT equating of uni-dimensional tests in common item non-equivalent group design: A Monte Carlo simulation study. [Doctoral Dissertation, Virginia Tech]. ProQuest Dissertations and Theses Global.
  • Way, W.D., Ansley, T.N., & Forsyth, R.A. (1988). The comparative effects of compensatory and noncompensatory two-dimensional data on unidimensional IRT estimates. Applied Psychological Measurement, 12(3), 239-252. https://doi.org/10.1177/014662168801200303
  • Woodruff, D.J. (1989). A comparison of three linear equating methods for the common-item nonequivalent-populations design. Applied Psychological Measurement, 13(3), 257-262.
  • Yen, W.M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145.
  • Yen, W.M. (1993). Scaling performance assessments: Strategies for managing local item independence. Journal of Educational Measurement, 30, 187–213.
  • Zhang, Z. (2010). Comparison of different equating methods and an application to link testlet-based tests. [Doctoral Dissertation, Graduate College of The University of Chinese]. ProQuest Dissertations and Theses Global.
There are 93 citations in total.

Details

Primary Language English
Subjects Measurement Theories and Applications in Education and Psychology, Simulation Study
Journal Section Articles
Authors

Mehmet Fatih Doğuyurt 0000-0001-9206-3321

Şeref Tan 0000-0002-9892-3369

Early Pub Date July 21, 2025
Publication Date September 4, 2025
Submission Date October 7, 2024
Acceptance Date February 21, 2025
Published in Issue Year 2025 Volume: 12 Issue: 3

Cite

APA Doğuyurt, M. F., & Tan, Ş. (2025). Examining the impact of violations of local item independence assumption on test equating methods. International Journal of Assessment Tools in Education, 12(3), 629-661. https://doi.org/10.21449/ijate.1562627
