Research Article

Validity of Simulation Studies: A Case Research in the Context of Differential Item Functioning Detection

Year 2025, Volume: 3 Issue: 1, 24 - 40, 17.03.2025
https://doi.org/10.5281/zenodo.15036409

Abstract

The aim of this study is to examine simulation validity, that is, whether a simulation process produces results that are realistically close to expectations, by generating artificial data containing Differential Item Functioning (DIF) and assessing whether those data were generated accurately. The study involved one reference group and two focal groups, and 2250 conditions were simulated by crossing the sample size of the reference group, the sample size ratios of the focal groups, the amount of DIF, and the DIF technique. In the data generation process, random values for the difficulty and discrimination parameters were produced under the Two-Parameter Logistic Model (2PLM), and 20% of the items in the test were designed to contain DIF. To test the validity of the simulation, mean absolute bias and RMSE values for the difficulty and discrimination parameters were calculated both at the item level and across the design factors. The analyses revealed that the mean absolute bias and RMSE values for both parameters were low and close to zero, indicating minimal estimation error and supporting the validity of the results. In addition, the sample size of the reference group and the sample size ratios of the focal groups had a statistically significant effect on the mean absolute bias and RMSE values for both the difficulty and discrimination parameters, and these values decreased as sample size increased. By contrast, the amount of DIF added to the focal groups had no significant effect on the accuracy of the parameter estimates. The findings show that sample size plays a critical role in the accuracy of parameter estimation whereas the amount of DIF does not, and they are consistent with related research in the literature. Based on these results, it is recommended that validity evidence for the simulation be reported not only in DIF studies but also in simulation studies conducted across other subject areas of psychometrics.
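The generation and evaluation steps described above can be made concrete with a short sketch. The block below is a minimal illustration in Python, not the author's code (the study itself appears to have been conducted in R, per the reference list): the test length of 20 items, the parameter distributions, and the difficulty shift of 0.6 used to induce uniform DIF are all illustrative assumptions, and the re-estimation step that an IRT package would perform is only indicated in a comment.

    import numpy as np

    rng = np.random.default_rng(1)

    n_items, n_ref = 20, 1000            # test length and reference-group size (assumed)
    n_dif = int(0.2 * n_items)           # 20% of items carry DIF, as in the study
    dif_items = rng.choice(n_items, size=n_dif, replace=False)

    # True generating parameters under the 2PLM (these distributions are assumptions):
    a_true = rng.lognormal(mean=0.0, sigma=0.3, size=n_items)   # discrimination
    b_true = rng.normal(loc=0.0, scale=1.0, size=n_items)       # difficulty

    def simulate_2pl(theta, a, b, rng):
        """Dichotomous responses: P(X = 1 | theta) = 1 / (1 + exp(-a * (theta - b)))."""
        p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
        return (rng.uniform(size=p.shape) < p).astype(int)

    # The reference group answers under the true parameters; a focal group gets
    # uniform DIF by shifting difficulty on the flagged items (0.6 is illustrative).
    theta_ref = rng.normal(size=n_ref)
    theta_foc = rng.normal(size=n_ref)
    b_focal = b_true.copy()
    b_focal[dif_items] += 0.6
    x_ref = simulate_2pl(theta_ref, a_true, b_true, rng)
    x_foc = simulate_2pl(theta_foc, a_true, b_focal, rng)

    # Validity check in the spirit of the study: re-estimate a and b from the
    # simulated responses (e.g., with an IRT package), then compare to truth.
    def mean_abs_bias(est, true):
        return np.mean(np.abs(est - true))   # one common definition of mean absolute bias

    def rmse(est, true):
        return np.sqrt(np.mean((est - true) ** 2))

Under this recipe, mean absolute bias and RMSE values near zero for the recovered parameters are exactly the kind of evidence the study treats as support for the validity of the simulation.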

Ethical Statement

Ankara University Social Sciences Sub-Ethics Committee, 05-169, 22.04.2019

References

  • Alfons, A., Templ, M., & Filzmoser, P. (2010). An object-oriented framework for statistical simulation: The R package simFrame. Journal of Statistical Software, 37(3), 1-35. https://doi.org/10.18637/jss.v037.i03
  • Atar, B. (2007). Differential item functioning analyses for mixed response data using IRT likelihood-ratio test, logistic regression, and GLLAMM procedures [Doctoral dissertation]. Florida State University, Florida.
  • Berends, P., & Romme, G. (1999). Simulation as a research tool in management studies. European Management Journal, 17(6), 576-583. https://doi.org/10.1016/S0263-2373(99)00048-1
  • Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2002). Item parameter estimation under conditions of test speededness: Application of a mixture Rasch model with ordinal constraints. Journal of Educational Measurement, 39(4), 331-348. https://doi.org/10.1111/j.1745-3984.2002.tb01146.x
  • Bulut, O., & Sünbül, Ö. (2017). Monte Carlo simulation studies in item response theory with the R programming language. Journal of Measurement and Evaluation in Education and Psychology, 8(3), 266-287. https://doi.org/10.21031/epod.305821
  • Choi, Y. J., & Asilkalkan, A. (2019). R packages for item response theory analysis: Descriptions and features. Measurement: Interdisciplinary Research and Perspectives, 17(3), 168-175. https://doi.org/10.1080/15366367.2019.1586404
  • Chung, C. A. (2004). Simulation modeling handbook: A practical approach. CRC Press.
  • Davis, J. P., Eisenhardt, K. M., & Bingham, C. B. (2007). Developing theory through simulation methods. Academy of Management Review, 32(2), 480-499. https://doi.org/10.5465/amr.2007.24351453
  • DeMars, C. E. (2003). Sample size and recovery of nominal response model item parameters. Applied Psychological Measurement, 27(4), 275-288. https://doi.org/10.1177/0146621603027004003
  • DeMars, C. E., & Lau, A. (2011). Differential item functioning detection with latent classes: How accurately can we detect who is responding differentially? Educational and Psychological Measurement, 71(4), 597-616. https://doi.org/10.1177/0013164411404221
  • Feinberg, R. A., & Rubright, J. D. (2016). Conducting simulation studies in psychometrics. Educational Measurement: Issues and Practice, 35(2), 36-49. https://doi.org/10.1111/emip.12111
  • Finch, W. H. (2016). Detection of differential item functioning for more than two groups: A Monte Carlo comparison of methods. Applied Measurement in Education, 29(1), 30-45. https://doi.org/10.1080/08957347.2015.1102916
  • Gao, X. (2019). A comparison of six DIF detection methods [Master's thesis]. University of Connecticut Graduate School.
  • Gray, C. D., & Kinnear, P. R. (2012). IBM SPSS statistics 19 made simple. Psychology Press, Taylor & Francis Group.
  • Hallgren, K. A. (2013). Conducting simulation studies in the R programming environment. Tutorials in Quantitative Methods for Psychology, 9(2), 43-60. https://doi.org/10.20982/tqmp.09.2.p043
  • Happach, R. M., & Tilebein, M. (2015). Simulation as research method: Modeling social interactions in management science. In C. Misselhorn (Ed.), Collective agency and cooperation in natural and artificial systems (pp. 239-259). Springer.
  • Harwell, M. R., Kohli, N., & Peralta, Y. (2017). Experimental design and data analysis in computer simulation studies in the behavioral sciences. Journal of Modern Applied Statistical Methods, 16(2), 3-28. https://doi.org/10.22237/jmasm/1509494520
  • Harwell, M. R., Kohli, N., & Peralta-Torres, Y. (2018). A survey of reporting practices of computer simulation studies in statistical research. The American Statistician, 72(4), 321-327. https://doi.org/10.1080/00031305.2017.1342692
  • Harwell, M. R., Rubinstein, E. N., Hayes, W. S., & Olds, C. C. (1992). Summarizing Monte Carlo results in methodological research: The one- and two-factor fixed effects ANOVA cases. Journal of Educational Statistics, 17(4), 315-339. https://doi.org/10.2307/1165127
  • Harwell, M. R., Stone, C. A., Hsu, T., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101-125. https://doi.org/10.1177/014662169602000201
  • Hulin, C. L., Lissak, R. I., & Drasgow, F. (1982). Recovery of two- and three-parameter logistic item characteristic curves: A Monte Carlo study. Applied Psychological Measurement, 6(3), 249-260. https://doi.org/10.1177/014662168200600301
  • Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher’s handbook (4th ed.). Pearson.
  • Kim, J. (2010). Controlling Type I error rate in evaluating differential item functioning for four DIF methods: Use of three procedures for adjustment of multiple item testing [Doctoral dissertation]. Georgia State University, Atlanta.
  • Law, A. M. (2003). How to conduct a successful simulation study. Proceedings of the 2003 Winter Simulation Conference, New Orleans, LA, USA.
  • Li, Y., Brooks, G. P., & Johanson, G. A. (2012). Item discrimination and Type I error in the detection of differential item functioning. Educational and Psychological Measurement, 72(5), 847-861. https://doi.org/10.1177/0013164411432333
  • Li, Z., & Zumbo, B. D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30(2), 343-370. https://psycnet.apa.org/record/2009-18227-011
  • Liu, H., Zhang, Y., & Luo, F. (2015). Mediation analysis for ordinal outcome variables. In Millsap, Bolt, van der Ark, & Wang (Eds.), Quantitative psychology research (pp. 429-450). Springer International Publishing.
  • Lopez Rivas, G. E. (2012). Detection and classification of DIF types using parametric and nonparametric methods: A comparison of the IRT-likelihood ratio test, crossing-SIBTEST, and logistic regression procedures [Doctoral dissertation]. University of South Florida, Florida.
  • Magis, D., Raîche, G., Béland, S., & Gérard, P. (2011). A generalized logistic regression procedure to detect differential item functioning among multiple groups. International Journal of Testing, 11(4), 365-386. https://doi.org/10.1080/15305058.2011.602810
  • Mooney, C. Z. (1997). Monte Carlo simulation. Sage.
  • Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102. https://doi.org/10.1002/sim.8086
  • Paxton, P., Curran, P. J., Bollen, K. A., Kirby, J., & Chen, F. (2001). Monte carlo experiments: Design and implementation. Structural Equation Modeling, 8(2), 287-312. https://doi.org/10.1207/S15328007SEM0802_7
  • R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
  • Rockoff, D. (2018). A randomization test for the detection of differential item functioning [Doctoral dissertation]. The University of Arizona, Arizona.
  • Rollins III, J. D. (2018). A comparison of observed score approaches to detecting differential item functioning among multiple groups [Doctoral dissertation]. The University of North Carolina at Greensboro, Greensboro.
  • Rubinstein, R. Y., & Kroese, D. P. (2017). Simulation and the Monte Carlo method. John Wiley & Sons.
  • Rusch, T., Mair, P., & Hatzinger, R. (2013). Psychometrics with R: A review of CRAN packages for item response theory (Discussion Paper). Center for Empirical Research Methods.
  • Sandilands, D. A. (2014). Accuracy of differential item functioning detection methods in structurally missing data due to booklet design [Doctoral dissertation]. The University of British Columbia, Vancouver.
  • Scott, L. (2014). Controlling analytic selection of a valid subtest for DIF analysis when DIF has multiple potential causes among multiple groups [Doctoral dissertation]. Arizona State University, Arizona.
  • Seybert, J., & Stark, S. (2012). Iterative linking with the differential functioning of items and tests (DFIT) method: Comparison of testwide and item parameter replication (IPR) critical values. Applied Psychological Measurement, 36(6), 494-515. https://doi.org/10.1177/0146621612445182
  • Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. https://doi.org/10.1080/10691898.2016.1246953
  • Socha, A., DeMars, C. E., Zilberberg, A., & Phan, H. (2015). Differential item functioning detection with the Mantel-Haenszel procedure: The effects of matching types and other factors. International Journal of Testing, 15(3), 193-215. https://doi.org/10.1080/15305058.2014.984066
  • Spence, I. (1983). Monte Carlo simulation studies. Applied Psychological Measurement, 7(4), 405-425. https://doi.org/10.1177/014662168300700403
  • Svetina, D., & Rutkowski, L. (2014). Detecting differential item functioning using generalized logistic regression in the context of large-scale assessments. Large-scale Assessments in Education, 2(4), 1-17. https://doi.org/10.1186/s40536-014-0004-5
  • Tay, L., Meade, A. W., & Cao, M. (2015). An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods, 18(1), 3-46. https://doi.org/10.1177/1094428114553062
  • Tureson, K., & Odland, A. (2018). Monte Carlo simulation studies. In B. B. Frey (Ed.), The SAGE encyclopedia of educational research, measurement, and evaluation (pp. 1085-1089). SAGE Publications.
  • Wang, W. C., & Chen, C. T. (2005). Item parameter recovery, standard error estimates, and fit statistics of the WINSTEPS program for the family of Rasch models. Educational and Psychological Measurement, 65(3), 376-404. https://doi.org/10.1177/0013164404268673
  • Wen, Y. (2014). DIF analyses in multilevel data: Identification and effects on ability estimates [Doctoral dissertation]. The University of Wisconsin, Milwaukee.
  • Wood, W. S. (2011). Differential item functioning procedures for polytomous items when examinee sample sizes are small [Doctoral dissertation]. Graduate College of The University of Iowa, Iowa.
  • Woods, C. M. (2009). Evaluation of MIMIC-model methods for DIF testing with comparison to two-group analysis. Multivariate Behavioral Research, 44(1), 1-27. https://doi.org/10.1080/00273170802620121
  • Yuan, K.-H., Tong, X., & Zhang, Z. (2015). Bias and efficiency for SEM with missing data and auxiliary variables: Two-stage robust method versus two-stage ML. Structural Equation Modeling: A Multidisciplinary Journal, 22(2), 178-192. https://doi.org/10.1080/10705511.2014.935750


Details

Primary Language English
Subjects Testing, Assessment and Psychometrics (Other)
Journal Section Research Articles
Authors

Özkan Saatçioğlu 0000-0001-8131-9619

Early Pub Date March 17, 2025
Publication Date March 17, 2025
Submission Date February 14, 2025
Acceptance Date March 14, 2025
Published in Issue Year 2025 Volume: 3 Issue: 1

Cite

APA Saatçioğlu, Ö. (2025). Validity of Simulation Studies: A Case Research in the Context of Differential Item Functioning Detection. Journal of Psychometric Research, 3(1), 24-40. https://doi.org/10.5281/zenodo.15036409

Journal of Psychometric Research is licensed under a Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0). 
