This study compares the performance of Multidimensional Item Response Theory (MIRT), Higher-Order IRT (HO-IRT), and Bifactor models for the simultaneous estimation of total and subscale scores in multidimensional tests. Model performance was evaluated with both simulated data and real data from an English proficiency exam, in terms of accuracy (RMSE), reliability, and classification accuracy. The simulation included 5,000 respondents, 120 items, and a four-dimensional structure, and manipulated item format, test difficulty, and inter-dimensional correlation. Results indicated that MIRT consistently outperformed the other models, yielding the lowest RMSE and the highest reliability and classification accuracy across conditions. HO-IRT also performed strongly, while the Bifactor model underperformed, particularly in subscore estimation. Model performance was sensitive to test characteristics and dimensional relationships. Findings from the real-data analysis supported the simulation results, underscoring the value of multidimensional modeling for diagnostic feedback and informed decision-making.
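As a rough illustration of the simulation design and evaluation metrics described above, the following Python sketch generates responses under a simple between-item multidimensional 2PL with four correlated dimensions and then computes per-dimension RMSE, empirical reliability, and pass/fail classification accuracy against stand-in ability estimates. The correlation value, item-parameter distributions, noise level, and cut score are illustrative assumptions only, not the study's actual conditions or estimation method.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, D, r = 5000, 120, 4, 0.6   # respondents, items, dimensions, correlation (r is illustrative)

# Correlated abilities: compound-symmetric correlation across the four dimensions.
Sigma = np.full((D, D), r)
np.fill_diagonal(Sigma, 1.0)
theta = rng.multivariate_normal(np.zeros(D), Sigma, size=N)   # (N, D)

# Between-item structure: each item loads on exactly one dimension (30 items each).
a = rng.lognormal(0.0, 0.3, J)           # discriminations (assumed distribution)
d = rng.normal(0.0, 1.0, J)              # intercepts / easiness (assumed distribution)
dim = np.repeat(np.arange(D), J // D)    # item-to-dimension assignment
p = 1.0 / (1.0 + np.exp(-(a * theta[:, dim] + d)))
X = (rng.random((N, J)) < p).astype(int)  # simulated dichotomous responses

# Evaluation metrics, given true and estimated abilities.
def rmse(true, est):
    return np.sqrt(np.mean((est - true) ** 2, axis=0))

# Stand-in for model-based estimates; a real study would fit MIRT/HO-IRT/Bifactor to X.
theta_hat = theta + rng.normal(0.0, 0.4, theta.shape)

print("subscale RMSE:", np.round(rmse(theta, theta_hat), 3))

# Empirical reliability: squared correlation of estimates with true abilities.
rel = [np.corrcoef(theta[:, k], theta_hat[:, k])[0, 1] ** 2 for k in range(D)]
print("reliability:", np.round(rel, 3))

# Classification accuracy at an assumed pass/fail cut of 0 on the averaged total score.
acc = np.mean((theta.mean(axis=1) >= 0) == (theta_hat.mean(axis=1) >= 0))
print("classification accuracy:", round(acc, 3))
```

In an actual analysis, `theta_hat` would come from fitting each competing model to the response matrix `X`; the noisy stand-in here only serves to make the metric computations concrete.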
Keywords: subscores, total score, MIRT, Higher-Order IRT, Bifactor model, reliability, classification accuracy
| Field | Value |
|---|---|
| Primary Language | English |
| Subjects | Item Response Theory; Testing, Assessment and Psychometrics (Other) |
| Journal Section | Research Article |
| Submission Date | July 24, 2025 |
| Acceptance Date | November 30, 2025 |
| Publication Date | December 31, 2025 |
| Published in Issue | 2025, Volume 16, Issue 4 |