Previous research into the compatibility of stakeholders’ perceptions with
statistical estimations of item difficulty has produced inconsistent findings.
Furthermore, most research shows that teachers’ estimation of item difficulty
is unreliable because they tend to overestimate the difficulty of easy items
and underestimate the difficulty of difficult items. Therefore, the present
study analyzes a high-stakes test in terms of heuristic difficulty (the test
takers’ standpoint) and statistical difficulty (CTT and IRT) and investigates
the extent to which the findings from the two perspectives converge. Results
indicate that (1) the test as a whole, along with its subtests, is difficult,
which might threaten the validity of the test; (2) the respondents’ ratings of
the difficulty level of the total test largely converge with the difficulty
values indicated by IRT and CTT, except for two subtests where students
underestimated the difficulty values; and (3) CTT difficulty estimates
converge with IRT difficulty estimates. Therefore, it can be concluded that
students’ perceptions of item difficulty might be a better estimate of test
difficulty, and that a combination of test takers’ perceptions and statistical
difficulty might provide a better picture of item difficulty in assessment
contexts.
Classical true score theory, Heuristic difficulty, High stakes test, Item response theory, Statistical difficulty
Primary Language | English |
---|---|
Subjects | Studies on Education |
Journal Section | Articles |
Authors | |
Publication Date | October 15, 2019 |
Submission Date | March 29, 2019 |
Published in Issue | Year 2019 Volume: 6 Issue: 3 |