On the Statistical and Heuristic Difficulty Estimates of a High-Stakes Test in Iran
Abstract
The findings of previous research on the compatibility of stakeholders' perceptions with statistical estimates of item difficulty appear inconsistent. Moreover, most research shows that teachers' estimates of item difficulty are unreliable, since teachers tend to overestimate the difficulty of easy items and underestimate the difficulty of difficult items. The present study therefore analyzes a high-stakes test in terms of heuristic difficulty (the test takers' standpoint) and statistical difficulty (CTT and IRT) and investigates the extent to which the findings from the two perspectives converge. Results indicate that (1) the whole test, along with its sub-tests, is difficult, which might threaten the test's validity; (2) the respondents' ratings of the total test's difficulty largely converge with the difficulty values indicated by IRT and CTT, except for two subtests whose difficulty the students underestimated; and (3) CTT difficulty estimates converge with IRT difficulty estimates. It can therefore be concluded that students' perceptions of item difficulty might be a better estimate of test difficulty, and that combining test takers' perceptions with statistical difficulty indices might provide a fuller picture of item difficulty in assessment contexts.
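As a minimal illustration of the two statistical perspectives the abstract contrasts, the sketch below computes CTT facility (p) values and a crude Rasch-style (IRT) difficulty from a simulated response matrix. The data and variable names are hypothetical, and the logit transform is only a first-order approximation of an IRT calibration, not the study's actual estimation procedure:

```python
import numpy as np

# Hypothetical response matrix: rows = test takers, columns = items
# (1 = correct, 0 = incorrect). Real data would come from the exam itself.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(200, 5))

# CTT difficulty: the facility (p) value, i.e. the proportion of test
# takers answering each item correctly. Lower p means a harder item.
p_values = responses.mean(axis=0)

# Crude Rasch-style (IRT) difficulty: the negative logit of the p-value.
# Higher b means a harder item. A full IRT analysis would instead fit
# item parameters by maximum likelihood.
b_values = -np.log(p_values / (1 - p_values))

# Agreement between the two estimates: b is a monotone decreasing
# transform of p, so the correlation is strongly negative.
r = np.corrcoef(p_values, b_values)[0, 1]
```

Because the Rasch-style b here is derived directly from p, the two indices necessarily agree on item ordering; with independently estimated IRT parameters, as in the study, convergence between CTT and IRT estimates is an empirical finding rather than a tautology.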
Details
Primary Language
English
Subjects
Studies on Education
Journal Section
Research Article
Authors
Alireza Ahmadi
Iran
Publication Date
October 15, 2019
Submission Date
March 29, 2019
Acceptance Date
July 9, 2019
Published in Issue
2019, Volume 6, Issue 3
Cited By
- Çoktan Seçmeli Maddelerde Uzmanlarca Öngörülen ve Ampirik Olarak Hesaplanan Güçlük İndekslerinin Karşılaştırılması [Comparison of Expert-Predicted and Empirically Calculated Difficulty Indices for Multiple-Choice Items]. Journal of Computer and Education Research. https://doi.org/10.18009/jcer.1000934
- Dimensionality, discrimination power and difficulty of English test items: the case of graduate exam for healthcare applicants. Journal of Medical Education Development. https://doi.org/10.61186/edcj.17.55.108