Is This Reliable Enough? Examining Classification Consistency and Accuracy in a Criterion-Referenced Test
Abstract
One important step in assessing the quality of
a test is to examine the reliability of test score interpretation. Which aspect
of reliability is the most relevant depends on what type of test it is and how
the scores are to be used. For criterion-referenced tests, and in particular
certification tests, where students are classified into performance categories,
the primary focus need not be on the size of the error but rather on the impact of this error
on classification. This impact can be described in terms of classification
consistency and classification accuracy. In this article, selected methods from
classical test theory for estimating classification consistency and
classification accuracy were applied to the theory part of the Swedish driving
licence test, a high-stakes criterion-referenced test which is rarely studied
in terms of reliability of classification. The results for this particular test
indicated a level of classification consistency that falls slightly short of
the recommended level, which is why lengthening the test should be considered.
More evidence should also be gathered as to whether the placement of the
cut-off score is appropriate since this has implications for the validity of
classifications.
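
To make the two indices concrete, the following is a minimal sketch and not the estimation procedure used in the article. It assumes a simple binomial error model in which each test taker has a fixed true proportion-correct score and answers items independently. The test length, cut-off score, true-score values and weights are all hypothetical and chosen only for illustration. Classification consistency is computed as the probability that two independent parallel administrations yield the same pass/fail decision, and classification accuracy as the probability that the observed decision matches the true status.

```python
from math import comb


def pass_probability(p, n_items, cut):
    """P(observed score >= cut) for true proportion-correct p (binomial error model)."""
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(cut, n_items + 1)
    )


def classification_indices(true_props, weights, n_items, cut):
    """Return (consistency, accuracy) averaged over a discrete true-score distribution."""
    total = sum(weights)
    consistency = 0.0
    accuracy = 0.0
    for p, w in zip(true_props, weights):
        q = pass_probability(p, n_items, cut)
        # Same decision on two independent parallel forms: pass/pass or fail/fail.
        consistency += w * (q * q + (1 - q) * (1 - q))
        # True status: would the expected score reach the cut-off?
        truly_passes = p * n_items >= cut
        accuracy += w * (q if truly_passes else 1 - q)
    return consistency / total, accuracy / total


# Hypothetical true-score distribution (not taken from the article).
true_props = [0.65, 0.75, 0.85, 0.95]
weights = [1, 2, 4, 3]

# Hypothetical 40-item test with a cut-off of 32, and a doubled-length version.
short_test = classification_indices(true_props, weights, n_items=40, cut=32)
long_test = classification_indices(true_props, weights, n_items=80, cut=64)

print("40 items: consistency=%.3f accuracy=%.3f" % short_test)
print("80 items: consistency=%.3f accuracy=%.3f" % long_test)
```

Under these assumptions, doubling the test length while keeping the cut-off at the same proportion raises both indices, which illustrates why lengthening a test is a common response to insufficient classification consistency.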
Details
Primary Language
English
Subjects
Studies on Education
Journal Section
Research Article
Authors
Susanne Alger
Publication Date
July 1, 2016
Submission Date
January 15, 2016
Acceptance Date
April 10, 2016
Published in Issue
Year 2016 Volume: 3 Number: 2