Research Article

Is This Reliable Enough? Examining Classification Consistency and Accuracy in a Criterion-Referenced Test

Volume: 3 Number: 2 July 1, 2016
  • Susanne Alger

Abstract

One important step in assessing the quality of a test is to examine the reliability of test score interpretation. Which aspect of reliability is most relevant depends on the type of test and on how the scores are to be used. For criterion-referenced tests, and certification tests in particular, where students are classified into performance categories, the primary focus need not be on the size of the error but on the impact of this error on classification. This impact can be described in terms of classification consistency and classification accuracy. In this article, selected methods from classical test theory for estimating classification consistency and classification accuracy were applied to the theory part of the Swedish driving licence test, a high-stakes criterion-referenced test that is rarely studied in terms of reliability of classification. The results for this particular test indicated a level of classification consistency that falls slightly short of the recommended level, which is why lengthening the test should be considered. More evidence should also be gathered as to whether the placement of the cut-off score is appropriate, since this has implications for the validity of classifications.
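
To make the two quantities concrete, the following is a minimal illustrative sketch in Python. It is not the estimation procedure used in the article, which relies on single-administration methods from classical test theory. The sketch assumes a hypothetical situation in which the same examinees have scores on two parallel forms and in which their true scores are known; true scores are never observable in practice, so the accuracy figure here only illustrates the definition. All scores, the cut-off value `CUT`, and the function names are invented for illustration.

```python
def classify(scores, cut):
    """Return True (pass) for each score at or above the cut-off."""
    return [s >= cut for s in scores]


def agreement(a, b):
    """Proportion of examinees who receive the same classification in a and b."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    p0 = agreement(a, b)
    pass_a = sum(a) / len(a)
    pass_b = sum(b) / len(b)
    # Chance agreement from the marginal pass and fail rates.
    pc = pass_a * pass_b + (1 - pass_a) * (1 - pass_b)
    return (p0 - pc) / (1 - pc)


# Hypothetical cut-off and made-up scores for ten examinees.
CUT = 52
form_a = [60, 55, 50, 48, 53, 51, 58, 45, 52, 49]
form_b = [58, 51, 53, 47, 55, 50, 57, 44, 50, 54]
true_scores = [59, 53, 51, 47, 54, 50, 58, 45, 51, 52]  # unobservable in practice

pass_a = classify(form_a, CUT)
pass_b = classify(form_b, CUT)
pass_true = classify(true_scores, CUT)

# Consistency: same decision on two parallel forms.
consistency = agreement(pass_a, pass_b)
kappa = cohen_kappa(pass_a, pass_b)
# Accuracy: observed decision matches the decision based on the true score.
accuracy = agreement(pass_a, pass_true)

print(f"consistency={consistency:.2f} kappa={kappa:.2f} accuracy={accuracy:.2f}")
```

In this toy data most disagreements involve examinees whose scores lie within a point or two of the cut-off, which is why the placement of the cut-off score and the length of the test both bear on the reliability of classification.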

Details

Primary Language

English

Subjects

Studies on Education

Journal Section

Research Article

Authors

Susanne Alger

Publication Date

July 1, 2016

Submission Date

January 15, 2016

Acceptance Date

April 10, 2016

Published in Issue

Year 2016 Volume: 3 Number: 2

APA
Alger, S. (2016). Is This Reliable Enough? Examining Classification Consistency and Accuracy in a Criterion-Referenced Test. International Journal of Assessment Tools in Education, 3(2), 137-150. https://doi.org/10.21449/ijate.245198
