An Evaluation of Pass/Fail Decisions through Norm- and Criterion-Referenced Assessments
Year 2021, Volume: 8, Issue: 1, pp. 9-20, 15.03.2021
Ismail Cuhadar, Selahattin Gelbal
Abstract
Educational institutions use various assessment methods to decide on students’ proficiency levels in a particular construct. This study investigated whether those decisions differ by the type of assessment: norm-referenced or criterion-referenced. An achievement test with 20 multiple-choice items was administered to 107 students in a guidance and psychological counseling department to assess their mastery of a measurement and evaluation course. First, for the norm-referenced decisions, the raw scores were transformed into T-scores, and two pass/fail classifications were made by comparing each student’s T-score with two cutoffs common in education: 50 and 60. Second, for the criterion-referenced decisions, two standard-setting methods (Angoff and Nedelsky) were conducted with the help of experts in measurement and evaluation to obtain two cut scores, and two more pass/fail classifications were made by comparing the raw scores with these cut scores. The proportions of students in the pass/fail categories differed statistically across each pair of the four decisions. Cohen’s kappa showed that the decisions based on the Nedelsky method agreed moderately with the pass/fail decisions from the students’ semester scores in measurement and evaluation, while the other three decisions showed lower agreement. Therefore, criterion-referenced assessment with the Nedelsky method might be considered for pass/fail decisions in education, given the criterion validity evidence from its agreement with the semester scores.
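The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis (the reference list indicates they used Excel and PASW/SPSS), and all function names and input layouts here are hypothetical. It assumes the textbook definitions: a T-score is 50 + 10z, with the population standard deviation used for z as an assumption; an Angoff cut score sums the judges' mean probability estimates over items; a Nedelsky cut score sums, over items, the reciprocal of the number of options a minimally competent examinee could not eliminate, averaged over judges; and Cohen's kappa is (p_o - p_e) / (1 - p_e).

```python
from statistics import mean, pstdev


def t_scores(raw):
    """Convert raw scores to T-scores: T = 50 + 10 * z.

    Assumption: z uses the population standard deviation of the group.
    """
    m, s = mean(raw), pstdev(raw)
    return [50 + 10 * (x - m) / s for x in raw]


def angoff_cut(ratings):
    """Angoff cut score from ratings[judge][item].

    Each rating is a judge's estimated probability that a minimally
    competent examinee answers the item correctly.
    """
    n_items = len(ratings[0])
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    return sum(item_means)


def nedelsky_cut(remaining):
    """Nedelsky cut score from remaining[judge][item].

    Each entry is the number of options the judge believes a minimally
    competent examinee could NOT eliminate (the key included).
    """
    per_judge = [sum(1 / k for k in judge) for judge in remaining]
    return mean(per_judge)


def classify(scores, cut):
    """Pass (1) if the score is at or above the cut, else fail (0)."""
    return [1 if s >= cut else 0 for s in scores]


def cohen_kappa(a, b):
    """Cohen's kappa for two lists of binary (0/1) decisions."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                  # pass rates
    p_e = pa * pb + (1 - pa) * (1 - pb)              # chance agreement
    if p_e == 1:                                     # degenerate case
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

With these helpers, a norm-referenced decision could be written as `classify(t_scores(raw), 50)` and a criterion-referenced one as `classify(raw, angoff_cut(ratings))`; `cohen_kappa` then compares either list with the pass/fail decisions from semester scores. Whether a score equal to the cut counts as a pass is a design choice made here for illustration.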
References
- Aiken, L. R. (2000). Psychological testing and assessment. Boston: Allyn and Bacon.
- Akdeniz University. (2017, September 17). Akdeniz Üniversitesi Ön Lisans ve Lisans Eğitim-Öğretim ve Sınav Yönetmeliği [Akdeniz University Regulations for Associate and Undergraduate Degree Education and Examinations]. Retrieved May 27, 2020, from http://oidb.akdeniz.edu.tr/wp-content/uploads/2017/02/Akdeniz-Üniversitesi-Ön-Lisans-ve-Lisans-Eğitim-Öğretim-ve-Sınav-Yönetmeliği-17.09.2017.pdf
- Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational Measurement. Washington, DC: American Council on Education.
- Ankara University. (2018, September 4). Ankara Üniversitesi Ön Lisans ve Lisans Eğitim-Öğretim Yönetmeliği [Ankara University Regulations for Associate and Undergraduate Degree Education and Examinations]. Retrieved May 27, 2020, from http://oidb.ankara.edu.tr/files/2018/04/ÖN-LİSANS-VE-LİSANS-EĞİTİM-ÖĞRETİM-YÖNETMELİĞİ.pdf
- Arrasmith, D. G. (1986). Investigation of judges’ errors in Angoff and contrasting groups cut-off score methods [Doctoral dissertation, University of Massachusetts]. ProQuest Dissertations and Theses.
- Baykul, Y. (2010). Eğitimde ve psikolojide ölçme: Klasik test teorisi ve uygulaması [Measurement in Education and Psychology: Classical test theory and applications]. Ankara: Pegem Yayıncılık.
- Berk, R. A. (1976). Determination of optimal cutting scores in criterion-referenced measurement. Journal of Experimental Education, 45(2), 4-9.
- Calmorin, L. P., & Calmorin, M. A. (2007). Research methods and thesis writing. Manila: Rex Book Store.
- Cizek, G. J. (1993). Reconsidering standards and criteria. Journal of Educational Measurement, 30(2), 93-106.
- Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage Publications.
- Clark-Carter, D. (2005). Quantitative psychological research: a student handbook. New York, NY: Psychology Press.
- Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46.
- Crocker, L., & Algina, J. (2008). Introduction to classical and modern test theory. Ohio: Cengage Learning.
- Ebel, R. L. (1972). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice Hall.
- Erciyes University. (2015, December 27). Erciyes Üniversitesi Ön Lisans ve Lisans Eğitim-Öğretim Yönetmeliği [Erciyes University Regulations for Associate and Undergraduate Degree Education and Examinations]. Retrieved May 27, 2020, from https://www.erciyes.edu.tr/kategori/ERU-E-BELGE/Yonetmelikler/131/136
- Ertürk, S. (1998). Eğitimde program geliştirme [Program development in education]. Ankara: Meteksan.
- Freeman, L., & Miller, A. (2001). Norm-referenced, criterion-referenced, and dynamic assessment: What exactly is the point? Educational Psychology in Practice, 17(1), 3-16.
- Çetin, S., & Gelbal, S. (2010). Impact of standard setting methodologies over passing scores. Ankara University, Journal of Faculty of Educational Sciences, 43(1), 79–95.
- Impara, J. C., & Plake, B. S. (1997). Standard setting: An alternative approach. Journal of Educational Measurement, 34(4), 353-366.
- Jaeger, R. M. (1989). Certification of student competence. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 485-514). Washington, DC: American Council on Education and National Council on Measurement in Education.
- Jacobson, R. Y. (2008). Examination of the potential of selected norm-referenced tests and selected locally developed criterion-referenced tests to classify students into performance categories [Doctoral dissertation, University of Nebraska]. ProQuest Dissertations and Theses.
- Jekel, J. F. (2007). Epidemiology, biostatistics and preventive medicine. Philadelphia: Saunders/Elsevier.
- Johnson, D. L., & Martin, S. (1980). Criterion-referenced testing: New wine in old bottles. Academic Therapy, 16(2), 167-173.
- Kendall, M. G., & Smith, B. B. (1939). The problem of m rankings. The Annals of Mathematical Statistics, 10(3), 275-287.
- Kline, P. (2000). Handbook of psychological testing. London and New York: Routledge.
- Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.
- Livingston, S. A., & Zieky, M. J. (1989). A comparative study of standard-setting methods. Applied Measurement in Education, 2(2), 121-141.
- McCauley, R. J., & Swisher, L. (1984). Use and misuse of norm-referenced tests in clinical assessment: A hypothetical case. Journal of Speech and Hearing Disorders, 49(4), 338-348.
- McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276-282.
- Microsoft Corporation. (2013). Microsoft Excel. Retrieved from https://office.microsoft.com/excel
- Milli Eğitim Bakanlığı. (2014, July 26). Milli Eğitim Bakanlığı okul öncesi eğitim ve ilköğretim kurumları yönetmeliği [Ministry of National Education regulations for preschool and primary school education institutions]. Retrieved May 27, 2020, from http://mevzuat.meb.gov.tr/dosyalar/1703.pdf
- Milli Eğitim Bakanlığı. (2016, October 28). Milli Eğitim Bakanlığı ortaöğretim kurumları yönetmeliği [Ministry of National Education regulations for secondary school education institutions]. Retrieved May 27, 2020, from https://ogm.meb.gov.tr/meb_iys_dosyalar/2016_11/03111224_ooky.pdf
- Mohr, A. K. (2006). Criterion referenced and norm referenced predictors of student achievement: Teacher perceptions of, and correlations between, Iowa test of basic skills and the palmetto achievement challenge test [Doctoral dissertation, University of South Carolina]. ProQuest Dissertations and Theses.
- Montgomery, P. C., & Connolly, B. H. (1987). Norm-referenced and criterion referenced tests: Use in pediatrics and application to task analysis of motor skill. Physical Therapy, 67(12), 1873-1876.
- Murphy, K. R., & Davidshofer, C. O. (2005). Psychological testing: Principles and applications. New Jersey: Pearson.
- Nartgün, Z. (2007). Aynı puanlar üzerinden yapılan mutlak ve bağıl değerlendirme uygulamalarının notlarda farklılık oluşturup oluşturmadığına ilişkin bir inceleme [An investigation on whether the applications of the norm- and criterion-referenced assessments over the same scores make a difference in grading]. Ege Eğitim Dergisi, 8(1), 19-40.
- Nedelsky, L. (1954). Absolute grading standards for objective tests. Educational and Psychological Measurement, 14(1), 3–19.
- Oescher, J., Kirby, P. C., & Paradise, L. V. (1992). Validating state-mandating criterion-referenced achievement tests with norm-referenced test results for elementary and secondary students. Journal of Experimental Education, 60(2), 141-150.
- Paura, L., & Arhipova, I. (2014). Cause analysis of students’ dropout rate in higher education study program. Procedia-Social and Behavioral Sciences, 109, 1282-1286.
- Pester, A. M. (2003). Language intervention effects of norm-referenced and criterion referenced test scores [Master’s thesis, Miami University]. https://etd.ohiolink.edu/!etd.send_file?accession=miami1050351250&disposition=inline
- Sakarya University. (2019, April 18). Sakarya Üniversitesi Ön Lisans ve Lisans Eğitim-Öğretim ve Sınav Yönetmeliği [Sakarya University Regulations for Associate and Undergraduate Degree Education and Examinations]. Retrieved May 27, 2020, from https://www.sakarya.edu.tr/yeni-lisans-ve-onlisans-egitim-ogretim-ve-sinav-yonetmeligi-d330.html
- SPSS, Inc. (2009). PASW Statistics for Windows (Version 18.0) [Computer Program]. Chicago: SPSS Inc.
- Sönmez, V., & Alacapınar, F. G. (2011). Örneklendirilmiş bilimsel araştırma yöntemleri [Scientific research methods with examples]. Ankara: Anı Yayıncılık.
- Toprakçı, E., Baydemir, G., Koçak, A., & Akkuş, Ö. (2007, September). Eğitim fakültelerinin eğitim-öğretim ve sınav yönetmeliklerinin karşılaştırılması [A comparison of regulations for education and examinations in faculty of education]. Paper presented at the meeting of 16. Ulusal Eğitim Bilimleri Kongresi, Tokat, Turkey.
- Turgut, M. F., & Baykul, Y. (2010). Eğitimde ölçme ve değerlendirme [Measurement and evaluation in education]. Ankara: Pegem Yayıncılık.
- Tyler, R. W. (2013). Basic principles of curriculum and instruction. Chicago: The University of Chicago Press.
- Urbina, S. (2004). Essentials of psychological testing. New York: Wiley.
- Visintainer, C. (2002). The relationship between two state-mandated, standardized tests using norm-referenced Terranova and the criteria-referenced, performance assessment developed for the Maryland school performance assessment program [Doctoral dissertation, Wilmington College]. ProQuest Dissertations and Theses.
- Vossensteyn, J. J., Kottmann, A., Jongbloed, B. W., Kaiser, F., Cremonini, L., Stensaker, B., Hovdhaugen, E., & Wollscheid, S. (2015). Dropout and completion in higher education in Europe: Main report.
- Yıldırım, C. (2011). Bilim felsefesi [Philosophy of science]. İstanbul: Remzi Kitabevi.
- Zieky, M. J., & Livingston, S. A. (1977). Manual for setting standards on the Basic Skills Assessment Tests. Princeton, NJ: Educational Testing Service.