Research Article

An Example of Empirical and Model Based Methods for Performance Descriptors: English Proficiency Test

Year 2019, Volume: 10 Issue: 3, 219 - 234, 04.09.2019
https://doi.org/10.21031/epod.477857

Abstract

Great emphasis is given to the development of high-stakes tests around the world and in Turkey; however, far less attention is paid to adequate score reporting. Heavy emphasis on rankings and almost no emphasis on performance level descriptors (the meaning of the scores) have led to a “ranking culture” in Turkey. There is an immense need to raise awareness about score reporting and performance level descriptions in Turkey. This study aims to raise awareness about the use of performance level descriptors in a high-stakes exam in Turkey, an English proficiency exam. The study sample consisted of 630 undergraduate students who took the 2016-2017 English proficiency exam of a public university in southwestern Turkey. To identify potential exemplar items, two types of item mapping methods (an empirical method and a model-based method) were used. Grouping items for the performance level descriptors yielded a hierarchical and interpretable structure. Using these performance level descriptors, it is possible to give each student criterion-referenced feedback about his or her reading abilities.
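Model-based item mapping of the kind referred to in the abstract typically locates each item on the ability (theta) scale at the point where an examinee would answer it correctly with a fixed response probability (e.g., RP67; Huynh, 2006; Zwick et al., 2001), and then assigns items to the performance level whose score band contains that location. The sketch below illustrates only this general idea: it assumes a 2PL/Rasch-type model, and the item difficulties and cut points are hypothetical values, not results from the study.

```python
import numpy as np

def theta_at_rp(difficulty, rp=0.67, discrimination=1.0):
    """Ability (theta) at which an item is answered correctly with probability
    `rp` under a 2PL model: P(theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return difficulty + np.log(rp / (1.0 - rp)) / discrimination

# Hypothetical item difficulties (in logits) from an IRT calibration.
difficulties = np.array([-1.2, -0.4, 0.3, 0.9, 1.8])
# Hypothetical cut points separating four performance levels on the theta scale.
cut_points = [-0.5, 0.5, 1.5]

locations = theta_at_rp(difficulties)        # RP67 location of each item
levels = np.digitize(locations, cut_points)  # index of the performance band
for item, (loc, lvl) in enumerate(zip(locations, levels), start=1):
    print(f"Item {item}: RP67 location = {loc:.2f} -> performance level {lvl + 1}")
```

Items whose RP67 locations fall in the same band would then be reviewed together to write the descriptor for that performance level.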

References

  • Arıkan, S., & Kilmen, S. (2018). Sınıf İçi Ölçme ve Değerlendirmede Puanlara Anlam Kazandırma: %70 Doğru Yanıt Yöntemi [Giving meaning to scores in classroom measurement and evaluation: The 70% correct answer method]. İlköğretim Online, 17(2), 888-908.
  • Beaton, A. E., & Allen, N. L. (1992). Interpreting scales through scale anchoring. Journal of Educational Statistics, 17(2), 191-204.
  • Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 137–162). Newbury Park, CA: Sage.
  • Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9, 233–255. doi:10.1207/S15328007SEM0902_5.
  • Demirtaşlı, N. (2009). Eğitimde niteliği sağlamak: ölçme ve değerlendirme sistemi örneği olarak CİTO Türkiye öğrenci izleme sistemi (ÖİS) [Ensuring quality in education: The Cito Turkey student monitoring system as an example of a measurement and evaluation system]. Cito Eğitim: Kuram ve Uygulama, 3, 25-38.
  • Draney, K., & Wilson, M. (2009). Selecting cut scores with a composite of item types: The ConstructMapping procedure. In E. V. Smith Jr. & G. E. Stone (Eds.), Criterion referenced testing: Practice analysis to score reporting using Rasch measurement models (pp. 276–293). Maple Grove, MN: JAM Press.
  • Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
  • George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and reference, 11.0 update (4th ed.). Boston: Allyn & Bacon.
  • Goodman, D. P., & Hambleton, R. K. (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education, 17(2), 145-220.
  • Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47. doi:10.1111/j.1745-3992.1993.tb00543.x
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
  • Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. doi:10.1080/10705519909540118
  • Huynh, H. (2006). A clarification on the response probability criterion RP67 for standard settings based on bookmark and item mapping. Educational Measurement: Issues and Practice, 25(2), 19-20.
  • Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329-349.
  • Karantonis, A. (2017). Using Exemplar Items to Define Performance Categories: A Comparison of Item Mapping Methods. (Unpublished doctoral dissertation). University of Massachusetts, Amherst.
  • Karantonis, A., & Sireci, S. G. (2006). The bookmark standard‐setting method: A literature review. Educational Measurement: Issues and Practice, 25(1), 4-12.
  • Kennedy, C. A., Wilson, M. R., Draney, K., Tutunciyan, S., & Vorp, R. (2010). ConstructMap 4.6 [Computer software]. Berkeley, CA.
  • Kolstad, A., Cohen, J., Baldi, S., Chan, T., DeFur, E., & Angeles, J. (1998). The response probability convention used in reporting data from IRT assessment scales: Should NCES adopt a standard? Washington, DC: American Institutes for Research.
  • Muthén, B. O., & Muthén, L. K. (2015). Mplus (Version 7.4) [Computer software]. Los Angeles, CA: Muthén & Muthén.
  • Shulman, L. S. (2009). Assessment of teaching or assessment for teaching? Reflections on the invitational conference. In D. H. Gitomer (Ed.), Measurement issues and assessment for teaching quality. Thousand Oaks, CA: Sage Publications.
  • Ullman, J. B. (2001). Structural equation modeling. In B. Tabachnick & L. S. Fidell (Eds.), Using multivariate statistics (4th ed., pp.653-771). Boston: Allyn & Bacon.
  • Van de Vijver, F. J. R. (2017). Capturing bias in structural equation modeling. In E. Davidov, P. Schmidt, & J. Billiet (Eds.), Cross-cultural analysis. Methods and applications (2nd, revised edition). New York, NY: Routledge.
  • Zumbo, B. D. (1999). A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-Type (Ordinal) Item Scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
  • Zwick, R., Senturk, D., Wang, J., & Loomis, S. C. (2001). An investigation of alternative methods for item mapping in the National Assessment of Educational Progress. Educational Measurement: Issues and Practice, 20(2), 15-25.
There are 24 citations in total.

Details

Primary Language English
Journal Section Articles
Authors

Serkan Arıkan

Sevilay Kilmen (ORCID: 0000-0002-5432-7338)

Mehmet Abi (ORCID: 0000-0002-4976-5173)

Eda Üstünel (ORCID: 0000-0003-2137-1671)

Publication Date September 4, 2019
Acceptance Date June 30, 2019
Published in Issue Year 2019 Volume: 10 Issue: 3

Cite

APA Arıkan, S., Kilmen, S., Abi, M., & Üstünel, E. (2019). An Example of Empirical and Model Based Methods for Performance Descriptors: English Proficiency Test. Journal of Measurement and Evaluation in Education and Psychology, 10(3), 219-234. https://doi.org/10.21031/epod.477857