Research Article

Investigating The Effect Of Testlets Consisting Of Open-Ended And Multiple-Choice Items On Reliability Via Generalizability Theory

Year 2024, Volume 15, Issue 1, pp. 65-78, 31.03.2024
https://doi.org/10.21031/epod.1429423

Abstract

This study aimed to reveal the effect on reliability of testlets consisting of open-ended and multiple-choice items with similar content. For this purpose, the Mathematics Achievement Test with four testlets, one of which consisted of open-ended items and the other of multiple-choice items, was administered to 128 eighth-grade students. Reliability was estimated from the obtained data in the Edu-G program on the basis of Generalizability Theory, and a decision study was also performed. For the achievement test with testlets consisting of open-ended items, a fully crossed p×i×r design (p: person, i: item, r: rater) was used when the testlet effect was not considered, and a p×(i:t)×r design (t: testlet), with items nested within testlets, was used when the testlet effect was considered. The reliability coefficient was estimated to be higher when the testlet effect was not considered. Similarly, for the achievement test with testlets consisting of multiple-choice items, the crossed p×i design was used when the testlet effect was not considered, and the nested p×(i:t) design was used when it was considered. Here too, the reliability coefficient was estimated to be higher when the testlet effect was not considered. According to the data obtained within the scope of the study, the reliability coefficient was estimated according to the treatment of the testlet effect in the test with open-ended items.
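
The comparison between the p×i and p×(i:t) designs described above can be illustrated with a minimal sketch of the standard relative generalizability coefficient for each design, with items and testlets treated as random facets. The variance components below are hypothetical values chosen only for illustration; they are not the estimates reported in the study, and the function names are assumptions rather than part of Edu-G. The sketch shows one arithmetic reason the coefficient can drop once testlets are modeled: the person-by-testlet component is divided only by the number of testlets, not by the total number of items. It does not reproduce how the component estimates themselves change between designs.

```python
def g_coef_crossed(var_p, var_pi, n_i):
    """Relative G coefficient for the crossed p x i design.

    Relative error variance: var_pi / n_i, where var_pi is the
    person-by-item component (confounded with residual error).
    """
    rel_error = var_pi / n_i
    return var_p / (var_p + rel_error)


def g_coef_nested(var_p, var_pt, var_pit, n_t, n_i_per_t):
    """Relative G coefficient for the nested p x (i:t) design.

    Relative error variance: var_pt / n_t + var_pit / (n_t * n_i_per_t),
    where var_pt is the person-by-testlet component and var_pit is the
    person-by-item-within-testlet component (confounded with residual).
    """
    rel_error = var_pt / n_t + var_pit / (n_t * n_i_per_t)
    return var_p / (var_p + rel_error)


# Hypothetical variance components (NOT taken from the study).
var_p, var_pt, var_pit = 0.04, 0.01, 0.15
n_t, n_i_per_t = 4, 5  # assumed: 4 testlets with 5 items each

nested = g_coef_nested(var_p, var_pt, var_pit, n_t, n_i_per_t)

# With the person-by-testlet component set to zero, the nested formula
# reduces to the crossed formula over all n_t * n_i_per_t items.
crossed = g_coef_crossed(var_p, var_pit, n_t * n_i_per_t)

print(f"p x (i:t): {nested:.3f}")   # about 0.800
print(f"p x i:     {crossed:.3f}")  # about 0.842

# A small decision (D) study: vary the number of testlets while
# keeping the number of items per testlet fixed.
d_study = {
    k: g_coef_nested(var_p, var_pt, var_pit, k, n_i_per_t)
    for k in (2, 4, 6, 8)
}
for k, coef in d_study.items():
    print(f"n_t = {k}: {coef:.3f}")
```

Under these assumed values, any positive person-by-testlet variance lowers the nested coefficient relative to the crossed one, and adding testlets raises the coefficient in the D study.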

There are 40 citations in total.

Details

Primary Language: English
Subjects: Test Theories, Testing, Assessment and Psychometrics (Other)
Journal Section: Articles
Authors:

Serpil Kocaoğlu (ORCID: 0009-0008-0566-4371)

Melek Gülşah Şahin (ORCID: 0000-0001-5139-9777)

Publication Date: March 31, 2024
Submission Date: January 31, 2024
Acceptance Date: March 25, 2024
Published in Issue: Year 2024, Volume 15, Issue 1

Cite

APA Kocaoğlu, S., & Şahin, M. G. (2024). Investigating The Effect Of Testlets Consisting Of Open-Ended And Multiple-Choice Items On Reliability Via Generalizability Theory. Journal of Measurement and Evaluation in Education and Psychology, 15(1), 65-78. https://doi.org/10.21031/epod.1429423