An Investigation of Ordering Test Items Differently Depending on Their Difficulty Level by Differential Item Functioning

Ebru Balta; Secil Omur Sunbul

Research Article

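Turkish title: Maddeleri Güçlüklerine Göre Farklı Sıralamanın Birey Tepkilerine Etkisinin Değişen Madde Fonksiyonuyla İncelenmesi (An Examination, Using Differential Item Functioning, of How Ordering Items Differently by Difficulty Affects Examinee Responses)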

Year 2017, Volume: 17 Issue: 72, 23 - 42, 20.11.2017


Abstract

Problem Statement: A position effect is defined as an unexpected influence of an item's position within the test on examinees' responses to that item. Position effects influence examinees' test performance in several ways. Depending on whether easy or difficult items appear at the beginning or the end of the test, students' motivation rises or falls over the course of the test, and their test scores are affected accordingly. In addition, when items are ordered from easy to difficult, so that the difficult items fall toward the end of the test, a practice or learning effect is observed; when the easy items fall toward the end of the test, a fatigue effect is observed. As a result, the estimated difficulty of the items can take different values. The literature indicates that taking item position effects into account is important when evaluating test validity.

Purpose and Significance of the Study: This study aims to determine whether administering test forms in which items are placed in different orders according to their difficulty level (easy to difficult and difficult to easy) produces Differential Item Functioning (DIF) in the test items, and to determine the level of agreement between the DIF detection methods used. Very few studies abroad, and no domestic study that addresses the question directly, were found on whether ordering the items of a multiple-choice test by difficulty creates DIF. In this respect, the study is expected to contribute to the literature and to inform similar studies as well as large-scale examinations.

Ebru BALTA – Secil OMUR SUNBUL / Eurasian Journal of Educational Research 72 (2017) 23-42

Research Method: The researcher used three Mathematics Achievement Tests in total, two of which were parallel forms. One test was used to determine students' ability levels in terms of their knowledge and skills in the Basic Mathematics course, so that the focal and reference groups could be formed. The other two, parallel tests were used to determine whether presenting the items ordered from easy to difficult and from difficult to easy creates DIF. To ensure that students answered the items in the order in which they appeared, the tests were administered on computer using Moodle, an open-source distance learning system. The study group consisted of 300 students (150 in the focal group and 150 in the reference group), selected by purposive sampling and finalized after the matching and data screening steps required by the repeated-measures nature of the study. All participating students took all three test forms. A counterbalanced design was used to eliminate the order effect across the tests. Whether the items contained DIF was determined with the Mantel-Haenszel (MH) and Logistic Regression (LR) methods under two conditions. In the first condition, the focal group took the KZ form (items ordered from easy to difficult) and the reference group took the ZK form (items ordered from difficult to easy). In the second condition, the focal group took the ZK form and the reference group took the KZ form. The data obtained from these tests were analyzed with R 3.2.0 and the "difR" package.
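As an illustration of the MH procedure named above, the following Python sketch computes the Mantel-Haenszel common odds ratio and its delta-scale transform for a single item, stratifying examinees by a matching score. It is not the authors' analysis, which was run with the difR package in R. The function names, the toy counts, and the magnitude-only A/B/C cutoffs (absolute delta below 1.0, below 1.5, or larger) are assumptions made for the example; the summary does not state which classification rule was applied.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses):
    """Return (alpha_MH, delta_MH) for one item.

    responses: iterable of (matching_score, group, correct) with
    group in {"ref", "focal"} and correct in {0, 1}.
    """
    # One 2x2 table per matching-score stratum:
    # [ref right, ref wrong, focal right, focal wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for score, group, correct in responses:
        offset = 0 if group == "ref" else 2
        strata[score][offset + (0 if correct else 1)] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n  # ref right * focal wrong
        den += b * c / n  # ref wrong * focal right
    if num == 0.0 or den == 0.0:
        raise ValueError("odds ratio undefined for these counts")
    alpha = num / den
    delta = -2.35 * math.log(alpha)  # delta scale
    return alpha, delta


def ets_category(delta):
    """Magnitude-only A/B/C label (assumed cutoffs, no significance test)."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


def expand(score, group, right, wrong):
    """Hypothetical helper: turn counts into individual response records."""
    return [(score, group, 1)] * right + [(score, group, 0)] * wrong


# Toy data (not from the study): two score strata, ten examinees per group.
data = (expand(1, "ref", 6, 4) + expand(1, "focal", 3, 7)
        + expand(2, "ref", 8, 2) + expand(2, "focal", 6, 4))
alpha, delta = mantel_haenszel_dif(data)
print(round(alpha, 3), round(delta, 3), ets_category(delta))
```

On this convention a negative delta means the item favors the reference group. The full ETS rule also takes statistical significance into account, which this sketch leaves out.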

Findings: In the first condition, the MH analysis showed that, among the items with DIF, four showed moderate (B) DIF and one showed large (C) DIF. Of the B-level items, one medium-difficulty item and one difficult item favored the students who took the KZ (easy-to-difficult) form (focal group), and the other two, both difficult items, favored the students who took the ZK (difficult-to-easy) form (reference group). The C-level item was of medium difficulty and favored the students who took the ZK form. In the LR analyses conducted to detect both uniform and non-uniform DIF, no item showed moderate (B) or large (C) DIF. In the second condition, the MH analysis showed that one item had moderate (B) DIF and five items had large (C) DIF. The B-level item was a difficult item and favored the students who took the KZ form. Of the five C-level items, two (one easy and one of medium difficulty) favored the students who took the ZK form, and three (two of medium difficulty and one difficult) favored the students who took the KZ form. The LR analysis for uniform DIF identified two items with moderate (B) DIF. Both were of medium difficulty; one favored the students who took the ZK form and the other favored the students who took the KZ form. To assess how similarly the two methods rank the items by DIF magnitude, the Spearman rank-order correlation coefficient was computed between the chi-square values obtained from the LR and MH analyses for uniform DIF. In both conditions, the relationship between the DIF magnitude rankings of the two methods was statistically significant (r1 = .90, r2 = .92; p < .01).
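The agreement check described above, a Spearman rank-order correlation between the chi-square values of the two methods, can be sketched as follows. This is a minimal illustration only: the chi-square values are hypothetical placeholders rather than the study's results, and the helper names are assumptions. The coefficient is computed as the Pearson correlation of average ranks, which also handles tied values.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-item chi-square values (not taken from the article).
mh_chi2 = [0.4, 5.1, 12.3, 1.8, 7.7, 3.0]
lr_chi2 = [0.9, 4.2, 10.5, 0.7, 8.8, 6.0]
rho = spearman_rho(mh_chi2, lr_chi2)
print(round(rho, 3))
```

A high coefficient indicates only that the two methods order the items similarly by DIF magnitude; as the findings show, the methods can still flag different items.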

Conclusions and Recommendations: For the items with low difficulty (easy items), the findings suggest that the probability of a correct answer increased in the group that received the easy items later. The medium-difficulty items worked in favor of both the focal and the reference group in both administrations. This suggests that whether a medium-difficulty item follows an easy or a difficult item affects the probability of answering it correctly. For the items with high difficulty (difficult items), the findings of both analyses suggest that the probability of a correct answer is affected both when the difficult items appear at the beginning of the test and when they appear at the end. The study therefore concludes that ordering items differently by difficulty level affects the probability that individuals in different groups answer the items correctly. Moreover, placing the difficult items at the end of the test form increases the difference in the probability of answering the items. In both analyses, the LR and MH methods produced similar results for the rank order of DIF magnitudes but different results for which items were flagged as showing DIF. In terms of the number of items flagged, the MH method appears to be more sensitive than the LR method. In this study, DIF was detected with the MH and LR methods, which are based on Classical Test Theory (CTT). Future studies could detect DIF with other CTT-based methods and with methods based on Item Response Theory (IRT), and the results obtained from different methods could be compared.

Keywords

Item Orderings, Mantel-Haenszel, Logistic Regression, Moodle

Abstract

Purpose: Position effects may influence examinees' test performance in several ways and trigger other psychometric issues, such as Differential Item Functioning (DIF). This study administered test forms in which the items were ordered differently by difficulty level (from easy to difficult or from difficult to easy) to determine whether the items show DIF and whether the methods for detecting DIF agree with each other. Research Methods: The Mantel-Haenszel (MH) and Logistic Regression (LR) methods were used to identify whether the test items show DIF. The data consist of the responses of 300 students, divided into a focal group and a reference group, who took three mathematics achievement tests. The data were analyzed with R 3.2.0. Findings: Ordering the items differently by difficulty level affects the probability that individuals in different groups answer the items correctly. In addition, the LR and MH methods flag different items as showing DIF, although they rank the items similarly by DIF magnitude. Implications for Research and Practice: Further test development studies could examine whether DIF emerges when test forms order the items differently with regard to subject matter and cognitive difficulty level.
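For readers unfamiliar with the LR approach mentioned in the abstract, the sketch below tests one item for uniform DIF by comparing two nested logistic models: one with the matching score only, and one that adds a group indicator. It is a dependency-free illustration under stated assumptions, not the authors' procedure (the study used the difR package in R). The function names, the Newton-Raphson fitting routine, and the toy counts are all hypothetical, and the effect-size classification into B and C levels is not reproduced.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def prob(beta, row):
    """Model probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))


def fit_logistic(X, y, iterations=25):
    """Newton-Raphson fit; returns (coefficients, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, target in zip(X, y):
            mu = prob(beta, row)
            for i in range(p):
                grad[i] += (target - mu) * row[i]
                for j in range(p):
                    hess[i][j] += mu * (1.0 - mu) * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = sum(math.log(prob(beta, row) if target else 1.0 - prob(beta, row))
                 for row, target in zip(X, y))
    return beta, loglik


def uniform_dif_lr(records):
    """records: (matching_score, group, correct); group 0 = reference, 1 = focal.

    Returns (group coefficient, likelihood-ratio statistic with 1 df).
    """
    y = [correct for _, _, correct in records]
    base = [[1.0, score] for score, _, _ in records]
    full = [[1.0, score, group] for score, group, _ in records]
    _, ll_base = fit_logistic(base, y)
    beta, ll_full = fit_logistic(full, y)
    return beta[2], 2.0 * (ll_full - ll_base)


def expand(score, group, right, wrong):
    """Hypothetical helper: turn counts into individual response records."""
    return [(score, group, 1)] * right + [(score, group, 0)] * wrong


# Toy data (not from the study): the focal group does worse at every score.
records = (expand(1, 0, 4, 6) + expand(1, 1, 2, 8)
           + expand(2, 0, 6, 4) + expand(2, 1, 4, 6)
           + expand(3, 0, 8, 2) + expand(3, 1, 6, 4))
group_coef, lr_stat = uniform_dif_lr(records)
print(round(group_coef, 3), round(lr_stat, 3))
```

The statistic is referred to a chi-square distribution with one degree of freedom, whose critical value at the .05 level is about 3.84. A test for non-uniform DIF would add a score-by-group interaction term to the full model in the same way.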

Keywords

Item Orderings, Mantel-Haenszel, Logistic Regression, Moodle


Details

Primary Language	English
Journal Section	Articles
Authors	Ebru Balta, Secil Omur Sunbul
Publication Date	November 20, 2017
Published in Issue	Year 2017 Volume: 17 Issue: 72

Cite

APA	Balta, E., & Sunbul, S. O. (2017). An Investigation of Ordering Test Items Differently Depending on Their Difficulty Level by Differential Item Functioning. Eurasian Journal of Educational Research, 17(72), 23-42.
AMA	Balta E, Sunbul SO. An Investigation of Ordering Test Items Differently Depending on Their Difficulty Level by Differential Item Functioning. Eurasian Journal of Educational Research. November 2017;17(72):23-42.
Chicago	Balta, Ebru, and Secil Omur Sunbul. “An Investigation of Ordering Test Items Differently Depending on Their Difficulty Level by Differential Item Functioning”. Eurasian Journal of Educational Research 17, no. 72 (November 2017): 23-42.
EndNote	Balta E, Sunbul SO (November 1, 2017) An Investigation of Ordering Test Items Differently Depending on Their Difficulty Level by Differential Item Functioning. Eurasian Journal of Educational Research 17 72 23–42.
IEEE	E. Balta and S. O. Sunbul, “An Investigation of Ordering Test Items Differently Depending on Their Difficulty Level by Differential Item Functioning”, Eurasian Journal of Educational Research, vol. 17, no. 72, pp. 23–42, 2017.
ISNAD	Balta, Ebru - Sunbul, Secil Omur. “An Investigation of Ordering Test Items Differently Depending on Their Difficulty Level by Differential Item Functioning”. Eurasian Journal of Educational Research 17/72 (November 2017), 23-42.
JAMA	Balta E, Sunbul SO. An Investigation of Ordering Test Items Differently Depending on Their Difficulty Level by Differential Item Functioning. Eurasian Journal of Educational Research. 2017;17:23–42.
MLA	Balta, Ebru and Secil Omur Sunbul. “An Investigation of Ordering Test Items Differently Depending on Their Difficulty Level by Differential Item Functioning”. Eurasian Journal of Educational Research, vol. 17, no. 72, 2017, pp. 23-42.
Vancouver	Balta E, Sunbul SO. An Investigation of Ordering Test Items Differently Depending on Their Difficulty Level by Differential Item Functioning. Eurasian Journal of Educational Research. 2017;17(72):23-42.
