Comparison of Different Forms of a Test with or without Items that Exhibit DIF

Onder Kamil Tulek; İbrahim Alper Kose

Research Article

Year 2019, Volume: 19 Issue: 83, 167 - 182, 31.10.2019

Abstract

Purpose: This research compares tests that include items exhibiting
differential item functioning (DIF) with the same tests after they have been
purified of DIF items. The ability estimates obtained from the two forms are
compared to determine whether they are correlated.

Method: The data were generated in R 3.4.1 through a simulation study, under
conditions defined by the manipulated factors. The manipulated factors were
sample size (1000, 2000), test length (40, 60), and percentage of DIF items
(5%, 10%). For each condition, abilities were first estimated from the
generated data with the DIF items included. Afterward, the DIF items were
removed from the tests and the abilities were estimated again. The correlation
between the two sets of ability estimates was calculated with Spearman's rank
correlation coefficient, separately for each of the eight conditions.
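
The fully crossed design described above can be sketched as follows. This is a minimal illustration, not the authors' R code: the variable names are hypothetical, and the number of DIF items per condition is an assumption derived here by applying the stated percentage to the test length (the abstract reports only the percentages).

```python
from itertools import product

# Factor levels as reported in the abstract.
sample_sizes = (1000, 2000)
test_lengths = (40, 60)
dif_percentages = (5, 10)  # percent of items exhibiting DIF


def dif_item_count(test_length, dif_percent):
    """Assumed number of DIF items: the stated percentage of the test length."""
    return test_length * dif_percent // 100


# Crossing 2 x 2 x 2 levels gives the eight simulation conditions.
conditions = [
    {"n": n, "k": k, "dif_percent": d, "dif_items": dif_item_count(k, d)}
    for n, k, d in product(sample_sizes, test_lengths, dif_percentages)
]

for condition in conditions:
    print(condition)
```

Under this assumption the tests would contain 2 or 4 DIF items at 40 items and 3 or 6 DIF items at 60 items.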

Findings: In all conditions, the correlation coefficients (rs) were close to
zero (p<0.01). In other words, across the eight conditions formed by crossing
test length (40 and 60), sample size (1000 and 2000), and percentage of DIF
(5% and 10%), there was no positive or negative correlation between the
ability estimates from the tests that include DIF items and those from the
tests purified of DIF items. Accordingly, when DIF items are excluded from the
tests, the individuals' rankings on the test change as well.
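
For reference, Spearman's coefficient can be written as below when there are no tied ranks. The notation is assumed here for illustration: θ₁ and θ₂ denote the ability estimates with and without the DIF items, R(·) is the rank of an individual's estimate, and n is the sample size.

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)},
\qquad d_i = R(\theta_{1,i}) - R(\theta_{2,i})
```

A value of r_s near 1 would mean the two forms rank individuals almost identically, whereas a value near 0 means the ranking on one form says almost nothing about the ranking on the other.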

Implications for Research and Practice: This study showed that the presence of
DIF items in a test affects the ability estimates of individuals. In light of
this result, teachers, administrators, and policymakers should bear in mind a
test's potential for DIF. This study may also be replicated under other
conditions.

Keywords

purification, ability estimation, DIF


Comparison of the Forms of a Test That Include DIF Items and That Are Purified of DIF Items

Abstract

Problem Statement: Validity, regarded as the most important of the structural
qualities a measurement instrument should have, can be explained in its
classical sense as the instrument's ability to measure the trait it intends to
measure without confounding it with other traits. However, although it is
undesirable for test scores to be affected by variables other than the trait
the test intends to measure, this is unavoidable in practice. The degree to
which the subgroups of test takers are affected by these variables also
matters. When the variables affect the subgroups in different ways, item bias
can result. Differential Item Functioning (DIF), the first condition of bias,
causes an item to give an advantage to one or more of the subgroups responding
to it. The possibility that one or more items of a test exhibit DIF demands
particular attention in large-scale examinations, whose results are used to
make various decisions about individuals. In many areas of education, the
decisions made in examinations administered for ranking or selection can be
vital for individuals, and the quality of these examinations directly affects
how correct, accurate, and appropriate those decisions are. So does purifying
the test of such biased items change the vital decisions made about
individuals? Many studies on bias have identified items exhibiting DIF in
large-scale examinations such as SBS, TEOG, ÖSS, PISA, ALES, and KPSS.
However, studies are limited in number on how removing DIF items from these
large-scale examinations affects the results; in other words, on whether
individuals' achievement rankings in the examination are affected when the
results are recomputed after the DIF items are removed.

Purpose of the Study: It is thought that the presence of items that give an
advantage to a particular group, in examinations where vital decisions are
made about individuals, may cause inequality and unfairness among individuals.
For this reason, it may be necessary to purify the test of these items. With
this in mind, the purpose of the study is to compare the ability estimates
obtained from the forms of a test that include DIF items and that are purified
of DIF items, under conditions of different numbers of items, different sample
sizes, and different DIF percentages.

Method of the Study: Data were generated by the researcher through a
simulation study in R 3.4.1, under different conditions according to the
manipulated variables. The manipulated variables and their levels were sample
size (n=1000 and n=2000), number of items (k=40 and k=60), and DIF percentage
(d=5% and d=10%). By crossing the variables, data containing DIF items were
generated for each of the eight conditions. For a test whose data were
generated to contain DIF items at the various levels, abilities were first
estimated while the test contained the DIF items. The ability estimates
obtained from the form of the test containing DIF items were stored under the
name θ₁. Then the DIF items in this test were removed and abilities were
estimated in the same way. The ability estimates obtained from the form of the
test without DIF items were stored as θ₂. Finally, the relationship between
these two sets of estimates for the same test, θ₁ and θ₂, was examined.
Because the aim was to determine whether individuals' rankings differ
according to the relationship between these ability estimates, Spearman's rank
correlation analysis was applied.
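
The comparison of θ₁ and θ₂ can be illustrated with a small, dependency-free sketch of Spearman's rank correlation. This is not the authors' R analysis: the function names are hypothetical, and the five ability values are invented solely to show the computation.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ability estimates for five examinees (illustration only).
theta_1 = [-1.2, -0.4, 0.1, 0.8, 1.5]   # form that includes the DIF items
theta_2 = [-1.0, 0.3, -0.2, 0.9, 1.4]   # form purified of the DIF items

print(round(spearman(theta_1, theta_2), 3))  # 0.9
```

In this toy example only two examinees swap places, so the coefficient stays high at 0.9. Coefficients close to 0, as reported in the study, would correspond to rankings that are almost entirely reshuffled between the two forms.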

Findings of the Study: The Spearman rank correlation analysis, carried out to
examine the relationship between the ability estimates (θ₁ and θ₂) obtained
from the forms of a test that include DIF items and that are purified of DIF
items, as summarized in the Method section, yielded coefficients close to 0,
so no positive or negative relationship was observed between the ability
estimates. The absence of a relationship between the ability estimates
indicates that individuals' rankings in the test results changed. In other
words, after the test was purified of DIF items, individuals' rankings on the
test differed from their rankings on the earlier form that contained DIF
items. This finding was similar in all sub-problems in which the various
conditions were investigated. That is, in all 8 conditions formed by crossing
the number of items (40 and 60), sample size (1000 and 2000), and DIF
percentage (5% and 10%), purifying the test of DIF items was found to change
individuals' rankings.

Conclusions and Recommendations: This study concluded that there is no
relationship between the ability estimates obtained from the forms of a test
that include DIF items and that are purified of DIF items; in other words,
that individuals' achievement rankings change when DIF items are removed from
the test. The fact that the rankings of test takers differ once a test is
purified of DIF items may call into question the validity of that test, that
is, the degree to which it distinguishes those who possess the trait from
those who do not. Indeed, while the presence of DIF items poses an important
threat to the validity of a test, if individuals' rankings change when these
items are removed, the purification evidently has an important effect. This
may indicate that examinations used at both the national and international
level to make vital decisions about individuals, and whose results are used
for selection and placement, are open to question regarding the degree to
which they measure the differences among individuals.

Keywords

bias, differential item functioning, ability estimation, purification


Details

Primary Language	English
Section	Articles
Authors	Onder Kamil Tulek, İbrahim Alper Kose
Publication Date	October 31, 2019
Published in Issue	Year 2019 Volume: 19 Issue: 83

Cite

APA	Tulek, O. K., & Kose, İ. A. (2019). Comparison of Different Forms of a Test with or without Items that Exhibit DIF. Eurasian Journal of Educational Research, 19(83), 167-182.
AMA	Tulek OK, Kose İA. Comparison of Different Forms of a Test with or without Items that Exhibit DIF. Eurasian Journal of Educational Research. October 2019;19(83):167-182.
Chicago	Tulek, Onder Kamil, and İbrahim Alper Kose. “Comparison of Different Forms of a Test with or without Items that Exhibit DIF”. Eurasian Journal of Educational Research 19, no. 83 (October 2019): 167-82.
EndNote	Tulek OK, Kose İA (01 October 2019) Comparison of Different Forms of a Test with or without Items that Exhibit DIF. Eurasian Journal of Educational Research 19 83 167–182.
IEEE	O. K. Tulek and İ. A. Kose, “Comparison of Different Forms of a Test with or without Items that Exhibit DIF”, Eurasian Journal of Educational Research, vol. 19, no. 83, pp. 167–182, 2019.
ISNAD	Tulek, Onder Kamil - Kose, İbrahim Alper. “Comparison of Different Forms of a Test with or without Items that Exhibit DIF”. Eurasian Journal of Educational Research 19/83 (October 2019), 167-182.
JAMA	Tulek OK, Kose İA. Comparison of Different Forms of a Test with or without Items that Exhibit DIF. Eurasian Journal of Educational Research. 2019;19:167–182.
MLA	Tulek, Onder Kamil and İbrahim Alper Kose. “Comparison of Different Forms of a Test with or without Items that Exhibit DIF”. Eurasian Journal of Educational Research, vol. 19, no. 83, 2019, pp. 167-82.
Vancouver	Tulek OK, Kose İA. Comparison of Different Forms of a Test with or without Items that Exhibit DIF. Eurasian Journal of Educational Research. 2019;19(83):167-82.
