Research Article

The Use of Angoff’s Transformed Item Difficulties Method in Detecting Differential Item Functioning

Year 2025, Volume 25, Issue 1, pp. 398-424

Abstract

In this study, Angoff’s Transformed Item Difficulties method for detecting differential item functioning (DIF) is introduced in detail, together with the criticisms it has received, and its strengths and weaknesses are discussed. The method is one of the pioneering approaches developed for detecting DIF in tests composed of unidimensional, dichotomously (0-1) scored items, yet the literature raises several objections to it: it relies exclusively on item difficulties, and it may mistake real ability differences between groups for item bias. For these reasons, some authors advise against its use. On the other hand, the method has practical advantages: it is easy to apply, lends itself to graphical interpretation, and can be used with relatively small samples. Accordingly, this study discusses the algorithm, general characteristics, strengths, and limitations of Angoff’s method, and explains, step by step, the R code needed to carry out a DIF analysis with the method using the "difR" package. The discussion shows that DIF detection with Angoff’s method carries important limitations that must be taken into account. Nevertheless, its ease of application and visualization capabilities make it useful for explaining the fundamentals of the bias and DIF concepts. The method yields more meaningful results when the groups’ mean test scores on the measured trait are close or equal. A limited use of the method, as a screening tool for identifying potentially biased items in a test, can therefore be considered.
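The core computation behind the method described above can be sketched briefly. The following is an illustrative Python reimplementation, not the article's difR code: the function names `ets_delta` and `delta_plot` are my own, and the flagging threshold of 1.5 is the classical value used in the delta-plot literature. Each item's proportion correct in each group is transformed to the ETS delta scale, the major (principal) axis of the two groups' delta scatter is fitted, and items lying far from that axis are flagged.

```python
from math import sqrt
from statistics import NormalDist

def ets_delta(p):
    """ETS delta scale: mean 13, SD 4; larger delta = harder item.
    p is the proportion of correct answers, strictly between 0 and 1."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)

def delta_plot(p_ref, p_foc, threshold=1.5):
    """Angoff-style delta plot for a reference and a focal group.

    Transforms each item's proportion correct in both groups to the delta
    scale, fits the major (principal) axis of the resulting scatter, and
    flags items whose perpendicular distance from that axis exceeds the
    threshold. Assumes the two groups' item deltas covary (sxy != 0).
    """
    x = [ets_delta(p) for p in p_ref]   # reference-group deltas
    y = [ets_delta(p) for p in p_foc]   # focal-group deltas
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x) / n
    syy = sum((v - my) ** 2 for v in y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
    # Major-axis (principal-axis) slope and intercept.
    b = (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = my - b * mx
    # Signed perpendicular distance of each item from the major axis.
    d = [(b * xi - yi + a) / sqrt(b ** 2 + 1) for xi, yi in zip(x, y)]
    flagged = [i for i, di in enumerate(d) if abs(di) > threshold]
    return b, a, d, flagged
```

Note that if every item is merely uniformly harder for one group, the points still fall on the major axis and nothing is flagged; this is how the method tries to separate group-level impact from item-level DIF. In practice, the analysis is run with the R packages the article discusses (difR, and deltaPlotR by Magis & Facon, 2014) rather than hand-rolled code.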

References

  • Aituariagbon, K. E., & Osarumwense, H. J. (2022). Non-parametric method of detecting differential item functioning in Senior School Certificate Examination (SSCE) 2019 Economics multiple choice items. Kashere Journal of Education, 3(1), 146-158. https://dx.doi.org/10.4314/kje.v3i1.19
  • American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME]. (1999). Standards for educational and psychological testing. American Educational Research Association.
  • American Psychological Association [APA] (1988). Code of fair testing practices in education. Washington, DC: Author.
  • Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Prentice-Hall, Inc.
  • Angoff, W. H. (1972). A technique for the investigation of cultural differences [Paper presentation]. American Psychological Association Annual Meeting, Honolulu.
  • Angoff, W. H. (1975). The investigation of test bias in the absence of an outside criterion [Paper presentation]. NIE Conference on Test Bias, Washington, D.C.
  • Angoff, W. H. (1982). Use of difficulty and discrimination indices for detecting item bias. In R. A. Berk (Ed.), Handbook of methods for detecting test bias (pp. 96-116). Johns Hopkins University Press.
  • Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3-23). Lawrence Erlbaum Associates.
  • Angoff, W. H., & Cook, L. L. (1988). Equating the scores of the Prueba de Aptitud Académica and the Scholastic Aptitude Test (Report No. 88-3). ETS Research Report Series. https://doi.org/10.1002/j.2330-8516.1988.tb00259.x
  • Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95-105. https://doi.org/10.1111/j.1745-3984.1973.tb00787.x
  • Angoff, W. H., & Modu, C. C. (1973). Equating the scales of the Prueba de Aptitud Académica and the Scholastic Aptitude Test (Report No. CEEB-RR-3). College Entrance Examination Board.
  • Bezruczko, N., Schulz, E. M., Reynolds, A. J., Perlman, C. L. & Rice, W. K. (1989). The stability of four methods for estimating item bias (Report No. ED-392-823). Department of Research and Evaluation, Chicago Public Schools.
  • Binet, A., & Simon, T. (1905). New methods for the diagnosis of the intellectual level of subnormals. In H. H. Goddard (Ed.), Development of intelligence in children (the Binet-Simon Scale). Williams & Wilkins.
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Sage Publications.
  • Choi, S. W., Gibbons, L. E., & Crane, P. K. (2011). lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software, 39(8), 1-30. https://doi.org/10.18637/jss.v039.i08
  • Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-44. https://doi.org/10.1111/j.1745-3992.1998.tb00619.x
  • de Ayala, R. J. (2009). The theory and practice of item response theory. The Guilford Press.
  • de Ruiter, L. E., & Bers, M. U. (2022). The Coding Stages Assessment: Development and validation of an instrument for assessing young children’s proficiency in the ScratchJr programming language. Computer Science Education, 32(4), 1-30. https://doi.org/10.1080/08993408.2021.1956216
  • Devine, P. J., & Raju, N. S. (1982). Extent of overlap among four item bias methods. Educational and Psychological Measurement, 42(4), 1049-1066. https://doi.org/10.1177/001316448204200412
  • Dodeen, H., & Johanson, G. A. (2003). An analysis of sex-related differential item functioning in attitude assessment. Assessment & Evaluation in Higher Education, 28(2), 129-134. https://doi.org/10.1080/02602930301667
  • Dorans, N. J. (1989). Two new approaches to assessing differential item functioning: Standardization and the Mantel-Haenszel Method. Applied Measurement in Education, 2(3), 217-233. https://doi.org/10.1207/s15324818ame0203_3
  • Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of female candidates on SAT and TSWE forms administered in December 1977: An application of the standardization approach (Report No. RR-83-9). ETS Research Report Series. https://doi.org/10.1002/j.2330-8516.1983.tb00009.x
  • Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23(4), 355-368. https://doi.org/10.1111/j.1745-3984.1986.tb00255.x
  • Dzul-Garcia, C., & Atar, B. (2020). Investigation of possible item bias on PISA 2015 science items across Chile, Costa Rica and Mexico. Culture and Education, 32(3), 470-505. https://doi.org/10.1080/11356405.2020.1785158
  • Elosua, P., & Wells, C. S. (2013). Detecting DIF in polytomous items using MACS, IRT and ordinal logistic regression. Psicológica, 34(2), 327-342.
  • Facon, B., & Nuchadee, M-L. (2010). An item analysis of Raven’s Colored Progressive Matrices among participants with Down syndrome. Research in Developmental Disabilities, 31(1), 243-249. https://doi.org/10.1016/j.ridd.2009.09.011
  • Farcomeni, A., Pittau, M. G., Viviani, S., & Zelli, R. (2022). A European measurement scale for material deprivation. Research Square, 1-32. https://doi.org/10.21203/rs.3.rs-2250804/v1
  • Finch, W. H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29(4), 278-295. https://doi.org/10.1177/0146621605275728
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (8th ed.). McGraw-Hill.
  • Gamerman, D., Gonçalves, F. B., & Soares, T. M. (2018). Differential item functioning. In W. J. van der Linden (Ed.), Handbook of item response theory (pp. 67-84). CRC Press.
  • Gao, Y., & Zhu, W. (2009). Identifying culturally sensitive physical activities using DIF analysis. Medicine & Science in Sports & Exercise, 41(5), 416-417. http://dx.doi.org/10.1249/01.MSS.0000355818.07045.09
  • Gelin, M. N., Carleton, B. C., Smith, A. A., & Zumbo, B. D. (2004). The dimensionality and gender differential item functioning of the mini asthma quality of life questionnaire (MINIAQLQ). Social Indicators Research, 68(1), 91-105. https://doi.org/10.1023/B:SOCI.0000025580.54702.90
  • Gómez-Benito, J., Sireci, S., Padilla, J.-L., Hidalgo, M. D., & Benítez, I. (2018). Differential item functioning: Beyond validity evidence based on internal structure. Psicothema, 30(1), 104-109. http://doi.org/10.7334/psicothema2017.183
  • Gould, S. J. (1981). The mismeasure of man. W. W. Norton & Company.
  • Hauger, J. B., & Sireci, S. G. (2008). Detecting differential item functioning across examinees tested in their dominant language and examinees tested in a second language. International Journal of Testing, 8(3), 237-250. http://dx.doi.org/10.1080/15305050802262183
  • Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Lawrence Erlbaum Associates.
  • Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning: Theory and practice. Erlbaum Publishers.
  • Hunter, J. E. (1975). A critical analysis of the use of item means and item-test correlations to determine the presence or absence of content bias in achievement test items [Paper presentation]. National Institute of Education Conference on Test Bias, Annapolis, MD.
  • Ironson, G. H., & Subkoviak, M. J. (1979). A comparison of several methods of assessing item bias. Journal of Educational Measurement, 16(4), 209-225. https://doi.org/10.1111/j.1745-3984.1979.tb00103.x
  • Iwata, N., Turner, R. J., & Lloyd, D. A. (2002). Race/ethnicity and depressive symptoms in community-dwelling young adults: A differential item functioning analysis. Psychiatry Research, 110(3), 281-289. https://doi.org/10.1016/S0165-1781(02)00102-6
  • Jensen, A. R. (1973). Educability and group differences. Basic Books.
  • Jensen, A. R. (1976). Test bias and construct validity. The Phi Delta Kappan, 58(4), 340-346.
  • Kelderman, H. (1989). Item bias detection using loglinear IRT. Psychometrika, 54(4), 681-697. https://doi.org/10.1007/BF02296403
  • Korkmaz, M. (2006). Test ve ölçek geliştirmede yeni yaklaşımlar: Madde cevap kuramı kapsamında madde işlevsel farklılık (madde yanlılık) yöntemleri. Türk Psikoloji Yazıları, 9(18), 63-80.
  • Li, Z., & Zumbo, B. D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30(2), 343-370.
  • Lord, F. M. (1977). A study of item bias using item characteristic curve theory. In Y. H. Poortinga (Ed.), Basic problems in cross-cultural psychology (pp. 19-29). Swets and Zeitlinger.
  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates, Inc.
  • Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847-862. https://doi.org/10.3758/BRM.42.3.847
  • Magis, D., & Facon, B. (2012). Angoff's delta method revisited: Improving DIF detection under small samples. British Journal of Mathematical and Statistical Psychology, 65(2), 302-321. https://doi.org/10.1111/j.2044-8317.2011.02025.x
  • Magis, D., & Facon, B. (2014). deltaPlotR: An R package for differential item functioning analysis with Angoff’s Delta Plot. Journal of Statistical Software, 59(1), 1-19. https://doi.org/10.18637/jss.v059.c01
  • Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127-143. https://doi.org/10.1016/0883-0355(89)90002-5
  • Muñiz, J., Hambleton, R. K., & Xing, D. (2001). Small sample studies to detect flaws in item translations. International Journal of Testing, 1(2), 115-135. https://doi.org/10.1207/S15327574IJT0102_2
  • Oosterhof, A. C., Atash, M. N., & Lassiter, K. L. (1984). Facilitating identification of item bias through use of delta plots. Educational and Psychological Measurement, 44(3), 619-627. https://doi.org/10.1177/0013164484443009
  • Osterlind, S. J. (1983). Test item bias. Sage Publications.
  • Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd Ed.). Sage Publications.
  • Ozarkan, H. B., Kucam, E., & Demir, E. (2017). Merkezi Ortak Sınav Matematik alt testinde değişen madde fonksiyonunun görme engeli durumuna göre incelenmesi. Current Research in Education, 3(1), 24-34.
  • Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 125-167). Elsevier.
  • Pine, S. M. (1977). Applications of item response theory to the problem of test bias. In D. J. Weiss (Ed.), Applications of computerized adaptive testing (pp. 37-43). University of Minnesota, Psychometric Methods Program.
  • R Development Core Team. (2023). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing.
  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53(4), 495-502. https://doi.org/10.1007/BF02294403
  • Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197- 207. https://doi.org/10.1177/014662169001400208
  • Raju, N. S., Drasgow, F., & Slinde, J. A. (1993). An empirical comparison of the area methods, Lord’s chi-square test, and the Mantel-Haenszel technique for assessing differential item functioning. Educational and Psychological Measurement, 53(2), 301-314. https://doi.org/10.1177/0013164493053002001
  • Revelle, W. (2023). psych: Procedures for psychological, psychometric, and personality research (R package version 2.3.6) [Computer software]. Northwestern University. https://CRAN.R-project.org/package=psych
  • Robin, F., Sireci, S. G., & Hambleton, R. K. (2003). Evaluating the equivalence of different language versions of a credentialing exam. International Journal of Testing, 3(1), 1-20, https://doi.org/10.1207/S15327574IJT0301_1
  • Rudner, L. M. (1978). Using standard tests with the hearing impaired: The problems of item bias. Volta Review, 80, 31-40.
  • Scarr, S., & Weinberg, R. A. (1976). IQ test performance of Black children adopted by White families. American Psychologist, 31(10), 726-739. https://doi.org/10.1037/0003-066X.31.10.726
  • Scheuneman, J. (1979). A method of assessing bias in test items. Journal of Educational Measurement, 16(3), 143-152. https://doi.org/10.1111/j.1745-3984.1979.tb00095.x
  • Seong, T-J., & Subkoviak, M. J. (1987). A comparative study of recently proposed item bias detection methods [Paper presentation]. Annual Meeting of the National Council on Measurement in Education, Washington, D.C.
  • Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159-194. https://doi.org/10.1007/BF02294572
  • Shepard, L. A., Camilli, G., & Averill, M. (1981). Comparison of procedures for detecting test-item bias with both internal and external ability criteria. Journal of Educational Statistics, 6(4), 317-375. https://doi.org/10.3102/10769986006004317
  • Shepard, L. A., Camilli, G., & Williams, D. A. (1985). Validity of approximation techniques for detecting item bias. Journal of Educational Measurement, 22(2), 77-105. https://doi.org/10.1111/j.1745-3984.1985.tb01050.x
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
  • Tat, O., & Doğan, N. (2018). Uluslararası Bilgisayar ve Bilgi Teknolojileri Okuryazarlığı Testinin madde-birey dağılımı ve değişen madde fonksiyonu yönünden incelenmesi. Gazi Üniversitesi Gazi Eğitim Fakültesi Dergisi, 38(3), 1207-1231. https://doi.org/10.17152/gefad.321630
  • Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-172). Lawrence Erlbaum Associates, Inc.
  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Lawrence Erlbaum Associates, Inc.
  • Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16(7), 433-451. https://doi.org/10.1037/h0073357
  • van der Flier, H., Mellenbergh, G. J., Adér, H. J., & Wijn, M. (1984). An iterative item bias detection method. Journal of Educational Measurement, 21(2), 131-145. https://doi.org/10.1111/j.1745-3984.1984.tb00225.x
  • Van Vo, D., & Csapó, B. (2023). Effects of multimedia on psychometric characteristics of cognitive tests: A comparison between technology-based and paper-based modalities. Studies in Educational Evaluation, 77, 1-12. https://doi.org/10.1016/j.stueduc.2023.101254
  • Wainer, H. (1993). Model-based standardized measurement of an item’s differential impact. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 123-135). Lawrence Erlbaum Associates.
  • Wainer, H., Bradlow, E., & Wang, X. (2010). Detecting DIF: Many paths to salvation. Journal of Educational and Behavioral Statistics, 35(4), 489-493. https://doi.org/10.3102/1076998610376624
  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Directorate of Human Resources Research and Evaluation, Department of National Defense.
  • Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233. http://dx.doi.org/10.1080/15434300701375832
  • Zwick, R., & Ercikan, K. (1989). Analysis of differential item functioning in the NAEP history assessment. Journal of Educational Measurement, 26(1), 55-66. https://doi.org/10.1111/j.1745-3984.1989.tb00318.x

Angoff’un Dönüştürülmüş Madde Güçlükleri Yöntemi’nin Değişen Madde Fonksiyonu Belirlemede Kullanımı

Yıl 2025, Cilt: 25 Sayı: 1, 398 - 424

Öz

Bu çalışmada değişen madde fonksiyonu (DMF) belirleme yöntemlerinden Angoff’un Dönüştürülmüş Madde Güçlükleri (Transformed Item Difficulties) Yöntemi önemli ayrıntıları ve eleştirilen yönleriyle tanıtılmakta, yöntemin güçlü ve zayıf yönleri tartışılmaktadır. Tek boyutlu ve 0-1 şeklinde puanlanan maddelerden oluşan testlerde DMF belirleme çalışmalarında kullanılmak üzere geliştirilmiş öncü yöntemlerden biri olan Angoff’un yöntemine yönelik olarak alan yazında bazı eleştiriler bulunmaktadır. Yöntemin yalnızca madde güçlüklerine odaklı olması, gruplar arasındaki gerçek farkın değişen madde fonksiyonu olarak görülme olasılığı bulunması gibi nedenlerle bu yöntemin kullanılmaması önerilebilmektedir. Diğer taraftan yöntem, uygulama kolaylığı, grafiksel yorumlama imkânı tanıması ve görece küçük örneklemlerde de kullanılabilmesi gibi pratik avantajlara sahiptir. Bu kapsamda bu çalışmada Angoff’un yönteminin algoritması, genel karakteristiği, güçlü yönleri ve sınırlılıkları tartışılmıştır. Ayrıca, R programlama dili üzerinde kullanılabilen “difR” paketi ile DMF analizinde Angoff’un yönteminin nasıl kullanılacağı adım adım satır komutları yardımıyla açıklanmıştır. Yürütülen tartışmalar göstermektedir ki, Angoff’un Yöntemi ile DMF belirleme, dikkat edilmesi gereken bazı önemli sınırlılıklar içermektedir. Bununla birlikte yöntemin uygulama kolaylığı ve görselleştirme imkânı tanıyor oluşu, yanlılık ve DMF kavramlarının temellerinin açıklanması açısından yararlı olabilir. Bu yöntem, grupların ölçülen özellik bakımından test puanı ortalamalarının yakın ya da eşit olması durumunda daha anlamlı sonuçlar verebilmektedir. Yöntemin, bir testteki potansiyel olarak yanlı maddelerin belirlenmesinde bir öngörü sağlaması amacıyla daha sınırlı kullanımı düşünülebilir.

Kaynakça

  • Aituariagbon, K. E., & Osarumwense, H. J. (2022). Non-parametric method of detecting differential item functioning in Senior School Certificate Examination (SSCE) 2019 Economics multiple choice items. Kashere Journal of Education, 3(1), 146-158. https://dx.doi.org/10.4314/kje.v3i1.19
  • American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME]. (1999). Standards for educational and psychological testing. American Educational Research Association.
  • American Psychological Association [APA] (1988). Code of fair testing practices in education. Washington, DC: Author.
  • Anastasi, A., & Urbina, S. (1997). Psychological testing (9th Ed.). Prentice-Hall, Inc.
  • Angoff, W. H. (1972). A technique for the investigation of cultural differences [Paper presentation]. American Psychological Association Annual Meeting, Honolulu.
  • Angoff, W. H. (1975). The investigation of test bias in the absence of an outside criterion [Paper presentation]. NIE Conference on Test Bias, Washington, D.C.
  • Angoff, W. H. (1982). Use of difficulty and discrimination indices for detecting item bias. In R. A. Beck (Ed.), Handbook of methods for detecting item bias (pp. 96-116). Johns Hopkins University Press.
  • Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3-23). Lawrence Erlbaum Associates.
  • Angoff, W. H., & Cook, L. L. (1988). Equating the scores of the prueba de aptitud académica and the scholastic aptitude test (Report No. 88-3). ETS Research Report Series. https://doi.org/10.1002/j.2330-8516.1988.tb00259.x
  • Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95-105. https://doi.org/10.1111/j.1745-3984.1973.tb00787.x
  • Angoff, W. H., & Modu, C. C. (1973). Equating the scales of the Prueba de Aptitud Académica and the Scholastic Aptitude Test (Report No. CEEB-RR-3). College Entrance Examination Board.
  • Bezruczko, N., Schulz, E. M., Reynolds, A. J., Perlman, C. L. & Rice, W. K. (1989). The stability of four methods for estimating item bias (Report No. ED-392-823). Department of Research and Evaluation, Chicago Public Schools.
  • Binet, A., & Simon, T. (1905). New methods for the diagnosis of the intellectual level of subnormals. In H. H. Goddard (Ed.), Development of intelligence in children (the Binet-Simon Scale). Williams & Wilkins.
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Sage Publications.
  • Choi, S. W., Gibbons, L. E., & Crane, P. K. (2011). Lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and monte carlo simulations. Journal of Statistical Software, 39(8), 1-30. https://doi.org/10.18637/jss.v039.i08
  • Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-44. https://doi.org/10.1111/j.1745-3992.1998.tb00619.x
  • de Ayala, R. J. (2009). The theory and practice of item response theory. The Guilford Press.
  • de Ruiter, L. E., & Bers, M. U. (2022). The Coding Stages Assessment: Development and validation of an instrument for assessing young children’s proficiency in the ScratchJr programming language. Computer Science Education, 32(4), 1-30. https://doi.org/10.1080/08993408.2021.1956216
  • Devine, P. J., & Raju, N. S. (1982). Extent of overlap among four item bias methods. Educational and Psychological Measurement, 42(4), 1049-1066. https://doi.org/10.1177/001316448204200412
  • Dodeen, H., & Johanson, G. A. (2003). An analysis of sex-related differential item functioning in attitude assessment. Assessment & Evaluation in Higher Education, 28(2), 129-134. https://doi.org/10.1080/02602930301667
  • Dorans, N. J. (1989). Two new approaches to assessing differential item functioning: Standardization and the Mantel-Haenszel Method. Applied Measurement in Education, 2(3), 217-233. https://doi.org/10.1207/s15324818ame0203_3
  • Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of female candidates on SAT and TSWE forms administered in December 1977: An application of the standardization approach (Report No. RR-83-9). ETS Research Report Series. https://doi.org/10.1002/j.2330-8516.1983.tb00009.x
  • Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23(4), 355-368. https://doi.org/10.1111/j.1745-3984.1986.tb00255.x
  • Dzul-Garcia, C., & Atar, B. (2020). Investigation of possible item bias on PISA 2015 science items across Chile, Costa Rica and Mexico. Culture and Education, 32(3), 470-505. https://doi.org/10.1080/11356405.2020.1785158
  • Elosua, P., & Wells, C. S. (2013). Detecting DIF in polytomous items using MACS, IRT and ordinal logistic regression. Psicológica, 34(2), 327-342.
  • Facon, B., & Nuchadee, M-L. (2010). An item analysis of Raven’s Colored Progressive Matrices among participants with Down syndrome. Research in Developmental Disabilities, 31(1), 243-249. https://doi.org/10.1016/j.ridd.2009.09.011
  • Farcomeni, A., Pittau, M. G., Viviani, S., & Zelli, R. (2022). A European measurement scale for material deprivation. Research Square, 1-32. https://doi.org/10.21203/rs.3.rs-2250804/v1
  • Finch, W. H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29(4), 278-295. https://doi.org/10.1177/0146621605275728
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (8th Ed.). Mc-Graw Hill.
  • Gamerman, D., Gonçalves, F. B., & Soares, T. M. (2018). Differential item functioning. In W. J. van der Linden (Ed.), Handbook of item response theory (pp. 67-84). CRC Press.
  • Gao, Y., & Zhu, W. (2009). Identifying culturally sensitive physical activities using DIF analysis. Medicine & Science in Sports & Exercise, 41(5), 416-417. http://dx.doi.org/10.1249/01.MSS.0000355818.07045.09
  • Gelin, M. N., Carleton, B. C., Smith, A. A., & Zumbo, B. D. (2004). The dimensionality and gender differential item functioning of the mini asthma quality of life questionnaire (MINIAQLQ). Social Indicators Research, 68(1), 91-105. https://doi.org/10.1023/B:SOCI.0000025580.54702.90
  • Gómez-Benito, J., Sireci, S., Padilla, J.-L., Hidalgo, M. D., & Benítez, I. (2018). Differential item functioning: Beyond validity evidence based on internal structure. Psicothema, 30(1), 104-109. http://doi.org/10.7334/psicothema2017.183
  • Gould, S. J. (1981). The mismeasure of man. W. W. Norton & Company. Hauger, J. B., & Sireci, S. G. (2008). Detecting differential item functioning across examinees tested in their dominant language and examinees tested in a second language. International Journal of Testing, 8(3), 237-250. http://dx.doi.org/10.1080/15305050802262183
  • Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Lawrence Erlbaum Associates.
  • Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning: Theory and practice. Erlbaum Publishers.
  • Hunter, J. E. (1975). A critical analysis of the use of item means and item-test correlations to determine the presence or absence of content bias in achievement test items [Paper presentation]. National Institute of Education Conference on Test Bias, Annapolis, MD.
  • Ironson, G. H., & Subkoviak, M. J. (1979). A comparison of several methods of assessing item bias. Journal of Educational Measurement, 16(4), 209-225. https://doi.org/10.1111/j.1745-3984.1979.tb00103.x
  • Iwata, N., Turner, R. J., & Lloyd, D. A. (2002). Race/ethnicity and depressive symptoms in community-dwelling young adults: A differential item functioning analysis. Psychiatry Research, 110(3), 281-289. https://doi.org/10.1016/S0165-1781(02)00102-6
  • Jensen, A. R. (1973). Educability and group differences. Basic Books.
  • Jensen, A. R. (1976). Test bias and construct validity. The Phi Delta Kappan, 58(4), 340-346.
  • Kelderman, H. (1989). Item bias detection using loglinear IRT. Psychometrika, 54(4), 681-697. https://doi.org/10.1007/BF02296403
  • Korkmaz, M. (2006). Test ve ölçek geliştirmede yeni yaklaşımlar: Madde cevap kuramı kapsamında madde işlevsel farklılık (madde yanlılık) yöntemleri. Türk Psikoloji Yazıları, 9(18), 63-80.
  • Li, Z., & Zumbo, B. D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30(2), 343-370.
  • Lord, F. M. (1977). A study of item bias using item characteristic curve theory. In N. H. Poortinga (Ed.), Basic problems in cross-cultural psychology (pp. 19-29). Swets and Zeitlinger.
  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates, Inc.
  • Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847-862. https://doi.org/10.3758/BRM.42.3.847
  • Magis, D., & Facon, B. (2012). Angoff's delta method revisited: Improving DIF detection under small samples. British Journal of Mathematical and Statistical Psychology, 65(2), 302-321. https://doi.org/10.1111/j.2044-8317.2011.02025.x
  • Magis, D., & Facon, B. (2014). deltaPlotR: An R package for differential item functioning analysis with Angoff’s Delta Plot. Journal of Statistical Software, 59(1), 1-19. https://doi.org/10.18637/jss.v059.c01
  • Mellenberg, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127-143. https://doi.org/10.1016/0883-0355(89)90002-5
  • Muñiz, J., Hambleton, R. K., & Xing, D. (2001). Small sample studies to detect flaws in item translations. International Journal of Testing, 1(2), 115-135. https://doi.org/10.1207/S15327574IJT0102_2
  • Oosterhof, A. C., Atash, M. N., & Lassiter, K. L. (1984). Facilitating identification of item bias through use of delta plots. Educational and Psychological Measurement, 44(3), 619-627. https://doi.org/10.1177/0013164484443009
  • Osterlind, S. J. (1983). Test item bias. Sage Publications.
  • Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd Ed.). Sage Publications.
  • Ozarkan, H. B., Kucam, E. ve Demir, E. (2017). Merkezi Ortak Sınav Matematik alt testinde değişen madde fonksiyonunun görme engeli durumuna göre incelenmesi. Curr Res Educ, 3(1), 24-34.
  • Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 125-167). Elsevier.
  • Pine, S. M. (1977). Applications of item response theory to the problem of test bias. In D. J. Weiss (Ed.), Applications of computerized adaptive testing (pp. 37-43). University of Minnesota, Psychometric Methods Program.
  • R Development Core Team. (2023). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing.
  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53(4), 495-502. https://doi.org/10.1007/BF02294403
  • Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197-207. https://doi.org/10.1177/014662169001400208
  • Raju, N. S., Drasgow, F., & Slinde, J. A. (1993). An empirical comparison of the area methods, Lord's chi-square test, and the Mantel-Haenszel technique for assessing differential item functioning. Educational and Psychological Measurement, 53(2), 301-314. https://doi.org/10.1177/0013164493053002001
  • Revelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 2.3.6, https://CRAN.R-project.org/package=psych.
  • Robin, F., Sireci, S. G., & Hambleton, R. K. (2003). Evaluating the equivalence of different language versions of a credentialing exam. International Journal of Testing, 3(1), 1-20. https://doi.org/10.1207/S15327574IJT0301_1
  • Rudner, L. M. (1978). Using standard tests with the hearing impaired: The problems of item bias. Volta Review, 80, 31-40.
  • Scarr, S., & Weinberg, R. A. (1976). IQ test performance of Black children adopted by White families. American Psychologist, 31(10), 726-739. https://doi.org/10.1037/0003-066X.31.10.726
  • Scheuneman, J. (1979). A method of assessing bias in test items. Journal of Educational Measurement, 16(3), 143-152. https://doi.org/10.1111/j.1745-3984.1979.tb00095.x
  • Seong, T-J., & Subkoviak, M. J. (1987). A comparative study of recently proposed item bias detection methods [Paper presentation]. Annual Meeting of the National Council on Measurement in Education, Washington, D.C.
  • Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159-194. https://doi.org/10.1007/BF02294572
  • Shepard, L. A., Camilli, G., & Averill, M. (1981). Comparison of procedures for detecting test-item bias with both internal and external ability criteria. Journal of Educational Statistics, 6(4), 317-375. https://doi.org/10.3102/10769986006004317
  • Shepard, L. A., Camilli, G., & Williams, D. A. (1985). Validity of approximation techniques for detecting item bias. Journal of Educational Measurement, 22(2), 77-105. https://doi.org/10.1111/j.1745-3984.1985.tb01050.x
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
  • Tat, O., & Doğan, N. (2018). Uluslararası Bilgisayar ve Bilgi Teknolojileri Okuryazarlığı Testinin madde-birey dağılımı ve değişen madde fonksiyonu yönünden incelenmesi [Examination of the International Computer and Information Literacy Test in terms of item-person distribution and differential item functioning]. Gazi Üniversitesi Gazi Eğitim Fakültesi Dergisi, 38(3), 1207-1231. https://doi.org/10.17152/gefad.321630
  • Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-172). Lawrence Erlbaum Associates, Inc.
  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Lawrence Erlbaum Associates, Inc.
  • Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16(7), 433-451. https://doi.org/10.1037/h0073357
  • van der Flier, H., Mellenbergh, G. J., Adér, H. J., & Wijn, M. (1984). An iterative item bias detection method. Journal of Educational Measurement, 21(2), 131-145. https://doi.org/10.1111/j.1745-3984.1984.tb00225.x
  • Van Vo, D., & Csapó, B. (2023). Effects of multimedia on psychometric characteristics of cognitive tests: A comparison between technology-based and paper-based modalities. Studies in Educational Evaluation, 77, 1-12. https://doi.org/10.1016/j.stueduc.2023.101254
  • Wainer, H. (1993). Model-based standardized measurement of an item’s differential impact. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 123-135). Lawrence Erlbaum Associates.
  • Wainer, H., Bradlow, E., & Wang, X. (2010). Detecting DIF: Many paths to salvation. Journal of Educational and Behavioral Statistics, 35(4), 489-493. https://doi.org/10.3102/1076998610376624
  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Directorate of Human Resources Research and Evaluation, Department of National Defense.
  • Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233. https://doi.org/10.1080/15434300701375832
  • Zwick, R., & Ercikan, K. (1989). Analysis of differential item functioning in the NAEP history assessment. Journal of Educational Measurement, 26(1), 55-66. https://doi.org/10.1111/j.1745-3984.1989.tb00318.x
There are 82 references in total.

Details

Primary Language Turkish
Subjects Program Evaluation in Education
Section Articles
Authors

Metehan Güngör 0000-0003-4409-2229

Ergul Demir 0000-0002-3708-8013

Early View Date March 9, 2025
Publication Date
Submission Date February 22, 2024
Acceptance Date December 19, 2024
Published Issue Year 2025, Volume: 25, Issue: 1

Cite

APA Güngör, M., & Demir, E. (2025). Angoff'un Dönüştürülmüş Madde Güçlükleri Yöntemi'nin Değişen Madde Fonksiyonu Belirlemede Kullanımı [The use of Angoff's Transformed Item Difficulties method in detecting differential item functioning]. Abant İzzet Baysal Üniversitesi Eğitim Fakültesi Dergisi, 25(1), 398-424.