Research Article

The Use of Angoff’s Transformed Item Difficulties Method in Detecting Differential Item Functioning

Year 2025, Volume 25, Issue 1, pp. 398-424

Abstract

In this study, Angoff’s Transformed Item Difficulties method for detecting differential item functioning (DIF) is introduced in detail, together with the criticisms it has received, and its strengths and weaknesses are discussed. The method is one of the pioneering approaches developed for detecting DIF in tests composed of unidimensional, dichotomously (0-1) scored items, yet the literature raises several objections to it: it relies exclusively on item difficulties, and it may mistake real ability differences between groups for item bias. For these reasons, some authors advise against its use. On the other hand, the method has practical advantages: it is easy to apply, lends itself to graphical interpretation, and can be used with relatively small samples. Accordingly, this study discusses the algorithm, general characteristics, strengths, and limitations of Angoff’s method, and explains, step by step, the R code needed to carry out a DIF analysis with the method using the "difR" package. The discussion shows that DIF detection with Angoff’s method carries important limitations that must be taken into account. Nevertheless, its ease of application and visualization capabilities make it useful for explaining the fundamentals of the bias and DIF concepts. The method yields more meaningful results when the groups’ mean test scores on the measured trait are close or equal. A limited use of the method, as a screening tool for identifying potentially biased items in a test, can therefore be considered.
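The core computation behind the method described above can be sketched briefly. The following is an illustrative Python reimplementation, not the article's difR code: the function names `ets_delta` and `delta_plot` are my own, and the flagging threshold of 1.5 is the classical value used in the delta-plot literature. Each item's proportion correct in each group is transformed to the ETS delta scale, the major (principal) axis of the two groups' delta scatter is fitted, and items lying far from that axis are flagged.

```python
from math import sqrt
from statistics import NormalDist

def ets_delta(p):
    """ETS delta scale: mean 13, SD 4; larger delta = harder item.
    p is the proportion of correct answers, strictly between 0 and 1."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)

def delta_plot(p_ref, p_foc, threshold=1.5):
    """Angoff-style delta plot for a reference and a focal group.

    Transforms each item's proportion correct in both groups to the delta
    scale, fits the major (principal) axis of the resulting scatter, and
    flags items whose perpendicular distance from that axis exceeds the
    threshold. Assumes the two groups' item deltas covary (sxy != 0).
    """
    x = [ets_delta(p) for p in p_ref]   # reference-group deltas
    y = [ets_delta(p) for p in p_foc]   # focal-group deltas
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x) / n
    syy = sum((v - my) ** 2 for v in y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
    # Major-axis (principal-axis) slope and intercept.
    b = (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = my - b * mx
    # Signed perpendicular distance of each item from the major axis.
    d = [(b * xi - yi + a) / sqrt(b ** 2 + 1) for xi, yi in zip(x, y)]
    flagged = [i for i, di in enumerate(d) if abs(di) > threshold]
    return b, a, d, flagged
```

Note that if every item is merely uniformly harder for one group, the points still fall on the major axis and nothing is flagged; this is how the method tries to separate group-level impact from item-level DIF. In practice, the analysis is run with the R packages the article discusses (difR, and deltaPlotR by Magis & Facon, 2014) rather than hand-rolled code.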

References

  • Aituariagbon, K. E., & Osarumwense, H. J. (2022). Non-parametric method of detecting differential item functioning in Senior School Certificate Examination (SSCE) 2019 Economics multiple choice items. Kashere Journal of Education, 3(1), 146-158. https://dx.doi.org/10.4314/kje.v3i1.19
  • American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME]. (1999). Standards for educational and psychological testing. American Educational Research Association.
  • American Psychological Association [APA] (1988). Code of fair testing practices in education. Washington, DC: Author.
  • Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Prentice-Hall, Inc.
  • Angoff, W. H. (1972). A technique for the investigation of cultural differences [Paper presentation]. American Psychological Association Annual Meeting, Honolulu.
  • Angoff, W. H. (1975). The investigation of test bias in the absence of an outside criterion [Paper presentation]. NIE Conference on Test Bias, Washington, D.C.
  • Angoff, W. H. (1982). Use of difficulty and discrimination indices for detecting item bias. In R. A. Berk (Ed.), Handbook of methods for detecting test bias (pp. 96-116). Johns Hopkins University Press.
  • Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3-23). Lawrence Erlbaum Associates.
  • Angoff, W. H., & Cook, L. L. (1988). Equating the scores of the Prueba de Aptitud Académica and the Scholastic Aptitude Test (Report No. 88-3). ETS Research Report Series. https://doi.org/10.1002/j.2330-8516.1988.tb00259.x
  • Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95-105. https://doi.org/10.1111/j.1745-3984.1973.tb00787.x
  • Angoff, W. H., & Modu, C. C. (1973). Equating the scales of the Prueba de Aptitud Académica and the Scholastic Aptitude Test (Report No. CEEB-RR-3). College Entrance Examination Board.
  • Bezruczko, N., Schulz, E. M., Reynolds, A. J., Perlman, C. L. & Rice, W. K. (1989). The stability of four methods for estimating item bias (Report No. ED-392-823). Department of Research and Evaluation, Chicago Public Schools.
  • Binet, A., & Simon, T. (1905). New methods for the diagnosis of the intellectual level of subnormals. In H. H. Goddard (Ed.), Development of intelligence in children (the Binet-Simon Scale). Williams & Wilkins.
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Sage Publications.
  • Choi, S. W., Gibbons, L. E., & Crane, P. K. (2011). lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software, 39(8), 1-30. https://doi.org/10.18637/jss.v039.i08
  • Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-44. https://doi.org/10.1111/j.1745-3992.1998.tb00619.x
  • de Ayala, R. J. (2009). The theory and practice of item response theory. The Guilford Press.
  • de Ruiter, L. E., & Bers, M. U. (2022). The Coding Stages Assessment: Development and validation of an instrument for assessing young children’s proficiency in the ScratchJr programming language. Computer Science Education, 32(4), 1-30. https://doi.org/10.1080/08993408.2021.1956216
  • Devine, P. J., & Raju, N. S. (1982). Extent of overlap among four item bias methods. Educational and Psychological Measurement, 42(4), 1049-1066. https://doi.org/10.1177/001316448204200412
  • Dodeen, H., & Johanson, G. A. (2003). An analysis of sex-related differential item functioning in attitude assessment. Assessment & Evaluation in Higher Education, 28(2), 129-134. https://doi.org/10.1080/02602930301667
  • Dorans, N. J. (1989). Two new approaches to assessing differential item functioning: Standardization and the Mantel-Haenszel Method. Applied Measurement in Education, 2(3), 217-233. https://doi.org/10.1207/s15324818ame0203_3
  • Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of female candidates on SAT and TSWE forms administered in December 1977: An application of the standardization approach (Report No. RR-83-9). ETS Research Report Series. https://doi.org/10.1002/j.2330-8516.1983.tb00009.x
  • Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23(4), 355-368. https://doi.org/10.1111/j.1745-3984.1986.tb00255.x
  • Dzul-Garcia, C., & Atar, B. (2020). Investigation of possible item bias on PISA 2015 science items across Chile, Costa Rica and Mexico. Culture and Education, 32(3), 470-505. https://doi.org/10.1080/11356405.2020.1785158
  • Elosua, P., & Wells, C. S. (2013). Detecting DIF in polytomous items using MACS, IRT and ordinal logistic regression. Psicológica, 34(2), 327-342.
  • Facon, B., & Nuchadee, M-L. (2010). An item analysis of Raven’s Colored Progressive Matrices among participants with Down syndrome. Research in Developmental Disabilities, 31(1), 243-249. https://doi.org/10.1016/j.ridd.2009.09.011
  • Farcomeni, A., Pittau, M. G., Viviani, S., & Zelli, R. (2022). A European measurement scale for material deprivation. Research Square, 1-32. https://doi.org/10.21203/rs.3.rs-2250804/v1
  • Finch, W. H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29(4), 278-295. https://doi.org/10.1177/0146621605275728
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (8th ed.). McGraw-Hill.
  • Gamerman, D., Gonçalves, F. B., & Soares, T. M. (2018). Differential item functioning. In W. J. van der Linden (Ed.), Handbook of item response theory (pp. 67-84). CRC Press.
  • Gao, Y., & Zhu, W. (2009). Identifying culturally sensitive physical activities using DIF analysis. Medicine & Science in Sports & Exercise, 41(5), 416-417. http://dx.doi.org/10.1249/01.MSS.0000355818.07045.09
  • Gelin, M. N., Carleton, B. C., Smith, A. A., & Zumbo, B. D. (2004). The dimensionality and gender differential item functioning of the mini asthma quality of life questionnaire (MINIAQLQ). Social Indicators Research, 68(1), 91-105. https://doi.org/10.1023/B:SOCI.0000025580.54702.90
  • Gómez-Benito, J., Sireci, S., Padilla, J.-L., Hidalgo, M. D., & Benítez, I. (2018). Differential item functioning: Beyond validity evidence based on internal structure. Psicothema, 30(1), 104-109. http://doi.org/10.7334/psicothema2017.183
  • Gould, S. J. (1981). The mismeasure of man. W. W. Norton & Company.
  • Hauger, J. B., & Sireci, S. G. (2008). Detecting differential item functioning across examinees tested in their dominant language and examinees tested in a second language. International Journal of Testing, 8(3), 237-250. http://dx.doi.org/10.1080/15305050802262183
  • Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Lawrence Erlbaum Associates.
  • Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning: Theory and practice. Erlbaum Publishers.
  • Hunter, J. E. (1975). A critical analysis of the use of item means and item-test correlations to determine the presence or absence of content bias in achievement test items [Paper presentation]. National Institute of Education Conference on Test Bias, Annapolis, MD.
  • Ironson, G. H., & Subkoviak, M. J. (1979). A comparison of several methods of assessing item bias. Journal of Educational Measurement, 16(4), 209-225. https://doi.org/10.1111/j.1745-3984.1979.tb00103.x
  • Iwata, N., Turner, R. J., & Lloyd, D. A. (2002). Race/ethnicity and depressive symptoms in community-dwelling young adults: A differential item functioning analysis. Psychiatry Research, 110(3), 281-289. https://doi.org/10.1016/S0165-1781(02)00102-6
  • Jensen, A. R. (1973). Educability and group differences. Basic Books.
  • Jensen, A. R. (1976). Test bias and construct validity. The Phi Delta Kappan, 58(4), 340-346.
  • Kelderman, H. (1989). Item bias detection using loglinear IRT. Psychometrika, 54(4), 681-697. https://doi.org/10.1007/BF02296403
  • Korkmaz, M. (2006). Test ve ölçek geliştirmede yeni yaklaşımlar: Madde cevap kuramı kapsamında madde işlevsel farklılık (madde yanlılık) yöntemleri. Türk Psikoloji Yazıları, 9(18), 63-80.
  • Li, Z., & Zumbo, B. D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30(2), 343-370.
  • Lord, F. M. (1977). A study of item bias using item characteristic curve theory. In Y. H. Poortinga (Ed.), Basic problems in cross-cultural psychology (pp. 19-29). Swets and Zeitlinger.
  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates, Inc.
  • Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847-862. https://doi.org/10.3758/BRM.42.3.847
  • Magis, D., & Facon, B. (2012). Angoff's delta method revisited: Improving DIF detection under small samples. British Journal of Mathematical and Statistical Psychology, 65(2), 302-321. https://doi.org/10.1111/j.2044-8317.2011.02025.x
  • Magis, D., & Facon, B. (2014). deltaPlotR: An R package for differential item functioning analysis with Angoff’s Delta Plot. Journal of Statistical Software, 59(1), 1-19. https://doi.org/10.18637/jss.v059.c01
  • Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127-143. https://doi.org/10.1016/0883-0355(89)90002-5
  • Muñiz, J., Hambleton, R. K., & Xing, D. (2001). Small sample studies to detect flaws in item translations. International Journal of Testing, 1(2), 115-135. https://doi.org/10.1207/S15327574IJT0102_2
  • Oosterhof, A. C., Atash, M. N., & Lassiter, K. L. (1984). Facilitating identification of item bias through use of delta plots. Educational and Psychological Measurement, 44(3), 619-627. https://doi.org/10.1177/0013164484443009
  • Osterlind, S. J. (1983). Test item bias. Sage Publications.
  • Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd Ed.). Sage Publications.
  • Ozarkan, H. B., Kucam, E., & Demir, E. (2017). Merkezi Ortak Sınav Matematik alt testinde değişen madde fonksiyonunun görme engeli durumuna göre incelenmesi. Current Research in Education, 3(1), 24-34.
  • Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 125-167). Elsevier.
  • Pine, S. M. (1977). Applications of item response theory to the problem of test bias. In D. J. Weiss (Ed.), Applications of computerized adaptive testing (pp. 37-43). University of Minnesota, Psychometric Methods Program.
  • R Development Core Team. (2023). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing.
  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53(4), 495-502. https://doi.org/10.1007/BF02294403
  • Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197- 207. https://doi.org/10.1177/014662169001400208
  • Raju, N. S., Drasgow, F., & Slinde, J. A. (1993). An empirical comparison of the area methods, Lord’s chi-square test, and the Mantel-Haenszel technique for assessing differential item functioning. Educational and Psychological Measurement, 53(2), 301-314. https://doi.org/10.1177/0013164493053002001
  • Revelle, W. (2023). psych: Procedures for psychological, psychometric, and personality research (R package version 2.3.6) [Computer software]. Northwestern University. https://CRAN.R-project.org/package=psych
  • Robin, F., Sireci, S. G., & Hambleton, R. K. (2003). Evaluating the equivalence of different language versions of a credentialing exam. International Journal of Testing, 3(1), 1-20, https://doi.org/10.1207/S15327574IJT0301_1
  • Rudner, L. M. (1978). Using standard tests with the hearing impaired: The problems of item bias. Volta Review, 80, 31-40.
  • Scarr, S., & Weinberg, R. A. (1976). IQ test performance of Black children adopted by White families. American Psychologist, 31(10), 726-739. https://doi.org/10.1037/0003-066X.31.10.726
  • Scheuneman, J. (1979). A method of assessing bias in test items. Journal of Educational Measurement, 16(3), 143-152. https://doi.org/10.1111/j.1745-3984.1979.tb00095.x
  • Seong, T-J., & Subkoviak, M. J. (1987). A comparative study of recently proposed item bias detection methods [Paper presentation]. Annual Meeting of the National Council on Measurement in Education, Washington, D.C.
  • Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159-194. https://doi.org/10.1007/BF02294572
  • Shepard, L. A., Camilli, G., & Averill, M. (1981). Comparison of procedures for detecting test-item bias with both internal and external ability criteria. Journal of Educational Statistics, 6(4), 317-375. https://doi.org/10.3102/10769986006004317
  • Shepard, L. A., Camilli, G., & Williams, D. A. (1985). Validity of approximation techniques for detecting item bias. Journal of Educational Measurement, 22(2), 77-105. https://doi.org/10.1111/j.1745-3984.1985.tb01050.x
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
  • Tat, O., & Doğan, N. (2018). Uluslararası Bilgisayar ve Bilgi Teknolojileri Okuryazarlığı Testinin madde-birey dağılımı ve değişen madde fonksiyonu yönünden incelenmesi. Gazi Üniversitesi Gazi Eğitim Fakültesi Dergisi, 38(3), 1207-1231. https://doi.org/10.17152/gefad.321630
  • Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-172). Lawrence Erlbaum Associates, Inc.
  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Lawrence Erlbaum Associates, Inc.
  • Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16(7), 433-451. https://doi.org/10.1037/h0073357
  • van der Flier, H., Mellenbergh, G. J., Adér, H. J., & Wijn, M. (1984). An iterative item bias detection method. Journal of Educational Measurement, 21(2), 131-145. https://doi.org/10.1111/j.1745-3984.1984.tb00225.x
  • Van Vo, D., & Csapó, B. (2023). Effects of multimedia on psychometric characteristics of cognitive tests: A comparison between technology-based and paper-based modalities. Studies in Educational Evaluation, 77, 1-12. https://doi.org/10.1016/j.stueduc.2023.101254
  • Wainer, H. (1993). Model-based standardized measurement of an item’s differential impact. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 123-135). Lawrence Erlbaum Associates.
  • Wainer, H., Bradlow, E., & Wang, X. (2010). Detecting DIF: Many paths to salvation. Journal of Educational and Behavioral Statistics, 35(4), 489-493. https://doi.org/10.3102/1076998610376624
  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Directorate of Human Resources Research and Evaluation, Department of National Defense.
  • Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233. http://dx.doi.org/10.1080/15434300701375832
  • Zwick, R., & Ercikan, K. (1989). Analysis of differential item functioning in the NAEP history assessment. Journal of Educational Measurement, 26(1), 55-66. https://doi.org/10.1111/j.1745-3984.1989.tb00318.x

Angoff’un Dönüştürülmüş Madde Güçlükleri Yöntemi’nin Değişen Madde Fonksiyonu Belirlemede Kullanımı

Yıl 2025, Cilt: 25 Sayı: 1, 398 - 424

Öz

Bu çalışmada değişen madde fonksiyonu (DMF) belirleme yöntemlerinden Angoff’un Dönüştürülmüş Madde Güçlükleri (Transformed Item Difficulties) Yöntemi önemli ayrıntıları ve eleştirilen yönleriyle tanıtılmakta, yöntemin güçlü ve zayıf yönleri tartışılmaktadır. Tek boyutlu ve 0-1 şeklinde puanlanan maddelerden oluşan testlerde DMF belirleme çalışmalarında kullanılmak üzere geliştirilmiş öncü yöntemlerden biri olan Angoff’un yöntemine yönelik olarak alan yazında bazı eleştiriler bulunmaktadır. Yöntemin yalnızca madde güçlüklerine odaklı olması, gruplar arasındaki gerçek farkın değişen madde fonksiyonu olarak görülme olasılığı bulunması gibi nedenlerle bu yöntemin kullanılmaması önerilebilmektedir. Diğer taraftan yöntem, uygulama kolaylığı, grafiksel yorumlama imkânı tanıması ve görece küçük örneklemlerde de kullanılabilmesi gibi pratik avantajlara sahiptir. Bu kapsamda bu çalışmada Angoff’un yönteminin algoritması, genel karakteristiği, güçlü yönleri ve sınırlılıkları tartışılmıştır. Ayrıca, R programlama dili üzerinde kullanılabilen “difR” paketi ile DMF analizinde Angoff’un yönteminin nasıl kullanılacağı adım adım satır komutları yardımıyla açıklanmıştır. Yürütülen tartışmalar göstermektedir ki, Angoff’un Yöntemi ile DMF belirleme, dikkat edilmesi gereken bazı önemli sınırlılıklar içermektedir. Bununla birlikte yöntemin uygulama kolaylığı ve görselleştirme imkânı tanıyor oluşu, yanlılık ve DMF kavramlarının temellerinin açıklanması açısından yararlı olabilir. Bu yöntem, grupların ölçülen özellik bakımından test puanı ortalamalarının yakın ya da eşit olması durumunda daha anlamlı sonuçlar verebilmektedir. Yöntemin, bir testteki potansiyel olarak yanlı maddelerin belirlenmesinde bir öngörü sağlaması amacıyla daha sınırlı kullanımı düşünülebilir.

Kaynakça

  • Aituariagbon, K. E., & Osarumwense, H. J. (2022). Non-parametric method of detecting differential item functioning in Senior School Certificate Examination (SSCE) 2019 Economics multiple choice items. Kashere Journal of Education, 3(1), 146-158. https://dx.doi.org/10.4314/kje.v3i1.19
  • American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME]. (1999). Standards for educational and psychological testing. American Educational Research Association.
  • American Psychological Association [APA] (1988). Code of fair testing practices in education. Washington, DC: Author.
  • Anastasi, A., & Urbina, S. (1997). Psychological testing (9th Ed.). Prentice-Hall, Inc.
  • Angoff, W. H. (1972). A technique for the investigation of cultural differences [Paper presentation]. American Psychological Association Annual Meeting, Honolulu.
  • Angoff, W. H. (1975). The investigation of test bias in the absence of an outside criterion [Paper presentation]. NIE Conference on Test Bias, Washington, D.C.
  • Angoff, W. H. (1982). Use of difficulty and discrimination indices for detecting item bias. In R. A. Beck (Ed.), Handbook of methods for detecting item bias (pp. 96-116). Johns Hopkins University Press.
  • Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3-23). Lawrence Erlbaum Associates.
  • Angoff, W. H., & Cook, L. L. (1988). Equating the scores of the prueba de aptitud académica and the scholastic aptitude test (Report No. 88-3). ETS Research Report Series. https://doi.org/10.1002/j.2330-8516.1988.tb00259.x
  • Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95-105. https://doi.org/10.1111/j.1745-3984.1973.tb00787.x
  • Angoff, W. H., & Modu, C. C. (1973). Equating the scales of the Prueba de Aptitud Académica and the Scholastic Aptitude Test (Report No. CEEB-RR-3). College Entrance Examination Board.
  • Bezruczko, N., Schulz, E. M., Reynolds, A. J., Perlman, C. L. & Rice, W. K. (1989). The stability of four methods for estimating item bias (Report No. ED-392-823). Department of Research and Evaluation, Chicago Public Schools.
  • Binet, A., & Simon, T. (1905). New methods for the diagnosis of the intellectual level of subnormals. In H. H. Goddard (Ed.), Development of intelligence in children (the Binet-Simon Scale). Williams & Wilkins.
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Sage Publications.
  • Choi, S. W., Gibbons, L. E., & Crane, P. K. (2011). Lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and monte carlo simulations. Journal of Statistical Software, 39(8), 1-30. https://doi.org/10.18637/jss.v039.i08
  • Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-44. https://doi.org/10.1111/j.1745-3992.1998.tb00619.x
  • de Ayala, R. J. (2009). The theory and practice of item response theory. The Guilford Press.
  • de Ruiter, L. E., & Bers, M. U. (2022). The Coding Stages Assessment: Development and validation of an instrument for assessing young children’s proficiency in the ScratchJr programming language. Computer Science Education, 32(4), 1-30. https://doi.org/10.1080/08993408.2021.1956216
  • Devine, P. J., & Raju, N. S. (1982). Extent of overlap among four item bias methods. Educational and Psychological Measurement, 42(4), 1049-1066. https://doi.org/10.1177/001316448204200412
  • Dodeen, H., & Johanson, G. A. (2003). An analysis of sex-related differential item functioning in attitude assessment. Assessment & Evaluation in Higher Education, 28(2), 129-134. https://doi.org/10.1080/02602930301667
  • Dorans, N. J. (1989). Two new approaches to assessing differential item functioning: Standardization and the Mantel-Haenszel Method. Applied Measurement in Education, 2(3), 217-233. https://doi.org/10.1207/s15324818ame0203_3
  • Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of female candidates on SAT and TSWE forms administered in December 1977: An application of the standardization approach (Report No. RR-83-9). ETS Research Report Series. https://doi.org/10.1002/j.2330-8516.1983.tb00009.x
  • Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23(4), 355-368. https://doi.org/10.1111/j.1745-3984.1986.tb00255.x
  • Dzul-Garcia, C., & Atar, B. (2020). Investigation of possible item bias on PISA 2015 science items across Chile, Costa Rica and Mexico. Culture and Education, 32(3), 470-505. https://doi.org/10.1080/11356405.2020.1785158
  • Elosua, P., & Wells, C. S. (2013). Detecting DIF in polytomous items using MACS, IRT and ordinal logistic regression. Psicológica, 34(2), 327-342.
  • Facon, B., & Nuchadee, M-L. (2010). An item analysis of Raven’s Colored Progressive Matrices among participants with Down syndrome. Research in Developmental Disabilities, 31(1), 243-249. https://doi.org/10.1016/j.ridd.2009.09.011
  • Farcomeni, A., Pittau, M. G., Viviani, S., & Zelli, R. (2022). A European measurement scale for material deprivation. Research Square, 1-32. https://doi.org/10.21203/rs.3.rs-2250804/v1
  • Finch, W. H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29(4), 278-295. https://doi.org/10.1177/0146621605275728
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (8th Ed.). Mc-Graw Hill.
  • Gamerman, D., Gonçalves, F. B., & Soares, T. M. (2018). Differential item functioning. In W. J. van der Linden (Ed.), Handbook of item response theory (pp. 67-84). CRC Press.
  • Gao, Y., & Zhu, W. (2009). Identifying culturally sensitive physical activities using DIF analysis. Medicine & Science in Sports & Exercise, 41(5), 416-417. http://dx.doi.org/10.1249/01.MSS.0000355818.07045.09
  • Gelin, M. N., Carleton, B. C., Smith, A. A., & Zumbo, B. D. (2004). The dimensionality and gender differential item functioning of the mini asthma quality of life questionnaire (MINIAQLQ). Social Indicators Research, 68(1), 91-105. https://doi.org/10.1023/B:SOCI.0000025580.54702.90
  • Gómez-Benito, J., Sireci, S., Padilla, J.-L., Hidalgo, M. D., & Benítez, I. (2018). Differential item functioning: Beyond validity evidence based on internal structure. Psicothema, 30(1), 104-109. http://doi.org/10.7334/psicothema2017.183
  • Gould, S. J. (1981). The mismeasure of man. W. W. Norton & Company. Hauger, J. B., & Sireci, S. G. (2008). Detecting differential item functioning across examinees tested in their dominant language and examinees tested in a second language. International Journal of Testing, 8(3), 237-250. http://dx.doi.org/10.1080/15305050802262183
  • Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Lawrence Erlbaum Associates.
  • Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning: Theory and practice. Erlbaum Publishers.
  • Hunter, J. E. (1975). A critical analysis of the use of item means and item-test correlations to determine the presence or absence of content bias in achievement test items [Paper presentation]. National Institute of Education Conference on Test Bias, Annapolis, MD.
  • Ironson, G. H., & Subkoviak, M. J. (1979). A comparison of several methods of assessing item bias. Journal of Educational Measurement, 16(4), 209-225. https://doi.org/10.1111/j.1745-3984.1979.tb00103.x
  • Iwata, N., Turner, R. J., & Lloyd, D. A. (2002). Race/ethnicity and depressive symptoms in community-dwelling young adults: A differential item functioning analysis. Psychiatry Research, 110(3), 281-289. https://doi.org/10.1016/S0165-1781(02)00102-6
  • Jensen, A. R. (1973). Educability and group differences. Basic Books.
  • Jensen, A. R. (1976). Test bias and construct validity. The Phi Delta Kappan, 58(4), 340-346.
  • Kelderman, H. (1989). Item bias detection using loglinear IRT. Psychometrika, 54(4), 681-697. https://doi.org/10.1007/BF02296403
  • Korkmaz, M. (2006). Test ve ölçek geliştirmede yeni yaklaşımlar: Madde cevap kuramı kapsamında madde işlevsel farklılık (madde yanlılık) yöntemleri. Türk Psikoloji Yazıları, 9(18), 63-80.
  • Li, Z., & Zumbo, B. D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30(2), 343-370.
  • Lord, F. M. (1977). A study of item bias using item characteristic curve theory. In N. H. Poortinga (Ed.), Basic problems in cross-cultural psychology (pp. 19-29). Swets and Zeitlinger.
  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates, Inc.
  • Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847-862. https://doi.org/10.3758/BRM.42.3.847
  • Magis, D., & Facon, B. (2012). Angoff's delta method revisited: Improving DIF detection under small samples. British Journal of Mathematical and Statistical Psychology, 65(2), 302-321. https://doi.org/10.1111/j.2044-8317.2011.02025.x
  • Magis, D., & Facon, B. (2014). deltaPlotR: An R package for differential item functioning analysis with Angoff’s Delta Plot. Journal of Statistical Software, 59(1), 1-19. https://doi.org/10.18637/jss.v059.c01
  • Mellenberg, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127-143. https://doi.org/10.1016/0883-0355(89)90002-5
  • Muñiz, J., Hambleton, R. K., & Xing, D. (2001). Small sample studies to detect flaws in item translations. International Journal of Testing, 1(2), 115-135. https://doi.org/10.1207/S15327574IJT0102_2
  • Oosterhof, A. C., Atash, M. N., & Lassiter, K. L. (1984). Facilitating identification of item bias through use of delta plots. Educational and Psychological Measurement, 44(3), 619-627. https://doi.org/10.1177/0013164484443009
  • Osterlind, S. J. (1983). Test item bias. Sage Publications.
  • Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd Ed.). Sage Publications.
  • Ozarkan, H. B., Kucam, E. ve Demir, E. (2017). Merkezi Ortak Sınav Matematik alt testinde değişen madde fonksiyonunun görme engeli durumuna göre incelenmesi. Curr Res Educ, 3(1), 24-34.
  • Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 125-167). Elsevier.
  • Pine, S. M. (1977). Applications of item response theory to the problem of test bias. In D. J. Weiss (Ed.), Applications of computerized adaptive testing (pp. 37-43). University of Minnesota, Psychometric Methods Program.
  • R Development Core Team. (2023). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing.
  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53(4), 495-502. https://doi.org/10.1007/BF02294403
  • Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197-207. https://doi.org/10.1177/014662169001400208
  • Raju, N. S., Drasgow, F., & Slinde, J. A. (1993). An empirical comparison of the area methods, Lord's chi-square test, and the Mantel-Haenszel technique for assessing differential item functioning. Educational and Psychological Measurement, 53(2), 301-314. https://doi.org/10.1177/0013164493053002001
  • Revelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 2.3.6, https://CRAN.R-project.org/package=psych.
  • Robin, F., Sireci, S. G., & Hambleton, R. K. (2003). Evaluating the equivalence of different language versions of a credentialing exam. International Journal of Testing, 3(1), 1-20. https://doi.org/10.1207/S15327574IJT0301_1
  • Rudner, L. M. (1978). Using standard tests with the hearing impaired: The problems of item bias. Volta Review, 80, 31-40.
  • Scarr, S., & Weinberg, R. A. (1976). IQ test performance of Black children adopted by White families. American Psychologist, 31(10), 726-739. https://doi.org/10.1037/0003-066X.31.10.726
  • Scheuneman, J. (1979). A method of assessing bias in test items. Journal of Educational Measurement, 16(3), 143-152. https://doi.org/10.1111/j.1745-3984.1979.tb00095.x
  • Seong, T-J., & Subkoviak, M. J. (1987). A comparative study of recently proposed item bias detection methods [Paper presentation]. Annual Meeting of the National Council on Measurement in Education, Washington, D.C.
  • Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159-194. https://doi.org/10.1007/BF02294572
  • Shepard, L. A., Camilli, G., & Averill, M. (1981). Comparison of procedures for detecting test-item bias with both internal and external ability criteria. Journal of Educational Statistics, 6(4), 317-375. https://doi.org/10.3102/10769986006004317
  • Shepard, L. A., Camilli, G., & Williams, D. A. (1985). Validity of approximation techniques for detecting item bias. Journal of Educational Measurement, 22(2), 77-105. https://doi.org/10.1111/j.1745-3984.1985.tb01050.x
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
  • Tat, O., & Doğan, N. (2018). Uluslararası Bilgisayar ve Bilgi Teknolojileri Okuryazarlığı Testinin madde-birey dağılımı ve değişen madde fonksiyonu yönünden incelenmesi [Examination of the International Computer and Information Literacy Test in terms of item-person distribution and differential item functioning]. Gazi Üniversitesi Gazi Eğitim Fakültesi Dergisi, 38(3), 1207-1231. https://doi.org/10.17152/gefad.321630
  • Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-172). Lawrence Erlbaum Associates, Inc.
  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Lawrence Erlbaum Associates, Inc.
  • Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16(7), 433-451. https://doi.org/10.1037/h0073357
  • van der Flier, H., Mellenbergh, G. J., Adér, H. J., & Wijn, M. (1984). An iterative item bias detection method. Journal of Educational Measurement, 21(2), 131-145. https://doi.org/10.1111/j.1745-3984.1984.tb00225.x
  • Van Vo, D., & Csapó, B. (2023). Effects of multimedia on psychometric characteristics of cognitive tests: A comparison between technology-based and paper-based modalities. Studies in Educational Evaluation, 77, 1-12. https://doi.org/10.1016/j.stueduc.2023.101254
  • Wainer, H. (1993). Model-based standardized measurement of an item’s differential impact. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 123-135). Lawrence Erlbaum Associates.
  • Wainer, H., Bradlow, E., & Wang, X. (2010). Detecting DIF: Many paths to salvation. Journal of Educational and Behavioral Statistics, 35(4), 489-493. https://doi.org/10.3102/1076998610376624
  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Directorate of Human Resources Research and Evaluation, Department of National Defense.
  • Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233. https://doi.org/10.1080/15434300701375832
  • Zwick, R., & Ercikan, K. (1989). Analysis of differential item functioning in the NAEP history assessment. Journal of Educational Measurement, 26(1), 55-66. https://doi.org/10.1111/j.1745-3984.1989.tb00318.x
There are 82 references in total.

Details

Primary Language Turkish
Subjects Program Evaluation in Education
Section Articles
Authors

Metehan Güngör 0000-0003-4409-2229

Ergul Demir 0000-0002-3708-8013

Early View Date March 9, 2025
Publication Date
Submission Date February 22, 2024
Acceptance Date December 19, 2024
Published Issue Year 2025, Volume: 25, Issue: 1

Cite

APA Güngör, M., & Demir, E. (2025). Angoff'un Dönüştürülmüş Madde Güçlükleri Yöntemi'nin Değişen Madde Fonksiyonu Belirlemede Kullanımı [The use of Angoff's Transformed Item Difficulties method in detecting differential item functioning]. Abant İzzet Baysal Üniversitesi Eğitim Fakültesi Dergisi, 25(1), 398-424.