Klasik Test ve Madde Tepki Kuramlarına Göre Çoktan Seçmeli Testlerde Farklı Puanlama Yöntemlerinin Karşılaştırılması

Göksu Gözen Çıtak

Research Article

A Comparison of Differential Scoring Methods For Multiple Choice Tests in Terms of Classical Test and Item Response Theories

Year 2010, Volume: 9 Issue: 1, 170 - 187, 26.06.2010


Abstract

The purpose of this research is to determine the effects of binary (1-0) scoring, judgement-based (a priori) option weighting and empirical option weighting on the reliability and validity of a multiple-choice test under Classical Test Theory and Item Response Theory. Data were collected by administering a multiple-choice verbal ability test to 1593 students enrolled in various departments at Hacettepe and Gazi Universities. Under Item Response Theory, the findings showed that “1-0” scoring estimated the parameters more precisely than weighted scoring across different intervals of the ability scale, and that binary scoring was superior to weighted scoring in terms of validity. Under Classical Test Theory, empirical option weighting yielded the highest reliability estimate, and all scoring methods had an identical effect on test validity.
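The three scoring rules compared above can be made concrete with a short sketch. It assumes responses are coded as option indices, that judgement-based (a priori) weights are supplied per item as a mapping from option to weight, and that an empirical weight is the mean 1-0 total score of the examinees who chose that option. That last rule is one common variant in the option-weighting literature and is an assumption here, not necessarily the procedure used in this study. Cronbach's alpha stands in as an illustrative reliability coefficient, since the abstract does not say which coefficient was computed. All function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def binary_scores(responses, key):
    """1-0 scoring: an item earns 1 if the chosen option is the keyed one, else 0."""
    return [[1 if r == k else 0 for r, k in zip(row, key)] for row in responses]


def weighted_scores(responses, weights):
    """Option weighting: each item earns the weight of the option the examinee chose.

    weights[j] maps option -> weight for item j; it can hold judgement-based
    (a priori) weights or empirically derived ones.
    """
    return [[weights[j][r] for j, r in enumerate(row)] for row in responses]


def empirical_weights(responses, key):
    """Assumed variant: an option's weight is the mean 1-0 total score of the
    examinees who chose it. The criterion total includes the item itself."""
    totals = [sum(row) for row in binary_scores(responses, key)]
    weights = []
    for j in range(len(key)):
        by_option = {}
        for row, total in zip(responses, totals):
            by_option.setdefault(row[j], []).append(total)
        weights.append({opt: mean(ts) for opt, ts in by_option.items()})
    return weights


def cronbach_alpha(item_scores):
    """Coefficient alpha for an examinee x item score matrix."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical toy data: 6 examinees x 3 items, options coded 0..3.
    key = [0, 2, 1]
    responses = [
        [0, 2, 1],
        [0, 2, 3],
        [0, 1, 1],
        [1, 2, 0],
        [2, 3, 0],
        [0, 2, 1],
    ]
    # Hypothetical expert weights: keyed option 1.0, one "nearly right" option 0.5.
    apriori = [
        {0: 1.0, 1: 0.5, 2: 0.0, 3: 0.0},
        {0: 0.0, 1: 0.5, 2: 1.0, 3: 0.0},
        {0: 0.5, 1: 1.0, 2: 0.0, 3: 0.0},
    ]
    schemes = {
        "binary (1-0)": binary_scores(responses, key),
        "a priori weights": weighted_scores(responses, apriori),
        "empirical weights": weighted_scores(responses, empirical_weights(responses, key)),
    }
    for name, scores in schemes.items():
        print(f"{name}: alpha = {cronbach_alpha(scores):.3f}")
```

Because the empirical weights in this sketch are estimated from the same examinees they are then used to score, and the criterion total includes the item being weighted, the resulting alpha is likely to be optimistic; weights derived on a separate sample would give a fairer comparison.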

Keywords

multiple choice tests, partial credit model, scoring methods, option weighting
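The keywords name the partial credit model (Masters, 1982). As a hedged illustration of what "estimating parameters more precisely on different intervals of the ability scale" means in IRT terms, the sketch below computes PCM category probabilities, item and test information, and the standard error of ability, SE(θ) = 1/√I(θ), at several θ values. The step difficulties are invented, the function names are hypothetical, and the code does not reproduce the calibration reported in the article.

```python
import math


def pcm_probs(theta, deltas):
    """Partial credit model category probabilities P(X = x | theta), x = 0..m.

    deltas[k - 1] is the step difficulty for moving from category k - 1 to k.
    """
    # Cumulative sums of (theta - delta_k); category 0 has the empty sum 0.
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + (theta - d))
    # Subtract the largest value before exponentiating to avoid overflow.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pcm_item_information(theta, deltas):
    """Under the PCM, item information equals the conditional variance of X."""
    probs = pcm_probs(theta, deltas)
    expected = sum(x * p for x, p in enumerate(probs))
    return sum((x - expected) ** 2 * p for x, p in enumerate(probs))


def total_information(theta, items):
    """Test information is the sum of item information over items."""
    return sum(pcm_item_information(theta, deltas) for deltas in items)


def standard_error(theta, items):
    """Asymptotic standard error of the ability estimate at theta."""
    return 1.0 / math.sqrt(total_information(theta, items))


if __name__ == "__main__":
    # Hypothetical step difficulties. An item with a single step is dichotomous,
    # and for it the PCM reduces to the Rasch model.
    items = [[-1.0, 0.5], [0.0], [-0.5, 0.2, 1.0]]
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        info = total_information(theta, items)
        print(f"theta={theta:+.1f}  I={info:.3f}  SE={standard_error(theta, items):.3f}")
```

Comparing scoring methods in this framework amounts to comparing such SE curves: the method with the smaller SE over a given θ interval estimates ability more precisely there.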

References

Adams, R.J. (1988). Applying the partial credit model to educational diagnosis. Applied Measurement in Education, 1(4), 347-361.
Akkuş, O. (2000). Çoktan seçmeli test maddelerini puanlamada, seçenekleri farklı biçimlerde ağırlıklandırmanın madde ve test istatistiklerine olan etkisinin incelenmesi. Unpublished master's thesis, Hacettepe Üniversitesi Sosyal Bilimler Enstitüsü, Ankara.
Ark, L.A. (2001). Relationships and properties of polytomous item response theory models. Applied Psychological Measurement, 25(3), 273-282.
Backhoff, E.E., Tirado, F.S., & Larrazolo, N.R. (2001). Differential weighting of items to improve university admission test validity. Electronic Journal of Educational Research, 3(1), 21-31.
Baykul, Y. (2000). Eğitimde ve psikolojide ölçme: Klasik Test teorisi ve uygulaması. Ankara: ÖSYM Yayınları.
Bayuk, R.J. (1973). The effects of choice weights and item weights on the reliability and predictive validity of aptitude-type tests [Abstract]. ERIC Digest, Washington DC: ERIC Clearinghouse on Assessment and Evaluation. (ERIC Document Reproduction Service No: ED078061).
Ben-Simon, A., Budescu, D.V. & Nevo, B. (1997). A comparative study of measures of partial knowledge in multiple-choice tests. Applied Psychological Measurement, 21(1), 65-88.
Bock, R.D. (1997). The nominal categories model. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 33-49). New York: Springer-Verlag.
Coombs, C.H., Milholland, J.E. & Womer, F.B. (1956). The assessment of partial knowledge. Educational and Psychological Measurement, 16, 13-37.
Corey, S.M. (1930). The effect of weighting exercises in a new-type examination. Journal of Educational Psychology, 21, 383-385.
Crehan, K.D. & Haladyna, T.M. (1994). A comparison of three linear polytomous scoring methods. ERIC Digest, Washington DC: ERIC Clearinghouse on Assessment and Evaluation. (ERIC Document Reproduction Service No: ED377246).
Crocker, L. & Algina, J. (1986). Introduction to classical and modern test theory. Orlando: Harcourt Brace Jovanovich Inc.
Cross, L.H. & Frary, R.B. (1978). Empirical choice weighting under “guess” and “do not guess” directions. Educational and Psychological Measurement, 38, 613-620.
Cross, L.H., Ross, F.K. & Geller, E.S. (1980). Using choice-weighted scoring of multiple-choice tests for determination of grades in college courses. Journal of Experimental Education, 48, 296-301.
Davis, F.B. & Fifer, G. (1959). The effect on test reliability and validity of scoring aptitude and achievement tests with weights for every choice. Educational and Psychological Measurement, 19, 159-170.
De Ayala, R.J. (1993). An introduction to polytomous item response theory models. Measurement and Evaluation in Counseling and Development, 3, 172-189.
De Ayala, R.J., Dodd, B.G. & Koch, W.R. (1992). A comparison of the partial credit and graded response models in computerized adaptive testing. Applied Measurement in Education, 5(1), 17-34.
Dodd, B.G. (1984). Attitude scaling: A comparison of the graded response and partial credit latent trait models (Doctoral Dissertation, University of Texas at Austin, 1984). Dissertation Abstracts International, 45, 2074A.
Dodd, B.G. & Koch, W.R. (1987). Effects of variations in item stop values on item and test information in the partial credit model. Applied Psychological Measurement, 11, 371-384.
Downey, R.G. (1979). Item-option weighting of achievement tests: Comparative study of methods. Applied Psychological Measurement, 3, 453-461.
Drasgow, F., Levine, M.V., Tsien, S., Williams, B. & Mead, A.D. (1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19(2), 143-165.
Echternacht, G. (1976). Reliability and validity of item option weighting schemes. Educational and Psychological Measurement, 36, 301-309.
Embretson, S.E. & Reise, S.P. (2000). Item response theory for psychologists. New Jersey: Lawrence Erlbaum Associates.
Frary, R. (1980). The effect of misinformation, partial information, and guessing on expected multiple-choice test item scores. Applied Psychological Measurement, 4(1), 79-90.
Frary, R. (1989). Partial credit scoring methods for multiple choice tests. Applied Measurement in Education, 2(1), 79-96.
Glas, C.A.W. & Verhelst, N.D. (1989). Extensions of the partial credit model. Psychometrika, 54(4), 635-659.
Gözen, G. (2006). Kısa cevaplı ve çoktan seçmeli maddelerin “1-0” ve ağırlıklı puanlama yöntemleri ile puanlanmasının testin psikometrik özellikleri açısından incelenmesi. Eğitim Bilimleri ve Uygulama, 5(9), 35-52.
Guilford, J.P. (1941). A simple scoring weight for test items and its reliability. Psychometrika, 6(6), 367-374.
Gulliksen, H. (1967). Theory of mental tests. New York: John Wiley & Sons.
Guttman, L. (1941). An outline of the statistical theory of prediction. In P. Horst (Ed.), Prediction of personal adjustment. Social Science Research Bulletin, 48, 253-364.
Haladyna, T.M. (1990). Effects of empirical option weighting on estimating domain scores and making pass/fail decisions. Applied Measurement in Education, 3(3), 231-244.
Hambleton, R.K., Roberts, D.M. & Traub, R.E. (1970). A comparison of the reliability and validity of two methods for assessing partial knowledge on a multiple-choice test. Journal of Educational Measurement, 7, 75-82.
Hambleton, R.K. & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer Academic Publishers Group.
Hambleton, R.K., Swaminathan, H. & Rogers, H.J. (1991). Fundamentals of item response theory. California: Sage Publications Inc.
Hutchinson, T.P. (1982). Some theories of performance in multiple-choice tests, and their implications for variants of the task. British Journal of Mathematical and Statistical Psychology, 35, 71-89.
Jaradat, D. & Tollefson, N. (1988). The impact of alternative scoring procedures for multiple-choice items on test reliability, validity and grading. Educational and Psychological Measurement, 48, 627-635.
Kansup, W. & Hakstian, A.R. (1975). A comparison of several methods of assessing partial knowledge in multiple-choice tests: Scoring procedures. Journal of Educational Measurement, 12, 219-230.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. New Jersey: Lawrence Erlbaum Associates Publishers.
Lord, F.M. & Novick, M.R. (1968). Statistical theories of mental test scores. New York: Addison-Wesley Publishing Company.
Magnusson, D. (1966). Test theory. Stockholm: Addison-Wesley Publishing Company.
Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-173.
Masters, G.N. (1988). The analysis of partial credit scoring. Applied Measurement in Education, 1(4), 279-297.
Nedelsky, L. (1954). Ability to avoid gross error as a measure of achievement. Educational and Psychological Measurement, 14, 459-472.
Odell, C.V. (1931). Further data concerning the effect of weighting exercises in new-type examinations. Journal of Educational Psychology, 22, 700-704.
Özdemir, D. (2002). Çoktan seçmeli testlerin Klasik Test teorisi ve örtük özellikler teorisine göre hesaplanan psikometrik özelliklerinin iki kategorili ve ağırlıklandırılmış puanlanması yönünden karşılaştırılması. Unpublished doctoral dissertation, Hacettepe Üniversitesi Sosyal Bilimler Enstitüsü, Ankara.
Patnaik, D. & Traub, R.E. (1973). Differential weighting by judged degree of correctness. Journal of Educational Measurement, 10, 281-286.
Sabers, D.L. & White, G.W. (1969). The effect of differential weighting of individual item responses on the predictive validity and reliability of an aptitude test. Journal of Educational Measurement, 6, 93-96.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, No. 17.
Saygı, B. (2004). “1-0” ve ağırlıklı puanlama yöntemleri ile puanlanan çoktan seçmeli testlerin madde ve test özelliklerinin karşılaştırılması. Unpublished master's thesis, Hacettepe Üniversitesi Sosyal Bilimler Enstitüsü, Ankara.
Siegel, S. (1977). Nonparametric statistics (Trans. Y. Topsever). A.Ü. Dil ve Tarih-Coğrafya Fakültesi Yayınları No: 274. (Original work published 1956).
Sympson, J.B. & Haladyna, T.M. (1988). An evaluation of polyweighting in domain-referenced testing. ERIC Digest, Washington DC: ERIC Clearinghouse on Assessment and Evaluation. (ERIC Document Reproduction Service No: ED294911).
Thissen, D.M. (1976). Information in wrong responses to the Raven Progressive Matrices. Journal of Educational Measurement, 14, 201-214.
Thissen, D.M. (1991). MULTILOG user’s guide: Multiple, categorical item analysis and test scoring using item response theory. Chicago: Scientific Software, Inc.
Wang, M.D. & Stanley, J.C. (1970). Differential weighting: A review of methods and empirical studies. Review of Educational Research, 40(5), 663-705.
Waters, B.K. (1976). The measurement of partial knowledge: A comparison between two empirical option-weighting methods and rights-only scoring. The Journal of Educational Research, 69(7), 256-260.
Wright, B.D. (1999). Model selection: Rating scale or partial credit? Rasch Measurement Transactions, 12(3), 641-642.



There are 56 citations in total.

Details

Primary Language	Turkish
Journal Section	Articles
Authors	Göksu Gözen Çıtak
Publication Date	June 26, 2010
Published in Issue	Year 2010 Volume: 9 Issue: 1

Cite

APA	Çıtak, G. G. (2010). Klasik Test ve Madde Tepki Kuramlarına Göre Çoktan Seçmeli Testlerde Farklı Puanlama Yöntemlerinin Karşılaştırılması. İlköğretim Online, 9(1), 170-187.
AMA	Çıtak GG. Klasik Test ve Madde Tepki Kuramlarına Göre Çoktan Seçmeli Testlerde Farklı Puanlama Yöntemlerinin Karşılaştırılması. İOO. March 2010;9(1):170-187.
Chicago	Çıtak, Göksu Gözen. “Klasik Test Ve Madde Tepki Kuramlarına Göre Çoktan Seçmeli Testlerde Farklı Puanlama Yöntemlerinin Karşılaştırılması”. İlköğretim Online 9, no. 1 (March 2010): 170-87.
EndNote	Çıtak GG (March 1, 2010) Klasik Test ve Madde Tepki Kuramlarına Göre Çoktan Seçmeli Testlerde Farklı Puanlama Yöntemlerinin Karşılaştırılması. İlköğretim Online 9 1 170–187.
IEEE	G. G. Çıtak, “Klasik Test ve Madde Tepki Kuramlarına Göre Çoktan Seçmeli Testlerde Farklı Puanlama Yöntemlerinin Karşılaştırılması”, İOO, vol. 9, no. 1, pp. 170–187, 2010.
ISNAD	Çıtak, Göksu Gözen. “Klasik Test Ve Madde Tepki Kuramlarına Göre Çoktan Seçmeli Testlerde Farklı Puanlama Yöntemlerinin Karşılaştırılması”. İlköğretim Online 9/1 (March 2010), 170-187.
JAMA	Çıtak GG. Klasik Test ve Madde Tepki Kuramlarına Göre Çoktan Seçmeli Testlerde Farklı Puanlama Yöntemlerinin Karşılaştırılması. İOO. 2010;9:170–187.
MLA	Çıtak, Göksu Gözen. “Klasik Test Ve Madde Tepki Kuramlarına Göre Çoktan Seçmeli Testlerde Farklı Puanlama Yöntemlerinin Karşılaştırılması”. İlköğretim Online, vol. 9, no. 1, 2010, pp. 170-87.
Vancouver	Çıtak GG. Klasik Test ve Madde Tepki Kuramlarına Göre Çoktan Seçmeli Testlerde Farklı Puanlama Yöntemlerinin Karşılaştırılması. İOO. 2010;9(1):170-87.
