Research Article

A dialectic on validity: Explanation-focused and the many ways of being human

Year 2023, Volume: 10, Issue: Special Issue, 1–96, 27.12.2023
https://doi.org/10.21449/ijate.1406304

Abstract

In line with the journal volume’s theme, this essay considers lessons from the past and visions for the future of test validity. The first part of the essay describes historical trends in test validity since the early 1900s, which leads to the natural question of whether the discipline has progressed in its definition and description of test validity. There is no single agreed-upon definition of test validity; however, there is a marked coalescing of explanation-centered views at the meta-level. The second part of the essay focuses on the author's development of an explanation-focused view of validity theory with aligned validation methods. It traces the confluence of ideas that motivated and influenced a coherent view of test validity as the explanation for test score variation, and of validation as the process of developing and testing that explanation, guided by abductive methods and inference to the best explanation. This description also includes a new re-interpretation of true scores in classical test theory afforded by the author’s measure-theoretic mental test theory: for a particular test-taker, the variation in observed scores includes measurement error as well as variation attributable to the different ecological testing settings, which aligns with the explanation-focused view wherein item and test performance are the objects of explanatory analyses. The final main section of the essay describes several methodological innovations in explanation-focused validity that respond to the tensions and changes in assessment over the last 25 years.

References

  • Addey, C., Maddox, B., & Zumbo, B.D. (2020). Assembled validity: Rethinking Kane’s argument-based approach in the context of International Large-Scale Assessments (ILSAs). Assessment in Education: Principles, Policy & Practice, 27(6), 588–606. https://doi.org/10.1080/0969594X.2020.1843136
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1974). Standards for educational and psychological tests. American Psychological Association.
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education [AERA, APA, & NCME]. (1999). Standards for educational and psychological testing. American Educational Research Association.
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association. https://www.testingstandards.net/open-access-files.html
  • American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin, 51(2, Pt. 2), 1–38. https://doi.org/10.1037/h0053479
  • Anastasi, A. (1950). The concept of validity in the interpretation of test scores. Educational and Psychological Measurement, 10, 67–78. https://doi.org/10.1177/001316445001000105
  • Anastasi, A. (1954). Psychological testing (1st ed.). Macmillan.
  • Angoff, W.H. (1988). Validity: An evolving concept. In: H. Wainer & H.I. Braun (Eds.), Test validity (pp. 19-32). Lawrence Erlbaum Associates.
  • Bazire, M., & Brézillon, P. (2005). Understanding context before using it. In A. Dey, B. Kokinov, D. Leake, & R. Turner (Eds.), Modeling and using context. CONTEXT 2005. Lecture Notes in Computer Science, vol. 3554. Springer. https://doi.org/10.1007/11508373_3
  • Bingham, W.V. (1937). Aptitudes and aptitude testing. Harper.
  • Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061
  • Borsboom, D., Cramer, A.O.J., Kievit, R.A., Scholten, A.Z., & Franić, S. (2009). The end of construct validity. In R.W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 135–170). IAP Information Age Publishing.
  • Bronfenbrenner, U. (1979). The ecology of human development. Harvard University Press.
  • Bronfenbrenner, U. (1994). Ecological models of human development. In T. Husén & T.N. Postlethwaite (Eds.), International encyclopedia of education (2nd ed., Vol. 3, pp. 1643–1647). Elsevier Science.
  • Buckingham, B.R. (1921). Intelligence and its measurement: A symposium. Journal of Educational Psychology, 12, 271–275.
  • Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105. https://doi.org/10.1037/h0046016
  • Carnap, R. (1935). Philosophy and logical syntax. American Mathematical Society.
  • Chen, M.Y., & Zumbo, B.D. (2017). Ecological framework of item responding as validity evidence: An application of multilevel DIF modeling using PISA data. In: Zumbo, B., Hubley, A. (eds) Understanding and investigating response processes in validation research. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_4
  • ChoGlueck, C. (2018). The error is in the gap: Synthesizing accounts for societal values in science. Philosophy of Science, 85(4), 704-725. https://doi.org/10.1086/699191
  • Clark, A. (1998). Being there: Putting brain, body, and world together again. MIT Press.
  • Clark, A. (2011). Supersizing the mind: Embodiment, action, and cognitive extension. Oxford University Press.
  • Courtis, S.A. (1921). Report of the standardization committee. Journal of Educational Research, 4(1), 78–90.
  • Cronbach, L.J. (1971). Test validation. In: R.L. Thorndike (ed.) Educational measurement, 2nd ed. (pp. 443-507). American Council on Education.
  • Cronbach, L.J. (1988). Five perspectives on the validity argument. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 3–17). Lawrence Erlbaum Associates, Inc.
  • Cronbach, L.J. (1989). Construct validation after thirty years. In R.L. Linn (ed.) Intelligence: Measurement, theory, and public policy: Proceedings of a symposium in honor of Lloyd G. Humphreys (pp. 147-171). University of Illinois Press.
  • Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
  • Danziger, K. (1990). Constructing the subject: Historical origins of psychological research. Cambridge University Press. https://doi.org/10.1017/CBO9780511524059
  • de Ayala, R.J. (2009). [Review of Handbook of Statistics, Volume 26: Psychometrics, by C.R. Rao & S. Sinharay]. Journal of the American Statistical Association, 104(487), 1281–1283. http://www.jstor.org/stable/40592308
  • Dewey, J. (1938). Logic: the theory of inquiry. Holt.
  • Douglas, H. (2000). Inductive risk and values in science. Philosophy of Science, 67(4), 559–579. https://doi.org/10.1086/392855
  • Douglas, H. (2003). The moral responsibilities of scientists (tensions between autonomy and responsibility). American Philosophical Quarterly, 40(1), 59–68. http://www.jstor.org/stable/20010097
  • Douglas, H. (2004). The irreducible complexity of objectivity. Synthese, 138, 453–473. https://doi.org/10.1023/B:SYNT.0000016451.18182.91
  • Douglas, H. (2009). Science, policy, and the value-free ideal. University of Pittsburgh Press.
  • Douglas, H. (2016). Values in science. In P. Humphreys (Ed.), The Oxford handbook of philosophy of science (pp. 609–630). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199368815.013.28
  • Eid, M. (1996). Longitudinal confirmatory factor analysis for polytomous item responses: Model definition and model selection on the basis of stochastic measurement theory. Methods of Psychological Research Online, 1(4), 65-85.
  • Eid, M. (2000). A multitrait-multimethod model with minimal assumptions. Psychometrika, 65, 241-261. https://doi.org/10.1007/BF02294377
  • Elliott, K. (2011). Is a little pollution good for you?: incorporating societal values in environmental research. Oxford University Press.
  • Embretson, S.E. (Whitely). (1983). Construct validity: Construct representation versus nomothetic span. Psychological Bulletin, 93(1), 179–197. https://doi.org/10.1037/0033-2909.93.1.179
  • Embretson, S. (1984). A general latent trait model for response processes. Psychometrika, 49(2), 175–186. https://doi.org/10.1007/BF02294171
  • Embretson, S. (1993). Psychometric models for learning and cognitive processes. In N. Frederiksen, R.J., Mislevy, & I.I. Bejar (Eds.), Test theory for a new generation of tests (pp. 125– 150). Erlbaum.
  • Embretson, S.E. (1998). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3(3), 380–396. https://doi.org/10.1037/1082-989X.3.3.380
  • Embretson, S.E. (2007). Construct validity: A universal validity system or just another test evaluation procedure? Educational Researcher, 36(8), 449–455. https://doi.org/10.3102/0013189X07311600
  • Embretson, S.E. (2016). Understanding examinees’ responses to items: Implications for measurement. Educational Measurement: Issues and Practice, 35, 6–22. https://doi.org/10.1111/emip.12117
  • Embretson, S., Schneider, L.M., & Roth, D.L. (1986). Multiple processing strategies and the construct validity of verbal reasoning tests. Journal of Educational Measurement, 23, 13–32. https://doi.org/10.1111/j.1745-3984.1986.tb00231.x
  • Fine, A.I. (1984). The natural ontological attitude. In J. Leplin (Ed.), Scientific realism (pp. 261–277). University of California Press.
  • Fox, J., Pychyl, T., & Zumbo, B.D. (1997). An investigation of background knowledge in the assessment of language proficiency. In A. Huhta, V. Kohonen, L. Kurki-Suonio, & S. Luoma (Eds.), Current developments and alternatives in language assessment: Proceedings of LTRC 1996 (pp. 367–383). University of Jyvaskyla Press.
  • Friedman, M. (1974). Explanation and scientific understanding. The Journal of Philosophy, 71(1), 5–19. https://doi.org/10.2307/2024924
  • Galupo, M.P., Mitchell, R.C., & Davis, K.S. (2018). Face validity ratings of sexual orientation scales by sexual minority adults: Effects of sexual orientation and gender identity. Archives of Sexual Behavior, 47(4), 1241–1250. https://doi.org/10.1007/s10508-017-1037-y
  • Geiser, C., & Lockhart, G. (2012). A comparison of four approaches to account for method effects in latent state-trait analyses. Psychological Methods, 17(2), 255–283. https://doi.org/10.1037/a0026977
  • Giere, R.N. (1999). Science without Laws. University of Chicago Press.
  • Giere, R.N. (2006). Scientific perspectivism. University of Chicago Press. https://doi.org/10.7208/chicago/9780226292144.001.0001
  • Giere, R.N. (2010). Explaining science: A cognitive approach. University of Chicago Press.
  • Gigerenzer, G., Swijtink, Z.G., Porter, T.M., Daston, L., Beatty, J., & Krüger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge University Press.
  • Goffman, E. (1959). The presentation of self in everyday life. Doubleday.
  • Goffman, E. (1964). The Neglected Situation. American Anthropologist, 66(6), 133–136. http://www.jstor.org/stable/668167
  • Goldstein, H. (1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology, 33(2), 234–246. https://doi.org/10.1111/j.2044-8317.1980.tb00610.x
  • Goldstein, H. (1994). Recontextualizing mental measurement. Educational Measurement: Issues and Practice, 12(1), 16-19, 43.
  • Goldstein H. (1995). Multilevel statistical models (2nd edition). Edward Arnold/Halstead Press.
  • Goldstein, H., & Wood, R. (1989). Five decades of item response modelling. British Journal of Mathematical and Statistical Psychology, 42(2), 139–167. https://doi.org/10.1111/j.2044-8317.1989.tb00905.x
  • Green, B. F. (1990). A comprehensive assessment of measurement. Contemporary Psychology, 35, 850-851.
  • Green, C.D. (2015). Why psychology isn’t unified, and probably never will be. Review of General Psychology, 19(3), 207-214. https://doi.org/10.1037/gpr0000051
  • Guilford, J.P. (1946). New standards for test evaluation. Educational and Psychological Measurement, 6(4), 427-438. https://doi.org/10.1177/001316444600600401
  • Guion, R.M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11(3), 385–398. https://doi.org/10.1037/0735-7028.11.3.385
  • Gulliksen, H. (1950a). Intrinsic validity. American Psychologist, 5(10), 511–517. https://doi.org/10.1037/h0054604
  • Gulliksen, H. (1950b). Theory of mental tests. John Wiley & Sons Inc. https://doi.org/10.1037/13240-000
  • Gulliksen, H. (1961). Measurement of learning and mental abilities. Psychometrika 26, 93–107. https://doi.org/10.1007/BF02289688
  • Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282. https://doi.org/10.1007/BF02288892
  • Haig, B.D. (1999). Construct validation and clinical assessment. Behaviour Change, 16, 64–73.
  • Haig, B.D. (2005a). Exploratory factor analysis, theory generation, and scientific method. Multivariate Behavioral Research, 40(3), 303-329.
  • Haig, B.D. (2005b). An abductive theory of scientific method. Psychological Methods, 10(4), 371–388. https://doi.org/10.1037/1082-989X.10.4.371
  • Haig, B.D. (2009). Inference to the best explanation: A neglected approach to theory appraisal in psychology. The American Journal of Psychology, 122(2), 219–234.
  • Haig, B.D. (2014). Investigating the psychological world: Scientific method in the behavioral sciences. MIT Press.
  • Haig, B.D. (2018). Exploratory factor analysis, theory generation, and scientific method. In Method matters in psychology: Studies in applied philosophy, epistemology and rational ethics (Vol. 45, pp. 65–88). Springer, Cham.
  • Haig, B.D. (2019). The importance of scientific method for psychological science. Psychology, Crime & Law, 25(6), 527–541. https://doi.org/10.1080/1068316X.2018.1557181
  • Haig, B.D. (in press). Repositioning construct validity theory: From nomological networks to pragmatic theories, and their evaluation by explanatory means. Perspectives on Psychological Science.
  • Haig, B.D., & Evers, C.W. (2016). Realist inquiry in social science. Sage.
  • Hattie, J., & Leeson, H. (2013). Future directions in assessment and testing in education and psychology. In K.F. Geisinger, B.A. Bracken, J.F. Carlson, J.-I.C. Hansen, N.R. Kuncel, S.P. Reise, & M.C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 3: Testing and assessment in school psychology and education (pp. 591–622). American Psychological Association. https://doi.org/10.1037/14049-028
  • Hempel, C.G. (1965). Aspects of scientific explanation and other essays in the philosophy of science. The Free Press.
  • Hicks, D.J. (2014). A new direction for science and values. Synthese, 191(14), 3271–3295. http://www.jstor.org/stable/24026188
  • Higgins, N.C., Zumbo, B.D., & Hay, J.L. (1999). Construct validity of attributional style: Modeling context-dependent item sets in the attributional style questionnaire. Educational and Psychological Measurement, 59(5), 804–820. https://doi.org/10.1177/00131649921970152
  • Holman, B., & Wilholt, T. (2022). The new demarcation problem. Studies in history and philosophy of science, 91, 211-220. https://doi.org/10.1016/j.shpsa.2021.11.011
  • Hubley, A.M., & Zumbo, B.D. (1996). A dialectic on validity: Where we have been and where we are going. The Journal of General Psychology, 123(3), 207–215. https://doi.org/10.1080/00221309.1996.9921273
  • Hubley, A.M., & Zumbo, B.D. (2011). Validity and the consequences of test interpretation and use. Social Indicators Research, 103(2), 219–230. https://doi.org/10.1007/s11205-011-9843-4
  • Hubley, A.M., & Zumbo, B.D. (2013). Psychometric characteristics of assessment procedures: An overview. In K.F. Geisinger (Ed.), APA handbook of testing and assessment in psychology, Vol. 1 (pp. 3–19). American Psychological Association Press. https://doi.org/10.1037/14047-001
  • Hubley, A.M., & Zumbo, B.D. (2017). Response processes in the context of validity: Setting the stage. In B.D. Zumbo & A.M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 1–12). Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5_1
  • Hull, C.L. (1935). The conflicting psychologies of learning: A way out. Psychological Review, 42(6), 491–516. https://doi.org/10.1037/h0058665
  • Jonson, J.L., & Plake, B.S. (1998). A historical comparison of validity standards and validity practices. Educational and Psychological Measurement, 58(5), 736–753. https://doi.org/10.1177/0013164498058005002
  • Kaldis, B. (2013). Kinds: Natural kinds versus human kinds. In Encyclopedia of philosophy and the social sciences (Vol. 2, pp. 515–518). SAGE Publications, Inc. https://doi.org/10.4135/9781452276052
  • Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527–535. https://doi.org/10.1037/0033-2909.112.3.527
  • Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319-342. https://doi.org/10.1111/j.1745-3984.2001.tb01130.x
  • Kane, M. (2004). Certification testing as an illustration of argument-based validation. Measurement: Interdisciplinary Research and Perspective, 2(3), 135–170. https://doi.org/10.1207/s15366359mea0203_1
  • Kane, M. (2006). Validation. In R. Brennan (Ed.) Educational measurement (4th ed., pp. 17–64). American Council on Education and Praeger.
  • Kane, M. (2012). Validating score interpretations and uses. Language Testing, 29(1), 3-17. https://doi.org/10.1177/0265532211417210
  • Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1-73. https://doi.org/10.1111/jedm.12000
  • Kane, M. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198–211. https://doi.org/10.1080/0969594X.2015.1060192
  • Kincaid, H. (2000). Global arguments and local realism about the social sciences. Philosophy of Science, 67(S3), S667-S678. https://doi.org/10.1086/392854
  • Koch, T., Eid, M., & Lochner, K. (2018). Multitrait-multimethod-analysis: The psychometric foundation of CFA-MTMM models. In P. Irwing, T. Booth, & D.J. Hughes (Eds.), The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development (pp. 781–846). Wiley Blackwell. https://doi.org/10.1002/9781118489772.ch25
  • Koch, T., Schultze, M., Eid, M., & Geiser, C. (2014). A longitudinal multilevel CFA-MTMM model for interchangeable and structurally different methods. Frontiers in Psychology, 5, Article 311. https://doi.org/10.3389/fpsyg.2014.00311
  • Kroc, E., & Zumbo, B.D. (2018). Calibration of measurements. Journal of Modern Applied Statistical Methods, 17(2), eP2780. https://digitalcommons.wayne.edu/jmasm/vol17/iss2/17/
  • Kroc, E., & Zumbo, B.D. (2020). A transdisciplinary view of measurement error models and the variations of X = T + E. Journal of Mathematical Psychology, 98, 102372. https://doi.org/10.1016/j.jmp.2020.102372
  • Kuhn, T.S. (1962). The structure of scientific revolutions. University of Chicago Press.
  • Kuhn, T.S. (1970). The structure of scientific revolutions (2nd ed.). University of Chicago Press.
  • Kuhn, T.S. (1977). The essential tension: Selected studies in scientific tradition and change. University of Chicago Press.
  • Kuhn, T.S. (1996). The structure of scientific revolutions (3rd ed.). University of Chicago Press.
  • Lakatos, I. (1976). Falsification and the methodology of scientific research programmes. In Can theories be refuted? Essays on the Duhem-Quine thesis (pp. 205–259). Springer.
  • Lane, S., Zumbo, B.D., Abedi, J., Benson, J., Dossey, J., Elliott, S.N., Kane, M., Linn, R., Paredes-Ziker, C., Rodriguez, M., Schraw, G., Slattery, J., Thomas, V., & Willhoft, J. (2009). Prologue: An introduction to the evaluation of NAEP. Applied Measurement in Education, 22(4), 309-316. https://doi.org/10.1080/08957340903221436
  • Lennon, R.T. (1956). Assumptions underlying the use of content validity. Educational and Psychological Measurement, 16(3), 294–304. https://doi.org/10.1177/001316445601600303
  • Lewis, C. (1986). Test theory and psychometrika: The past twenty-five years. Psychometrika, 51(1), 11–22. https://doi.org/10.1007/BF02293995
  • Li, Z., & Zumbo, B.D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30(2), 343–370. https://www.uv.es/psicologica/articulos2.09/11LI.pdf
  • Lipton, P. (2004). Inference to the best explanation (2nd ed.). Routledge. https://doi.org/10.4324/9780203470855
  • Lissitz, R.W., & Samuelsen, K. (2007). A suggested change in terminology and emphasis regarding validity and education. Educational Researcher, 36(8), 437–448. https://doi.org/10.3102/0013189X07311286
  • Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635-694 (Monograph Supp. 9).
  • Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Addison-Wesley.
  • MacCorquodale, K., & Meehl, P.E. (1948). On a distinction between hypothetical constructs and intervening variables. Psychological Review, 55(2), 95–107. https://doi.org/10.1037/h0056029
  • Maddox, B. (2015). The neglected situation: Assessment performance and interaction in context. Assessment in Education: Principles, Policy & Practice, 22(4), 427-443. https://doi.org/10.1080/0969594X.2015.1026246
  • Maddox, B., & Zumbo, B.D. (2017). Observing testing situations: Validation as jazz. In B.D. Zumbo & A.M. Hubley (Eds.), Understanding and investigating response processes in validation research. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_10
  • Maddox, B., Zumbo, B.D., Tay-Lim, B.S.-H., & Qu, D. (2015). An anthropologist among the psychometricians: Assessment events, ethnography and DIF in the Mongolian Gobi. International Journal of Testing, 15(4), 291–309. https://doi.org/10.1080/15305058.2015.1017103
  • Markus, K.A. (1998). Science, measurement, and validity: Is completion of Samuel Messick's synthesis possible? Social Indicators Research, 45, 7–34. https://doi.org/10.1023/A:1006960823277
  • Mehrens, W.A. (1997). The consequences of consequential validity. Educational Measurement: Issues and Practice, 16(2), 16-18.
  • Messick, S. (1972). Beyond structure: In search of functional models of psychological process. Psychometrika, 37(4, Pt. 1), 357–375. https://doi.org/10.1007/BF02291215
  • Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation. American Psychologist, 30, 955–966.
  • Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012-1027.
  • Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In: H. Wainer & H.I. Braun (Eds.), Test validity (pp. 33-45). Lawrence Erlbaum Associates.
  • Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). Macmillan.
  • Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749. https://doi.org/10.1037/0003-066X.50.9.741
  • Messick, S. (1998). Test validity: A matter of consequence [Special issue]. Social Indicators Research, 45, 35-44. https://doi.org/10.1023/A:1006964925094
  • Messick, S. (2000). Consequences of test interpretation and use: The fusion of validity and values in psychological assessment. In: Goffin, R.D., Helmes, E. (eds) Problems and solutions in human assessment. Springer. https://doi.org/10.1007/978-1-4615-4397-8_1
  • Millman, J. (1979). Reliability and validity of criterion-referenced test scores. In: R. Traub (Ed.), New directions for testing and measurement: Methodological developments. Jossey-Bass.
  • Mosier, C.I. (1947). A critical examination of the concepts of face validity. Educational and Psychological Measurement, 7(2), 191–205. https://doi.org/10.1177/001316444700700201
  • Nickles, T. (2017). Cognitive illusions and nonrealism: Objections and replies. In: Agazzi, E. (eds) Varieties of Scientific Realism: Objectivity and truth in science (pp. 151–163). Springer, Cham. https://doi.org/10.1007/978-3-319-51608-0_8
  • Novick, M.R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3(1), 1–18. https://doi.org/10.1016/0022-2496(66)90002-2
  • O'Leary, T.M., Hattie, J.A.C., & Griffin, P. (2017). Actual interpretations and use of scores as aspects of validity. Educational Measurement: Issues and Practice, 36, 16-23. https://doi.org/10.1111/emip.12141
  • Padilla, J.L., & Benítez, I. (2014). Validity evidence based on response processes. Psicothema, 26, 136–144. https://doi.org/10.7334/psicothema2013.259
  • Padilla, J.L., & Benítez, I. (2017). A rationale for and demonstration of the use of DIF and mixed methods. In: Zumbo, B.D., Hubley, A.M. (eds) Understanding and investigating response processes in validation research (pp. 193–210). Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_1
  • Pellicano, E., & den Houting, J. (2022). Annual research review: Shifting from “normal science” to neurodiversity in autism science. Journal of Child Psychology and Psychiatry, 63, 381–396. https://doi.org/10.1111/jcpp.13534
  • Persson, J., & Ylikoski, P. (Eds.). (2007). Rethinking explanation (Boston Studies in the Philosophy of Science, Vol. 252). Springer.
  • Pitt, J.C. (Ed.) (1988). Theories of explanation. Oxford University Press.
  • Popham, W.J. (1997). Consequential validity: Right concern – wrong concept. Educational Measurement: Issues and Practice, 16(2), 9-13.
  • Psillos, S. (2022). Realism and theory change in science. In: Zalta, E.N., Nodelman, U. (eds.) The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/fall2022/entries/realism-theory-change/
  • Rao, C.R., & Sinharay, S. (Eds.). (2007). Handbook of statistics, Volume 26: Psychometrics. Elsevier.
  • Raykov, T. (1992). On structural models for analyzing change. Scandinavian Journal of Psychology, 33, 247-265. https://doi.org/10.1111/j.1467-9450.1992.tb00914.x
  • Raykov, T. (1998a). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22(4), 375-385. https://doi.org/10.1177/014662169802200407
  • Raykov, T. (1998b). A method for obtaining standard errors and confidence intervals of composite reliability for congeneric items. Applied Psychological Measurement, 22(4), 369-374. https://doi.org/10.1177/014662169802200406
  • Raykov, T. (1999). Are simple change scores obsolete? An approach to studying correlates and predictors of change. Applied Psychological Measurement, 23(2), 120-126. https://doi.org/10.1177/01466219922031248
  • Raykov, T. (2001). Estimation of congeneric scale reliability using covariance structure analysis with nonlinear constraints. British Journal of Mathematical and Statistical Psychology, 54, 315-323. https://doi.org/10.1348/000711001159582
  • Raykov, T., & Marcoulides, G.A. (2011). Introduction to psychometric theory. Routledge.
  • Raykov, T., & Marcoulides, G.A. (2016). On the relationship between classical test theory and item response theory: From one to the other and back. Educational and Psychological Measurement, 76(2), 325–338. https://doi.org/10.1177/0013164415576958
  • Reichenbach, H. (1977). Philosophie der Raum-Zeit-Lehre. In A. Kamlah & M. Reichenbach (Eds.), Hans Reichenbach, vol. 2. Vieweg+Teubner Verlag, Wiesbaden.
  • Roberts, B.W. (2007). Contextualizing personality psychology. Journal of Personality, 75(6), 1071–1082. https://doi.org/10.1111/j.1467-6494.2007.00467.x
  • Rome, L., & Zhang, B. (2018). Investigating the effects of differential item functioning on proficiency classification. Applied Psychological Measurement, 42(4), 259–274. https://doi.org/10.1177/0146621617726789
  • Rozeboom, W.W. (1966). Foundations of the theory of prediction. Dorsey.
  • Rulon, P.J. (1946). On the validity of educational tests. Harvard Educational Review, 16, 290-296.
  • Salmon, W. (1990). Four decades of scientific explanation. University of Minnesota Press.
  • Schaffner, K.F. (1993). Discovery and explanation in biology and medicine. University of Chicago Press.
  • Schaffner, K.F. (2020). A comparison of two neurobiological models of fear and anxiety: A “construct validity” application? Perspectives on Psychological Science, 15(5), 1214-1227. https://doi.org/10.1177/1745691620920860
  • Searle, J.R. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.
  • Searle, J.R. (1979). Expression and meaning: Studies in the theory of speech acts. Cambridge University Press. https://doi.org/10.1017/CBO9780511609213
  • Sells, S.B. (ed.) (1963). Stimulus determinants of behavior. Ronald Press.
  • Shear, B.R., Zumbo, B.D. (2014). What counts as evidence: A review of validity studies in educational and psychological measurement. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 91-111). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_6
  • Shepard, L.A. (1993). Evaluating test validity. Review of Research in Education, 19(1), 405-450. https://doi.org/10.3102/0091732X019001405
  • Shepard, L.A. (1997). The centrality of test use and consequences for test validity. Educational Measurement: Issues and Practice, 16, 5-8, 13, 24.
  • Sinnott-Armstrong, W., & Fogelin, R.J. (2010). Understanding arguments: An introduction to informal logic. Wadsworth Cengage Learning.
  • Sireci, S.G. (1998). The construct of content validity [Special issue]. Social Indicators Research 45, 83–117. https://doi.org/10.1023/A:1006985528729
  • Sireci, S.G. (2009). Packing and unpacking sources of validity evidence: History repeats itself again. In R.W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 19–37). IAP Information Age Publishing.
  • Sireci, S.G. (2013). Agreeing on validity arguments. Journal of Educational Measurement, 50, 99-104. https://doi.org/10.1111/jedm.12005
  • Sireci, S.G. (2020). De-“constructing” test validation. Chinese/English Journal of Educational Measurement and Evaluation, 1(1), Article 3. https://www.ce-jeme.org/journal/vol1/iss1/3
  • Slaney, K.L., & Racine, T.P. (2013). What’s in a name? Psychology’s ever evasive construct. New Ideas in Psychology, 31(1), 4–12. https://doi.org/10.1016/j.newideapsych.2011.02.003
  • Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101. https://doi.org/10.2307/1412159
  • Steyer, R. (1988). Conditional expectations: An introduction to the concept and its applications in empirical sciences. Methodika, 2, 53-78.
  • Steyer, R. (1989). Models of classical psychometric test theory as stochastic measurement models: representation, uniqueness, meaningfulness, identifiability, and testability. Methodika, 3, 25-60.
  • Steyer, R., Ferring, D., & Schmitt, M.J. (1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8(2), 79–98.
  • Steyer, R., Majcen, A.-M., Schwenkmezger, P., & Buchner, A. (1989). A latent state-trait anxiety model and its application to determine consistency and specificity coefficients. Anxiety Research, 1(4), 281–299. https://doi.org/10.1080/08917778908248726
  • Steyer, R., & Schmitt, M. (1990). Latent state-trait models in attitude research. Quality & Quantity, 24, 427–445. https://doi.org/10.1007/BF00152014
  • Steyer, R., Schmitt, M., & Eid, M. (1999). Latent state–trait theory and research in personality and individual differences. European Journal of Personality, 13(5), 389-408. https://doi.org/10.1002/(SICI)1099-0984(199909/10)13:5<389::AID-PER361>3.0.CO;2-A
  • Stone, J., & Zumbo, B.D. (2016). Validity as a pragmatist project: A global concern with local application. In: Aryadoust V., & Fox J. (eds.) Trends in language assessment research and practice (pp. 555–573). Cambridge Scholars Publishing.
  • Suppes, P. (1969). Models of data. In: Studies in the methodology and foundations of science. Synthese Library, vol 22. Springer. https://doi.org/10.1007/978-94-017-3173-7_2
  • Thagard, P. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12(3), 435-467. https://doi.org/10.1017/S0140525X00057046
  • Thagard, P. (1992). Conceptual revolutions. Princeton University Press. http://www.jstor.org/stable/j.ctv36zq4g
  • Tolman, C.W. (1991). Review of constructing the subject: Historical origins of psychological research [Review of the book Constructing the subject: Historical origins of psychological research, by K. Danziger]. Canadian Psychology, 32(4), 650–652. https://doi.org/10.1037/h0084651
  • Toulmin, S. (1958). The uses of argument. Cambridge University Press.
  • van Fraassen, B.C. (1980). The scientific image. Oxford University Press. https://doi.org/10.1093/0198244274.001.0001
  • van Fraassen, B.C. (1985). Empiricism in the philosophy of science. In: Churchland P.M., & Hooker C.A. (eds.) Images of science: Essays on realism and empiricism (pp. 245-308). University of Chicago Press.
  • van Fraassen, B.C. (2008). Scientific representation: Paradoxes of perspective. Oxford University Press.
  • van Fraassen, B.C. (2012). Modeling and measurement: The criterion of empirical grounding. Philosophy of Science, 79(5), 773–784. https://doi.org/10.1086/667847
  • Varela, F.J., Thompson, E., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. The MIT Press. https://doi.org/10.7551/mitpress/6730.001.0001
  • Wallin, A. (2007). Explanation and environment. In J. Persson & P. Ylikoski (Eds.), Rethinking explanation (Boston Studies in the Philosophy of Science, Vol. 252, pp. 163–175). Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-5581-2_12
  • Wapner, S., & Demick, J. (2002). The increasing contexts of context in the study of environment behavior relations. In R.B. Bechtel & A. Churchman (eds.) Handbook of environmental psychology (pp. 3–14). John Wiley & Sons, Inc.
  • Watson, J.B. (1913). Psychology as the behaviorist views it. Psychological Review, 20(2), 158–177. https://doi.org/10.1037/h0074428
  • Whitely (Embretson), S.E. (1977). Information-processing on intelligence test items: Some response components. Applied Psychological Measurement, 1, 465–476. https://doi.org/10.1177/014662167700100402
  • Wiley, D.E. (1991). Test validity and invalidity reconsidered. In: R.E. Snow & D.E. Wiley (Eds.), Improving inquiry in social science: a volume in honor of Lee J. Cronbach (pp. 75-107). Erlbaum.
  • Woitschach, P., Zumbo, B.D., & Fernández-Alonso, R. (2019). An ecological view of measurement: Focus on multilevel model explanation of differential item functioning. Psicothema, 31(2), 194–203. https://doi.org/10.7334/psicothema2018.303
  • Woodward, J. (1989). Data and phenomena. Synthese, 79, 393–472. https://doi.org/10.1007/BF00869282
  • Wu, A.D., & Zumbo, B.D. (2008). Understanding and using mediators and moderators. Social Indicators Research, 87, 367–392. https://doi.org/10.1007/s11205-007-9143-1
  • Wu, A.D., Zumbo, B.D., & Marshall, S.K. (2014). A method to aid in the interpretation of EFA results: An application of Pratt’s measures. International Journal of Behavioral Development, 38(1), 98-110. https://doi.org/10.1177/0165025413506143
  • Yang, Y., Read, S.J., & Miller, L.C. (2009). The concept of situations. Social and Personality Psychology Compass, 3(6), 1018–1037. https://doi.org/10.1111/j.1751-9004.2009.00236.x
  • Zimmerman, D.W. (1975). Probability spaces, Hilbert spaces, and the axioms of test theory. Psychometrika, 40(3), 395-412. https://doi.org/10.1007/BF02291765
  • Zimmerman, D.W., & Zumbo, B.D. (2001). The geometry of probability, statistics, and test theory. International Journal of Testing, 1(3–4), 283–303. https://doi.org/10.1080/15305058.2001.9669476
  • Zumbo, B.D. (Ed.). (1998). Validity theory and the methods used in validation: Perspectives from the social and behavioral sciences [Special volume]. Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, 45(1–3). Springer International Publishing.
  • Zumbo, B.D. (1999). The simple difference score as an inherently poor measure of change: Some reality, much mythology. Advances in social science methodology, 5(1), 269-304.
  • Zumbo, B.D. (2005, July). Reflections on validity at the intersection of psychometrics, scaling, philosophy of inquiry, and language testing [Samuel J. Messick Memorial Award Lecture]. LTRC, the 27th Language Testing Research Colloquium, Ottawa, Canada.
  • Zumbo, B.D. (2007a). Validity: Foundational issues and statistical methodology. In C.R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 45–79). Elsevier.
  • Zumbo, B.D. (2007b). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233. https://doi.org/10.1080/15434300701375832
  • Zumbo, B.D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R.W. Lissitz (ed.) The concept of validity: Revisions, new directions, and applications (pp. 65–82). IAP Information Age Publishing.
  • Zumbo, B.D. (2010, September). Measurement validity and validation: A meditation on where we have come from and the state of the art today [Invited address]. Presented at the International conference on outcomes measurement, US National Institutes of Health, Bethesda, MD.
  • Zumbo, B.D. (2015, November). Consequences, side effects and the ecology of testing: Keys to considering assessment “in vivo” [Plenary address]. Annual Meeting of the Association for Educational Assessment – Europe (AEAEurope), Glasgow, Scotland. https://youtu.be/0L6Lr2BzuSQ
  • Zumbo, B.D. (2016). Standard Setting Methodology [Invited address]. “Applied Physiology Physical Employment Standards - Current Issues and Challenges” at the Canadian Society for Exercise Physiology (CSEP) conference, Victoria, Canada.
  • Zumbo, B.D. (2017). Trending away from routine procedures, toward an ecologically informed in vivo view of validation practices. Measurement: Interdisciplinary Research and Perspectives, 15(3-4), 137–139. https://doi.org/10.1080/15366367.2017.1404367
  • Zumbo, B.D. (2018a, April). Methodologies used to ensure fairness and equity in the assessment of students’ educational outcomes [Invited presentation and panel session]. AERA Presidential Symposium “Methodology and equity: An international perspective” at the Annual Meeting of the American Educational Research Association (AERA), New York, NY.
  • Zumbo, B.D. (2018b, July). The reports of DIF’s death are greatly exaggerated; It is like a Phoenix rising from the ashes [Keynote Address]. The 11th Conference of the International Test Commission, Montreal, Canada.
  • Zumbo, B.D. (2019). Foreword: Tensions, Intersectionality, and What Is on the Horizon for International Large-Scale Assessments in Education. In B. Maddox (Ed.), International large-scale assessments in education: Insider research perspectives (pp. xii–xiv). Bloomsbury Publishing. https://doi.org/10.5040/9781350023635
  • Zumbo, B.D. (2021). A novel multimethod approach to investigate whether tests delivered at a test centre are concordant with those delivered remotely online [Research Monograph]. UBC Psychometric Research Series, University of British Columbia. http://dx.doi.org/10.14288/1.0400581
  • Zumbo, B.D. (2023a). Validity theories, frameworks and practices in using tests and measures: an over-the-shoulder look back at validity while also looking to the horizon [Invited Address]. Ciclo Formazione Metodologica (FORME), Dipartimento di Psicologia, Università Cattolica Del Sacro Cuore. https://brunozumbo.com/?page_id=31
  • Zumbo, B.D. (2023b). Test validation and Bayesian statistical frameworks to estimate the magnitude and corresponding uncertainty of washback effects of test preparation [Research Monograph]. UBC Psychometric Research Series, University of British Columbia. https://dx.doi.org/10.14288/1.0435197
  • Zumbo, B.D. (2023c, October). The Challenges and Promise of Embracing the Many Ways of Being Human: Toward an Ecologically Informed In Vivo View of Validation Practices [Invited Address]. Symposium on Inclusive Educational Assessment, Neurodiversity and Disability. Hughes Hall, University of Cambridge.
  • Zumbo, B.D., & Chan, E.K.H. (Eds.). (2014a). Validity and validation in social, behavioral, and health sciences. Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-07794-9
  • Zumbo, B.D., & Chan, E.K.H. (2014b). Reflections on validation practices in the social, behavioral, and health sciences. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 321-327). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_19
  • Zumbo, B.D., & Chan, E.K.H. (2014c). Setting the stage for validity and validation in social, behavioral, and health sciences: Trends in validation practices. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 3-8). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_1
  • Zumbo, B.D., & Forer, B. (2011). Testing and measurement from a multilevel view: Psychometrics and validation. In J.A. Bovaird, K.F. Geisinger, & C.W. Buckendahl (Eds.), High-stakes testing in education: Science and practice in K–12 settings (pp. 177–190). American Psychological Association. https://doi.org/10.1037/12330-011
  • Zumbo, B.D., & Gelin, M.N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5, 1–23. https://files.eric.ed.gov/fulltext/EJ846827.pdf
  • Zumbo, B. D., & Hubley, A. M. (2016). Bringing consequences and side effects of testing and assessment to the foreground. Assessment in Education: Principles, Policy & Practice, 23(2), 299–303. https://doi.org/10.1080/0969594X.2016.1141169
  • Zumbo, B.D., & Hubley, A.M. (Eds.). (2017). Understanding and investigating response processes in validation research. Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5
  • Zumbo, B.D., & Kroc, E. (2019). A measurement is a choice and Stevens’ scales of measurement do not help make it: A response to Chalmers. Educational and Psychological Measurement, 79(6), 1184–1197. https://doi.org/10.1177/0013164419844305
  • Zumbo, B.D., Liu, Y., Wu, A.D., Forer, B., Shear, B.R. (2017). National and international educational achievement testing: A case of multi-level validation framed by the ecological model of item responding. In B.D. Zumbo & A.M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 341-362). Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5_18
  • Zumbo, B.D., Liu, Y., Wu, A.D., Shear, B.R., Olvera Astivia, O.L., & Ark, T.K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136–151. https://doi.org/10.1080/15434303.2014.972559
  • Zumbo, B.D., Maddox, B., & Care, N.M. (2023). Process and product in computer-based assessments: Clearing the ground for a holistic validity framework. European Journal of Psychological Assessment, 39(4), 252–262. https://doi.org/10.1027/1015-5759/a000748
  • Zumbo, B.D., & Padilla, J.-L. (2020). The interplay between survey research and psychometrics, with a focus on validity theory. In P.C. Beatty, D. Collins, L. Kaye, J.-L. Padilla, G.B. Willis, & A. Wilmot (Eds.), Advances in questionnaire design, development, evaluation and testing (pp. 593–612). John Wiley & Sons, Inc. https://doi.org/10.1002/9781119263685.ch24
  • Zumbo, B.D., Pychyl, T.A., & Fox, J.A. (1993). Psychometric properties of the CAEL assessment, II: An examination of the dependability/reliability of placement decisions. Carleton Papers in Applied Language Studies, 10, 13-27.
  • Zumbo, B.D., & Rupp, A.A. (2004). Responsible modeling of measurement data for appropriate inferences: important advances in reliability and validity theory. In David Kaplan (ed.) The SAGE handbook of quantitative methodology for the social sciences (pp. 74-93). SAGE Publications, Inc. https://doi.org/10.4135/9781412986311
  • Zumbo, B.D., & Shear, B.R. (2011, October). The concept of validity and some novel validation methods [Lecture/Workshop, half-day]. The 42nd annual Northeastern Educational Research Association (NERA) meeting, Rocky Hill, CT.

A dialectic on validity: Explanation-focused and the many ways of being human

Year 2023, Volume: 10 Issue: Special Issue, 1 - 96, 27.12.2023
https://doi.org/10.21449/ijate.1406304

Abstract

In line with the journal volume’s theme, this essay considers lessons from the past and visions for the future of test validity. In the first part of the essay, a description of historical trends in test validity since the early 1900s leads to the natural question of whether the discipline has progressed in its definition and description of test validity. There is no single agreed-upon definition of test validity; however, there is a marked coalescing of explanation-centered views at the meta-level. The second part of the essay focuses on the author's development of an explanation-focused view of validity theory with aligned validation methods. The confluence of ideas that motivated and influenced the development of a coherent view of test validity as the explanation for the test score variation and validation is the process of developing and testing the explanation guided by abductive methods and inference to the best explanation. This description also includes a new re-interpretation of true scores in classical test theory afforded by the author’s measure-theoretic mental test theory development—for a particular test-taker, the variation in observed test-taker scores includes measurement error and variation attributable to the different ecological testing settings, which aligns with the explanation-focused view wherein item and test performance are the object of explanatory analyses. The final main section of the essay describes several methodological innovations in explanation-focused validity that are in response to the tensions and changes in assessment in the last 25 years.

References

  • Addey, C., Maddox, B., & Zumbo, B.D. (2020) Assembled validity: Rethinking Kane’s argument-based approach in the context of International Large-Scale Assessments (ILSAs), Assessment in Education: Principles, Policy & Practice, 27(6), 588-606. https://doi.org/10.1080/0969594X.2020.1843136
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1974). Standards for educational and psychological tests. American Psychological Association.
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education [AERA, APA, & NCME]. (1999). Standards for educational and psychological testing. American Educational Research Association.
  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association. https://www.testingstandards.net/open-access-files.html
  • American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin, 51(2, Pt.2), 1 38. https://doi.org/10.1037/h0053479
  • Anastasi, A. (1950). The concept of validity in the interpretation of test scores. Educational and Psychological Measurement, 10, 67–78. https://doi.org/10.1177/001316445001000105
  • Anastasi, A. (1954). Psychological testing (1st ed.). Macmillan.
  • Angoff, W.H. (1988). Validity: An evolving concept. In: H. Wainer & H.I. Braun (Eds.), Test validity (pp. 19-32). Lawrence Erlbaum Associates.
  • Bazire, M., & Brézillon, P. (2005). Understanding Context Before Using It. In: Dey, A., Kokinov, B., Leake, D., Turner, R. (eds) modeling and using context. CONTEXT 2005. Lecture notes in computer science, vol. 3554. Springer. https://doi.org/10.1007/11508373_3
  • Bingham, W.V. (1937). Aptitudes and aptitude testing. Harper.
  • Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061 1071. https://doi.org/10.1037/0033 295X.111.4.1061
  • Borsboom, D., Cramer, A.O.J., Kievit, R.A., Scholten, A.Z., & Franić, S. (2009). The end of construct validity. In R.W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 135–170). IAP Information Age Publishing.
  • Bronfenbrenner, U. (1979). The ecology of human development. Harvard University Press.
  • Bronfenbrenner, U. (1994). Ecological models of human development. In T. Huston & T.N. Postlethwaith (Eds.), International enclyclopedia of education, 2nd ed., Vol. 3 (pp. 1643-1647). Elsevier Science.
  • Buckingham, B.R. (1921). Intelligence and its measurement: A symposium. Journal of Educational Psychology, 12, 271–275.
  • Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait multimethod matrix. Psychological Bulletin, 56(2), 81 105. https://doi.org/10.1037/h0046016
  • Carnap R. (1935). Philosophy and logical syntax. American Mathematical Society.
  • Chen, M.Y., & Zumbo, B.D. (2017). Ecological framework of item responding as validity evidence: An application of multilevel DIF modeling using PISA data. In: Zumbo, B., Hubley, A. (eds) Understanding and investigating response processes in validation research. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_4
  • ChoGlueck, C. (2018). The error is in the gap: Synthesizing accounts for societal values in science. Philosophy of Science, 85(4), 704-725. https://doi.org/10.1086/699191
  • Clark, A. (1998). Being there: Putting brain, body, and world together again. MIT press.
  • Clark, A. (2011). Supersizing the mind: Embodiment, action, and cognitive extension. Oxford University Press.
  • Courtis, S.A. (1921). Report of the standardization committee. Journal of Educational Research, 4(1), 78–90.
  • Cronbach, L.J. (1971). Test validation. In: R.L. Thorndike (ed.) Educational measurement, 2nd ed. (pp. 443-507). American Council on Education.
  • Cronbach, L.J. (1988). Five perspectives on the validity argument. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 3–17). Lawrence Erlbaum Associates, Inc.
  • Cronbach, L.J. (1989). Construct validation after thirty years. In R.L. Linn (ed.) Intelligence: Measurement, theory, and public policy: Proceedings of a symposium in honor of Lloyd G. Humphreys (pp. 147-171). University of Illinois Press.
  • Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
  • Danziger, K. (1990). Constructing the subject: Historical origins of psychological research. Cambridge University Press. https://doi.org/10.1017/CBO9780511524059
  • de Ayala, R.J. (2009). [Review of Handbook of Statistics, Volume 26: Psychometrics, by C.R. Rao & S. Sinharay]. Journal of the American Statistical Association, 104(487), 1281–1283. http://www.jstor.org/stable/40592308
  • Dewey, J. (1938). Logic: the theory of inquiry. Holt.
  • Douglas H. (2000) Inductive risk and values in science. Philosophy of Science, 67, 559–79. https://doi.org/10.1086/392855
  • Douglas, H. (2003). The Moral Responsibilities of Scientists (Tensions between Autonomy and Responsibility). American Philosophical Quarterly, 40(1), 59 68. http://www.jstor.org/stable/20010097
  • Douglas, H. (2004). The Irreducible Complexity of Objectivity. Synthese 138, 453–473. https://doi.org/10.1023/B:SYNT.0000016451.18182.91
  • Douglas, H. (2009). Science, policy, and the value-free ideal. University of Pittsburgh Press.
  • Douglas, H. (2016), Values in science. In P. Humphries (ed.), The Oxford Handbook of Philosophy of Science (pp. 609 630). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199368815.013.28
  • Eid, M. (1996). Longitudinal confirmatory factor analysis for polytomous item responses: Model definition and model selection on the basis of stochastic measurement theory. Methods of Psychological Research Online, 1(4), 65-85.
  • Eid, M. (2000). A multitrait-multimethod model with minimal assumptions. Psychometrika, 65, 241-261. https://doi.org/10.1007/BF02294377
  • Elliott, K. (2011). Is a little pollution good for you?: incorporating societal values in environmental research. Oxford University Press.
  • Embretson S.E. (Whitely). (1983). Construct validity: Construct representation versus nomothetic span. Psychological Bulletin, 93(1), 179–197. https://doi.org/10.1037/0033-2909.93.1.179
  • Embretson, S. (1984). A general latent trait model for response processes. Psychometrika, 49(2), 175–186. https://doi.org/10.1007/BF02294171
  • Embretson, S. (1993). Psychometric models for learning and cognitive processes. In N. Frederiksen, R.J., Mislevy, & I.I. Bejar (Eds.), Test theory for a new generation of tests (pp. 125– 150). Erlbaum.
  • Embretson, S.E. (1998). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3(3), 380 396. https://doi.org/10.1037/1082-989X.3.3.380
  • Embretson, S.E. (2007). Construct validity: A universal validity system or just another test evaluation procedure? Educational Researcher, 36(8), 449 455. https://doi.org/10.3102/0013189X07311600
  • Embretson, S.E. (2016), Understanding Examinees’ Responses to Items: Implications for Measurement. Educational Measurement: Issues and Practice, 35, 6 22. https://doi.org/10.1111/emip.12117
  • Embretson, S., Schneider, L.M., & Roth, D.L. (1986). Multiple processing strategies and the construct validity of verbal reasoning tests. Journal of Educational Measurement, 23, 13–32. https://doi.org/10.1111/j.1745-3984.1986.tb00231.x
  • Fine, A.I. (1984). The natural ontological attitude (pp. 261-277). In J. Leplin (ed.), Scientific realism. University of California Press.
  • Fox, J., Pychyl, T., & Zumbo, B.D. (1997). An investigation of background knowledge in the assessment of language proficiency. In A. Huhta, V. Kohonen, L. Kurki-Suonio, & S. Luoma, (Eds.), Current developments and alternatives in language assessment: Proceedings of LTRC 1996 (pp. 367 – 383). University of Jyvaskyla Press.
  • Friedman, M. (1974). Explanation and scientific understanding. The Journal of Philosophy, 71(1), 5–19. https://doi.org/10.2307/2024924
  • Galupo, M.P., Mitchell, R.C., & Davis, K.S. (2018). Face validity ratings of sexual orientation scales by sexual minority adults: Effects of sexual orientation and gender identity. Archives of Sexual Behavior, 47(4), 1241–1250. https://doi.org/10.1007/s10508-017-1037-y
  • Geiser, C., & Lockhart, G. (2012). A comparison of four approaches to account for method effects in latent state trait analyses. Psychological Methods, 17(2), 255 283. https://doi.org/10.1037/a0026977
  • Giere, R.N. (1999). Science without Laws. University of Chicago Press.
  • Giere, R.N. (2006). Scientific perspectivism. University of Chicago Press. https://doi.org/10.7208/chicago/9780226292144.001.0001
  • Giere, R.N. (2010). Explaining science: A cognitive approach. University of Chicago Press.
  • Gigerenzer, G., Swijtink, Z.G., Porter, T.M., Daston, L., Beatty, J., & Krüger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge University Press.
  • Goffman, E. (1959). The presentation of self in everyday life. Doubleday.
  • Goffman, E. (1964). The Neglected Situation. American Anthropologist, 66(6), 133–136. http://www.jstor.org/stable/668167
  • Goldstein, H. (1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology, 33(2), 234–246. https://doi.org/10.1111/j.2044-8317.1980.tb00610.x
  • Goldstein, H. (1994). Recontextualizing mental measurement. Educational Measurement: Issues and Practice, 12(1), 16-19, 43.
  • Goldstein H. (1995). Multilevel statistical models (2nd edition). Edward Arnold/Halstead Press.
  • Goldstein, H., & Wood, R. (1989). Five decades of item response modelling. British Journal of Mathematical and Statistical Psychology, 42(2), 139–167. https://doi.org/10.1111/j.2044-8317.1989.tb00905.x
  • Green, B. F. (1990). A comprehensive assessment of measurement. Contemporary Psychology, 35, 850-851.
  • Green, C.D. (2015). Why psychology isn’t unified, and probably never will be. Review of General Psychology, 19(3), 207-214. https://doi.org/10.1037/gpr0000051
  • Guilford, J.P. (1946). New standards for test evaluation. Educational and Psychological Measurement, 6(4), 427-438. https://doi.org/10.1177/001316444600600401
  • Guion, R.M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11(3), 385–398. https://doi.org/10.1037/0735-7028.11.3.385
  • Gulliksen, H. (1950a). Intrinsic validity. American Psychologist, 5(10), 511–517. https://doi.org/10.1037/h0054604
  • Gulliksen, H. (1950b). Theory of mental tests. John Wiley & Sons Inc. https://doi.org/10.1037/13240-000
  • Gulliksen, H. (1961). Measurement of learning and mental abilities. Psychometrika 26, 93–107. https://doi.org/10.1007/BF02289688
  • Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282. https://doi.org/10.1007/BF02288892
  • Haig, B.D. (1999). Construct validation and clinical assessment. Behaviour Change, 16, 64–73.
  • Haig, B.D. (2005a). Exploratory factor analysis, theory generation, and scientific method. Multivariate Behavioral Research, 40(3), 303-329.
  • Haig, B.D. (2005b). An abductive theory of scientific method. Psychological Methods, 10(4), 371–388. https://doi.org/10.1037/1082-989X.10.4.371
  • Haig, B.D. (2009). Inference to the best explanation: A neglected approach to theory appraisal in psychology. The American Journal of Psychology, 122(2), 219-234.
  • Haig, B.D. (2014). Investigating the psychological world: Scientific method in the behavioral sciences. MIT Press.
  • Haig, B.D. (2018). Exploratory factor analysis, theory generation, and scientific method. In: Method matters in psychology. Studies in applied philosophy, epistemology and rational ethics, vol 45 (pp. 65-88). Springer, Cham.
  • Haig, B.D. (2019). The importance of scientific method for psychological science. Psychology, Crime & Law, 25(6), 527–541. https://doi.org/10.1080/1068316X.2018.1557181
  • Haig, B.D. (in press). Repositioning construct validity theory: From nomological networks to pragmatic theories, and their evaluation by explanatory means. Perspectives on Psychological Science.
  • Haig, B.D., & Evers, C.W. (2016). Realist inquiry in social science. Sage.
  • Hattie, J., & Leeson, H. (2013). Future directions in assessment and testing in education and psychology. In K.F. Geisinger, B.A. Bracken, J.F. Carlson, J.-I. C. Hansen, N.R. Kuncel, S.P. Reise, & M.C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, vol. 3. testing and assessment in school psychology and education (pp. 591–622). American Psychological Association. https://doi.org/10.1037/14049-028
  • Hempel, C.G. (1965). Aspects of scientific explanation and other essays in the philosophy of science. The Free Press.
  • Hicks, D.J. (2014). A new direction for science and values. Synthese, 191(14), 3271–3295. http://www.jstor.org/stable/24026188
  • Higgins, N.C., Zumbo, B.D., & Hay, J.L. (1999). Construct validity of attributional style: Modeling context-dependent item sets in the attributional style questionnaire. Educational and Psychological Measurement, 59(5), 804–820. https://doi.org/10.1177/00131649921970152
  • Holman, B., & Wilholt, T. (2022). The new demarcation problem. Studies in History and Philosophy of Science, 91, 211-220. https://doi.org/10.1016/j.shpsa.2021.11.011
  • Hubley, A.M., & Zumbo, B.D. (1996). A dialectic on validity: Where we have been and where we are going. The Journal of General Psychology, 123(3), 207–215. https://doi.org/10.1080/00221309.1996.9921273
  • Hubley, A.M., & Zumbo, B.D. (2011). Validity and the consequences of test interpretation and use. Social Indicators Research, 103(2), 219–230. https://doi.org/10.1007/s11205-011-9843-4
  • Hubley, A.M., & Zumbo, B.D. (2013). Psychometric characteristics of assessment procedures: An overview. In K.F. Geisinger (Ed.), APA handbook of testing and assessment in psychology, Vol. 1 (pp. 3–19). American Psychological Association. https://doi.org/10.1037/14047-001
  • Hubley, A.M., & Zumbo, B.D. (2017). Response processes in the context of validity: Setting the stage. In B.D. Zumbo & A.M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 1–12). Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5_1
  • Hull, C.L. (1935). The conflicting psychologies of learning: A way out. Psychological Review, 42(6), 491–516. https://doi.org/10.1037/h0058665
  • Jonson, J.L., & Plake, B.S. (1998). A historical comparison of validity standards and validity practices. Educational and Psychological Measurement, 58(5), 736–753. https://doi.org/10.1177/0013164498058005002
  • Kaldis, B. (2013). Kinds: Natural kinds versus human kinds. In Encyclopedia of philosophy and the social sciences (Vol. 2, pp. 515–518). SAGE Publications, Inc. https://doi.org/10.4135/9781452276052
  • Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527–535. https://doi.org/10.1037/0033-2909.112.3.527
  • Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319-342. https://doi.org/10.1111/j.1745-3984.2001.tb01130.x
  • Kane, M. (2004). Certification testing as an illustration of argument-based validation. Measurement: Interdisciplinary Research and Perspectives, 2(3), 135–170. https://doi.org/10.1207/s15366359mea0203_1
  • Kane, M. (2006). Validation. In R. Brennan (Ed.) Educational measurement (4th ed., pp. 17–64). American Council on Education and Praeger.
  • Kane, M. (2012). Validating score interpretations and uses. Language Testing, 29(1), 3-17. https://doi.org/10.1177/0265532211417210
  • Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1-73. https://doi.org/10.1111/jedm.12000
  • Kane, M. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198–211. https://doi.org/10.1080/0969594X.2015.1060192
  • Kincaid, H. (2000). Global arguments and local realism about the social sciences. Philosophy of Science, 67(S3), S667-S678. https://doi.org/10.1086/392854
  • Koch, T., Eid, M., & Lochner, K. (2018). Multitrait-multimethod-analysis: The psychometric foundation of CFA-MTMM models. In P. Irwing, T. Booth, & D.J. Hughes (Eds.), The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development (pp. 781–846). Wiley Blackwell. https://doi.org/10.1002/9781118489772.ch25
  • Koch, T., Schultze, M., Eid, M., & Geiser, C. (2014). A longitudinal multilevel CFA-MTMM model for interchangeable and structurally different methods. Frontiers in Psychology, 5, Article 311. https://doi.org/10.3389/fpsyg.2014.00311
  • Kroc, E., & Zumbo, B.D. (2018). Calibration of measurements. Journal of Modern Applied Statistical Methods, 17(2), eP2780. https://digitalcommons.wayne.edu/jmasm/vol17/iss2/17/
  • Kroc, E., & Zumbo, B.D. (2020). A transdisciplinary view of measurement error models and the variations of X = T + E. Journal of Mathematical Psychology, 98, 102372. https://doi.org/10.1016/j.jmp.2020.102372
  • Kuhn, T.S. (1962). The structure of scientific revolutions. University of Chicago Press.
  • Kuhn, T.S. (1970). The structure of scientific revolutions (2nd ed.). University of Chicago Press.
  • Kuhn, T.S. (1977). The essential tension: Selected studies in scientific tradition and change. University of Chicago Press.
  • Kuhn, T.S. (1996). The structure of scientific revolutions (3rd ed.). University of Chicago Press.
  • Lakatos, I. (1976). Falsification and the methodology of scientific research programmes. In: Can theories be refuted? (pp. 205–259). Springer.
  • Lane, S., Zumbo, B.D., Abedi, J., Benson, J., Dossey, J., Elliott, S.N., Kane, M., Linn, R., Paredes-Ziker, C., Rodriguez, M., Schraw, G., Slattery, J., Thomas, V., & Willhoft, J. (2009). Prologue: An Introduction to the Evaluation of NAEP. Applied Measurement in Education, 22(4), 309-316. https://doi.org/10.1080/08957340903221436
  • Lennon, R.T. (1956). Assumptions underlying the use of content validity. Educational and Psychological Measurement, 16(3), 294–304. https://doi.org/10.1177/001316445601600303
  • Lewis, C. (1986). Test theory and psychometrika: The past twenty-five years. Psychometrika, 51(1), 11–22. https://doi.org/10.1007/BF02293995
  • Li, Z., & Zumbo, B.D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica, 30(2), 343–370. https://www.uv.es/psicologica/articulos2.09/11LI.pdf
  • Lipton, P. (2004). Inference to the best explanation (2nd ed.). Routledge. https://doi.org/10.4324/9780203470855
  • Lissitz, R.W., & Samuelsen, K. (2007). A suggested change in terminology and emphasis regarding validity and education. Educational Researcher, 36(8), 437–448. https://doi.org/10.3102/0013189X07311286
  • Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635-694 (Monograph Supp. 9).
  • Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Addison-Wesley.
  • MacCorquodale, K., & Meehl, P.E. (1948). On a distinction between hypothetical constructs and intervening variables. Psychological Review, 55(2), 95–107. https://doi.org/10.1037/h0056029
  • Maddox, B. (2015). The neglected situation: assessment performance and interaction in context. Assessment in Education: Principles, Policy & Practice, 22(4), 427-443. https://doi.org/10.1080/0969594X.2015.1026246
  • Maddox, B., Zumbo, B.D. (2017). Observing testing situations: Validation as Jazz. In: B.D. Zumbo, A.M. Hubley (eds) Understanding and investigating response processes in validation research. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_10
  • Maddox, B., Zumbo, B.D., Tay-Lim, B.S.-H., & Qu, D. (2015). An anthropologist among the psychometricians: Assessment events, ethnography and DIF in the Mongolian Gobi. International Journal of Testing, 15(4), 291–309. https://doi.org/10.1080/15305058.2015.1017103
  • Markus, K.A. (1998). Science, measurement, and validity: Is completion of Samuel Messick's synthesis possible? Social Indicators Research, 45, 7–34. https://doi.org/10.1023/A:1006960823277
  • Mehrens, W.A. (1997). The consequences of consequential validity. Educational Measurement: Issues and Practice, 16(2), 16-18.
  • Messick, S. (1972). Beyond structure: In search of functional models of psychological process. Psychometrika, 37(4, Pt. 1), 357–375. https://doi.org/10.1007/BF02291215
  • Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation. American Psychologist, 30, 955-966.
  • Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012-1027.
  • Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In: H. Wainer & H.I. Braun (Eds.), Test validity (pp. 33-45). Lawrence Erlbaum Associates.
  • Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). Macmillan.
  • Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749. https://doi.org/10.1037/0003-066X.50.9.741
  • Messick, S. (1998). Test validity: A matter of consequence [Special issue]. Social Indicators Research, 45, 35-44. https://doi.org/10.1023/A:1006964925094
  • Messick, S. (2000). Consequences of test interpretation and use: The fusion of validity and values in psychological assessment. In: Goffin, R.D., Helmes, E. (eds) Problems and solutions in human assessment. Springer. https://doi.org/10.1007/978-1-4615-4397-8_1
  • Millman, J. (1979). Reliability and validity of criterion-referenced test scores. In: R. Traub (Ed.), New directions for testing and measurement: Methodological developments. Jossey-Bass.
  • Mosier, C.I. (1947). A critical examination of the concepts of face validity. Educational and Psychological Measurement, 7(2), 191–205. https://doi.org/10.1177/001316444700700201
  • Nickles, T. (2017). Cognitive illusions and nonrealism: Objections and replies. In: Agazzi, E. (eds) Varieties of Scientific Realism: Objectivity and truth in science (pp. 151–163). Springer, Cham. https://doi.org/10.1007/978-3-319-51608-0_8
  • Novick, M.R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3(1), 1–18. https://doi.org/10.1016/0022-2496(66)90002-2
  • O'Leary, T.M., Hattie, J.A.C., & Griffin, P. (2017). Actual interpretations and use of scores as aspects of validity. Educational Measurement: Issues and Practice, 36, 16-23. https://doi.org/10.1111/emip.12141
  • Padilla, J.L., & Benítez, I. (2014). Validity evidence based on response processes. Psicothema, 26, 136–144. https://doi.org/10.7334/psicothema2013.259
  • Padilla, J.L., & Benítez, I. (2017). A rationale for and demonstration of the use of DIF and mixed methods. In: Zumbo, B.D., Hubley, A.M. (eds) Understanding and investigating response processes in validation research (pp. 193–210). Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_1
  • Pellicano, E., & den Houting, J. (2022). Annual research review: Shifting from “normal science” to neurodiversity in autism science. Journal of Child Psychology and Psychiatry, 63, 381–396. https://doi.org/10.1111/jcpp.13534
  • Persson, J., & Ylikoski, P. (Eds.). (2007). Rethinking explanation (Boston Studies in the Philosophy of Science, Vol. 252). Springer.
  • Pitt, J.C. (Ed.) (1988). Theories of explanation. Oxford University Press.
  • Popham, W.J. (1997). Consequential validity: Right concern – wrong concept. Educational Measurement: Issues and Practice, 16(2), 9-13.
  • Psillos, S. (2022). Realism and theory change in science. In: Zalta, E.N., Nodelman, U. (eds.) The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/fall2022/entries/realism-theory-change/
  • Rao, C.R., & Sinharay, S. (Eds.). (2007). Handbook of statistics, Volume 26: Psychometrics. Elsevier.
  • Raykov, T. (1992). On structural models for analyzing change. Scandinavian Journal of Psychology, 33, 247-265. https://doi.org/10.1111/j.1467-9450.1992.tb00914.x
  • Raykov, T. (1998a). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22(4), 375-385. https://doi.org/10.1177/014662169802200407
  • Raykov, T. (1998b). A method for obtaining standard errors and confidence intervals of composite reliability for congeneric items. Applied Psychological Measurement, 22(4), 369-374. https://doi.org/10.1177/014662169802200406
  • Raykov, T. (1999). Are simple change scores obsolete? An approach to studying correlates and predictors of change. Applied Psychological Measurement, 23(2), 120-126. https://doi.org/10.1177/01466219922031248
  • Raykov, T. (2001). Estimation of congeneric scale reliability using covariance structure analysis with nonlinear constraints. British Journal of Mathematical and Statistical Psychology, 54, 315-323. https://doi.org/10.1348/000711001159582
  • Raykov, T., & Marcoulides, G.A. (2011). Introduction to psychometric theory. Routledge.
  • Raykov, T., & Marcoulides, G.A. (2016). On the relationship between classical test theory and item response theory: From one to the other and back. Educational and Psychological Measurement, 76(2), 325–338. https://doi.org/10.1177/0013164415576958
  • Reichenbach H. (1977). Philosophie der Raum-Zeit-Lehre. In: Kamlah, A., Reichenbach, M. (eds) Philosophie der Raum-Zeit-Lehre. Hans Reichenbach, vol 2. Vieweg+Teubner Verlag, Wiesbaden.
  • Roberts, B.W. (2007). Contextualizing personality psychology. Journal of Personality, 75(6), 1071–1082. https://doi.org/10.1111/j.1467-6494.2007.00467.x
  • Rome, L., & Zhang, B. (2018). Investigating the effects of differential item functioning on proficiency classification. Applied Psychological Measurement, 42(4), 259–274. https://doi.org/10.1177/0146621617726789
  • Rozeboom, W.W. (1966). Foundations of the theory of prediction. Dorsey.
  • Rulon, P.J. (1946). On the validity of educational tests. Harvard Educational Review, 16, 290-296.
  • Salmon, W. (1990). Four decades of scientific explanation. University of Minnesota Press.
  • Schaffner, K.F. (1993). Discovery and explanation in biology and medicine. University of Chicago Press.
  • Schaffner, K.F. (2020). A comparison of two neurobiological models of fear and anxiety: A “construct validity” application? Perspectives on Psychological Science, 15(5), 1214-1227. https://doi.org/10.1177/1745691620920860
  • Searle, J.R. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.
  • Searle, J.R. (1979). Expression and meaning: Studies in the theory of speech acts. Cambridge University Press. https://doi.org/10.1017/CBO9780511609213
  • Sells, S.B. (ed.) (1963). Stimulus determinants of behavior. Ronald Press.
  • Shear, B.R., Zumbo, B.D. (2014). What counts as evidence: A review of validity studies in educational and psychological measurement. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 91-111). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_6
  • Shepard, L.A. (1993). Evaluating test validity. Review of Research in Education, 19(1), 405-450. https://doi.org/10.3102/0091732X019001405
  • Shepard, L.A. (1997). The centrality of test use and consequences for test validity. Educational Measurement: Issues and Practice, 16, 5-8, 13, 24.
  • Sinnott-Armstrong, W., & Fogelin, R.J. (2010). Understanding arguments: An introduction to informal logic. Wadsworth Cengage Learning.
  • Sireci, S.G. (1998). The construct of content validity [Special issue]. Social Indicators Research, 45, 83–117. https://doi.org/10.1023/A:1006985528729
  • Sireci, S.G. (2009). Packing and unpacking sources of validity evidence: History repeats itself again. In R.W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 19–37). IAP Information Age Publishing.
  • Sireci, S.G. (2013). Agreeing on validity arguments. Journal of Educational Measurement, 50, 99-104. https://doi.org/10.1111/jedm.12005
  • Sireci, S.G. (2020). De-“constructing” test validation. Chinese/English Journal of Educational Measurement and Evaluation, 1(1), Article 3. https://www.ce-jeme.org/journal/vol1/iss1/3
  • Slaney, K.L., & Racine, T.P. (2013). What’s in a name? Psychology’s ever evasive construct. New Ideas in Psychology, 31(1), 4–12. https://doi.org/10.1016/j.newideapsych.2011.02.003
  • Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101. https://doi.org/10.2307/1412159
  • Steyer, R. (1988). Conditional expectations: An introduction to the concept and its applications in empirical sciences. Methodika, 2, 53-78.
  • Steyer, R. (1989). Models of classical psychometric test theory as stochastic measurement models: representation, uniqueness, meaningfulness, identifiability, and testability. Methodika, 3, 25-60.
  • Steyer, R., Ferring, D., & Schmitt, M.J. (1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8(2), 79–98.
  • Steyer, R., Majcen, A.-M., Schwenkmezger, P., & Buchner, A. (1989). A latent state-trait anxiety model and its application to determine consistency and specificity coefficients. Anxiety Research, 1(4), 281–299. https://doi.org/10.1080/08917778908248726
  • Steyer, R., & Schmitt, M. (1990). Latent state-trait models in attitude research. Quality & Quantity, 24, 427–445. https://doi.org/10.1007/BF00152014
  • Steyer, R., Schmitt, M., & Eid, M. (1999). Latent state–trait theory and research in personality and individual differences. European Journal of Personality, 13(5), 389-408. https://doi.org/10.1002/(SICI)1099-0984(199909/10)13:5<389::AID-PER361>3.0.CO;2-A
  • Stone, J., & Zumbo, B.D. (2016). Validity as a pragmatist project: A global concern with local application. In: Aryadoust V., & Fox J. (eds.) Trends in language assessment research and practice (pp. 555–573). Cambridge Scholars Publishing.
  • Suppes, P. (1969). Models of data. In: Studies in the methodology and foundations of science. Synthese Library, vol 22. Springer. https://doi.org/10.1007/978-94-017-3173-7_2
  • Thagard, P. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12(3), 435-467. https://doi.org/10.1017/S0140525X00057046
  • Thagard, P. (1992). Conceptual revolutions. Princeton University Press. http://www.jstor.org/stable/j.ctv36zq4g
  • Tolman, C.W. (1991). Review of constructing the subject: Historical origins of psychological research [Review of the book Constructing the subject: Historical origins of psychological research, by K. Danziger]. Canadian Psychology, 32(4), 650–652. https://doi.org/10.1037/h0084651
  • Toulmin, S. (1958). The uses of argument. Cambridge University Press.
  • van Fraassen, B.C. (1980). The scientific image. Oxford University Press. https://doi.org/10.1093/0198244274.001.0001
  • van Fraassen, B.C. (1985). Empiricism in the philosophy of science. In: Churchland P.M., & Hooker C.A. (eds.) Images of science: Essays on realism and empiricism (pp. 245-308). University of Chicago Press.
  • van Fraassen, B.C. (2008). Scientific representation: Paradoxes of perspective. Oxford University Press.
  • van Fraassen, B.C. (2012). Modeling and measurement: The criterion of empirical grounding. Philosophy of Science, 79(5), 773–784. https://doi.org/10.1086/667847
  • Varela, F.J., Thompson, E., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. The MIT Press. https://doi.org/10.7551/mitpress/6730.001.0001
  • Wallin, A. (2007). Explanation and environment. In: Persson, J., Ylikoski, P. (eds) Rethinking explanation. Boston studies in the philosophy of science, vol 252 (pp. 163-175). Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-5581-2_12
  • Wapner, S., & Demick, J. (2002). The increasing contexts of context in the study of environment-behavior relations. In R.B. Bechtel & A. Churchman (Eds.), Handbook of environmental psychology (pp. 3–14). John Wiley & Sons, Inc.
  • Watson, J.B. (1913). Psychology as the behaviorist views it. Psychological Review, 20(2), 158–177. https://doi.org/10.1037/h0074428
  • Whitely (Embretson), S.E. (1977). Information-processing on intelligence test items: Some response components. Applied Psychological Measurement, 1, 465–476. https://doi.org/10.1177/014662167700100402
  • Wiley, D.E. (1991). Test validity and invalidity reconsidered. In: R.E. Snow & D.E. Wiley (Eds.), Improving inquiry in social science: a volume in honor of Lee J. Cronbach (pp. 75-107). Erlbaum.
  • Woitschach, P., Zumbo, B.D., & Fernández-Alonso, R. (2019). An ecological view of measurement: Focus on multilevel model explanation of differential item functioning. Psicothema, 31(2), 194–203. https://doi.org/10.7334/psicothema2018.303
  • Woodward, J. (1989). Data and phenomena. Synthese, 79, 393–472. https://doi.org/10.1007/BF00869282
  • Wu, A.D., & Zumbo, B.D. (2008). Understanding and using mediators and moderators. Social Indicators Research, 87, 367–392. https://doi.org/10.1007/s11205-007-9143-1
  • Wu, A.D., Zumbo, B.D., & Marshall, S.K. (2014). A method to aid in the interpretation of EFA results: An application of Pratt’s measures. International Journal of Behavioral Development, 38(1), 98-110. https://doi.org/10.1177/0165025413506143
  • Yang, Y., Read, S.J., & Miller, L.C. (2009). The concept of situations. Social and Personality Psychology Compass, 3(6), 1018–1037. https://doi.org/10.1111/j.1751-9004.2009.00236.x
  • Zimmerman, D.W. (1975). Probability spaces, Hilbert spaces, and the axioms of test theory. Psychometrika, 40(3), 395-412. https://doi.org/10.1007/BF02291765
  • Zimmerman, D.W., & Zumbo, B.D. (2001). The geometry of probability, statistics, and test theory. International Journal of Testing, 1(3-4), 283–303. https://doi.org/10.1080/15305058.2001.9669476
  • Zumbo, B.D. (Ed.). (1998). Validity theory and the methods used in validation: Perspectives from the social and behavioral sciences [Special volume]. Social Indicators Research, 45(1-3). Springer International Publishing.
  • Zumbo, B.D. (1999). The simple difference score as an inherently poor measure of change: Some reality, much mythology. Advances in social science methodology, 5(1), 269-304.
  • Zumbo, B.D. (2005, July). Reflections on validity at the intersection of psychometrics, scaling, philosophy of inquiry, and language testing [Samuel J. Messick Memorial Award Lecture]. LTRC, the 27th Language Testing Research Colloquium, Ottawa, Canada.
  • Zumbo, B.D. (2007a). Validity: Foundational Issues and Statistical Methodology. In C.R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 45–79). Elsevier.
  • Zumbo, B.D. (2007b). Three Generations of DIF Analyses: Considering Where It Has Been, Where It Is Now, and Where It Is Going. Language Assessment Quarterly, 4(2), 223-233. https://doi.org/10.1080/15434300701375832
  • Zumbo, B.D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R.W. Lissitz (ed.) The concept of validity: Revisions, new directions, and applications (pp. 65–82). IAP Information Age Publishing.
  • Zumbo, B.D. (2010, September). Measurement validity and validation: A meditation on where we have come from and the state of the art today [Invited address]. Presented at the International conference on outcomes measurement, US National Institutes of Health, Bethesda, MD.
  • Zumbo, B.D. (2015, November). Consequences, side effects and the ecology of testing: Keys to considering assessment “in vivo” [Plenary address]. Annual Meeting of the Association for Educational Assessment – Europe (AEAEurope), Glasgow, Scotland. https://youtu.be/0L6Lr2BzuSQ
  • Zumbo, B.D. (2016). Standard Setting Methodology [Invited address]. “Applied Physiology Physical Employment Standards - Current Issues and Challenges” at the Canadian Society for Exercise Physiology (CSEP) conference, Victoria, Canada.
  • Zumbo, B.D. (2017). Trending away from routine procedures, toward an ecologically informed in vivo view of validation practices. Measurement: Interdisciplinary Research and Perspectives, 15(3-4), 137–139. https://doi.org/10.1080/15366367.2017.1404367
  • Zumbo, B.D. (2018a, April). Methodologies used to ensure fairness and equity in the assessment of students’ educational outcomes [Invited presentation and panel session]. AERA Presidential Symposium “Methodology and equity: An international perspective” at the Annual Meeting of the American Educational Research Association (AERA), New York, NY.
  • Zumbo, B.D. (2018b, July). The reports of DIF’s death are greatly exaggerated; It is like a Phoenix rising from the ashes [Keynote Address]. The 11th Conference of the International Test Commission, Montreal, Canada.
  • Zumbo, B.D. (2019). Foreword: Tensions, Intersectionality, and What Is on the Horizon for International Large-Scale Assessments in Education. In B. Maddox (Ed.), International large-scale assessments in education: Insider research perspectives (pp. xii–xiv). Bloomsbury Publishing. https://doi.org/10.5040/9781350023635
  • Zumbo, B.D. (2021). A novel multimethod approach to investigate whether tests delivered at a test centre are concordant with those delivered remotely online [Research Monograph]. UBC Psychometric Research Series, University of British Columbia. http://dx.doi.org/10.14288/1.0400581
  • Zumbo, B.D. (2023a). Validity theories, frameworks and practices in using tests and measures: an over-the-shoulder look back at validity while also looking to the horizon [Invited Address]. Ciclo Formazione Metodologica (FORME), Dipartimento di Psicologia, Università Cattolica Del Sacro Cuore. https://brunozumbo.com/?page_id=31
  • Zumbo, B.D. (2023b). Test validation and Bayesian statistical frameworks to estimate the magnitude and corresponding uncertainty of washback effects of test preparation [Research Monograph]. UBC Psychometric Research Series, University of British Columbia. https://dx.doi.org/10.14288/1.0435197
  • Zumbo, B.D. (2023c, October). The Challenges and Promise of Embracing the Many Ways of Being Human: Toward an Ecologically Informed In Vivo View of Validation Practices [Invited Address]. Symposium on Inclusive Educational Assessment, Neurodiversity and Disability. Hughes Hall, University of Cambridge.
  • Zumbo, B.D., & Chan, E.K.H. (Eds.). (2014a). Validity and validation in social, behavioral, and health sciences. Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-07794-9
  • Zumbo, B.D., & Chan, E.K.H. (2014b). Reflections on validation practices in the social, behavioral, and health sciences. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 321-327). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_19
  • Zumbo, B.D., & Chan, E.K.H. (2014c). Setting the stage for validity and validation in social, behavioral, and health sciences: Trends in validation practices. In: Zumbo, B.D., Chan, E.K.H. (eds) Validity and validation in social, behavioral, and health sciences (pp. 3-8). Springer, Cham. https://doi.org/10.1007/978-3-319-07794-9_1
  • Zumbo, B.D., & Forer, B. (2011). Testing and measurement from a multilevel view: Psychometrics and validation. In J.A. Bovaird, K.F. Geisinger, & C.W. Buckendahl (Eds.), High-stakes testing in education: Science and practice in K–12 settings (pp. 177–190). American Psychological Association. https://doi.org/10.1037/12330-011
  • Zumbo, B.D., & Gelin, M.N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5, 1–23. https://files.eric.ed.gov/fulltext/EJ846827.pdf
  • Zumbo, B.D., & Hubley, A.M. (2016). Bringing consequences and side effects of testing and assessment to the foreground. Assessment in Education: Principles, Policy & Practice, 23(2), 299–303. https://doi.org/10.1080/0969594X.2016.1141169
  • Zumbo, B.D., & Hubley, A.M. (Eds.). (2017). Understanding and investigating response processes in validation research. Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5
  • Zumbo, B.D., & Kroc, E. (2019). A measurement is a choice and Stevens’ scales of measurement do not help make it: A response to Chalmers. Educational and Psychological Measurement, 79(6), 1184–1197. https://doi.org/10.1177/0013164419844305
  • Zumbo, B.D., Liu, Y., Wu, A.D., Forer, B., Shear, B.R. (2017). National and international educational achievement testing: A case of multi-level validation framed by the ecological model of item responding. In B.D. Zumbo & A.M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 341-362). Springer International Publishing/Springer Nature. https://doi.org/10.1007/978-3-319-56129-5_18
  • Zumbo, B.D., Liu, Y., Wu, A.D., Shear, B.R., Olvera Astivia, O.L., & Ark, T.K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136–151. https://doi.org/10.1080/15434303.2014.972559
  • Zumbo, B.D., Maddox, B., & Care, N.M. (2023). Process and product in computer-based assessments: Clearing the ground for a holistic validity framework. European Journal of Psychological Assessment, 39(4), 252–262. https://doi.org/10.1027/1015-5759/a000748
  • Zumbo, B.D., & Padilla, J.-L. (2020). The interplay between survey research and psychometrics, with a focus on validity theory. In P.C. Beatty, D. Collins, L. Kaye, J.-L. Padilla, G.B. Willis, & A. Wilmot (Eds.), Advances in questionnaire design, development, evaluation and testing (pp. 593–612). John Wiley & Sons, Inc. https://doi.org/10.1002/9781119263685.ch24
  • Zumbo, B.D., Pychyl, T.A., & Fox, J.A. (1993). Psychometric properties of the CAEL assessment, II: An examination of the dependability/reliability of placement decisions. Carleton Papers in Applied Language Studies, 10, 13-27.
  • Zumbo, B.D., & Rupp, A.A. (2004). Responsible modeling of measurement data for appropriate inferences: important advances in reliability and validity theory. In David Kaplan (ed.) The SAGE handbook of quantitative methodology for the social sciences (pp. 74-93). SAGE Publications, Inc. https://doi.org/10.4135/9781412986311
  • Zumbo, B.D., & Shear, B.R. (2011, October). The concept of validity and some novel validation methods [Lecture/Workshop, half-day]. The 42nd annual Northeastern Educational Research Association (NERA) meeting, Rocky Hill, CT.

Details

Primary Language: English
Subjects: Measurement Theories and Applications in Education and Psychology, Scale Development, Psychological Methodology, Design and Analysis
Journal Section: Special Issue 2023
Author: Bruno D. Zumbo (ORCID: 0000-0003-2885-5724)
Publication Date: December 27, 2023
Submission Date: December 18, 2023
Acceptance Date: December 19, 2023
Published in Issue: Year 2023, Volume 10, Special Issue

Cite

APA: Zumbo, B. D. (2023). A dialectic on validity: Explanation-focused and the many ways of being human. International Journal of Assessment Tools in Education, 10(Special Issue), 1-96. https://doi.org/10.21449/ijate.1406304
