The power and type I error of Wilcoxon-Mann-Whitney, Welch's t, and student's t tests for Likert-type data
The Likert-type item is the most popular response format for collecting data through scales or questionnaires in social, educational, and psychological studies. However, there is no consensus on whether parametric or non-parametric tests should be preferred when analyzing Likert-type data. This study examined the statistical power of parametric and non-parametric tests when each Likert-type item is analyzed independently in survey studies. Its main purpose was to compare the statistical power of three pairwise comparison tests for Likert-type data: the Wilcoxon-Mann-Whitney test, Welch's t-test, and Student's t-test. To this end, a Monte Carlo simulation study was conducted in which the statistical significance of the selected tests was examined under varying conditions of sample size, group size ratio, and effect size. The results showed that the Wilcoxon-Mann-Whitney test was superior to its counterparts, especially for small samples and unequal group sizes. However, Student's t-test had statistical power similar to that of the Wilcoxon-Mann-Whitney test when group sizes were equal and the sample size was 200 or more. Consistent with these empirical results, practical recommendations are provided for researchers on what to consider when collecting and analyzing Likert-type data.
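The kind of Monte Carlo comparison described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual simulation design: the function name `simulate_rejection_rates`, the five-point response probabilities, the number of replications, and the 0.05 significance level are all assumptions chosen for the example. When both groups share the same response distribution, the rejection rate estimates the type I error; when the distributions differ, it estimates power.

```python
import numpy as np
from scipy import stats


def simulate_rejection_rates(probs_a, probs_b, n_a, n_b,
                             n_reps=2000, alpha=0.05, seed=0):
    """Estimate how often each test rejects H0 for two groups of Likert responses.

    probs_a, probs_b: response-category probabilities (hypothetical values).
    n_a, n_b: group sizes; unequal values mimic an unbalanced design.
    """
    rng = np.random.default_rng(seed)
    categories = np.arange(1, len(probs_a) + 1)
    rejections = {"student_t": 0, "welch_t": 0, "wmw": 0}

    for _ in range(n_reps):
        a = rng.choice(categories, size=n_a, p=probs_a)
        b = rng.choice(categories, size=n_b, p=probs_b)

        # If every response is identical, no test can reject; count as non-rejection.
        pooled = np.concatenate([a, b])
        if pooled.min() == pooled.max():
            continue

        p_student = stats.ttest_ind(a, b, equal_var=True).pvalue
        p_welch = stats.ttest_ind(a, b, equal_var=False).pvalue
        p_wmw = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue

        # A NaN p-value (degenerate sample) compares as False, i.e. non-rejection.
        rejections["student_t"] += p_student < alpha
        rejections["welch_t"] += p_welch < alpha
        rejections["wmw"] += p_wmw < alpha

    return {name: count / n_reps for name, count in rejections.items()}


if __name__ == "__main__":
    # Assumed five-point response distributions, for illustration only.
    symmetric = [0.10, 0.20, 0.40, 0.20, 0.10]
    shifted = [0.05, 0.15, 0.30, 0.30, 0.20]

    # Same distribution in both groups: rejection rate approximates type I error.
    print("type I error:", simulate_rejection_rates(symmetric, symmetric, 30, 90))
    # Different distributions: rejection rate approximates statistical power.
    print("power:", simulate_rejection_rates(symmetric, shifted, 30, 90))
```

Varying `n_a`, `n_b`, and the gap between the two probability vectors corresponds loosely to the sample size, group size ratio, and effect size conditions mentioned in the abstract. The estimates depend on the assumed response distributions, so they should not be read as reproducing the reported results.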
Şimşek, A. S. (2023). The power and type I error of Wilcoxon-Mann-Whitney, Welch’s t, and student’s t tests for Likert-type data. International Journal of Assessment Tools in Education, 10(1), 114-128. https://doi.org/10.21449/ijate.1183622