A SIMULATION STUDY ON TESTS FOR ONE-WAY ANOVA UNDER THE UNEQUAL VARIANCE ASSUMPTION

The classical F-test for comparing several population means depends on the assumptions of normality and homogeneity of the population variances. When these assumptions, especially the equality of variances, are dropped, the classical F-test fails to reject the null hypothesis even if the data actually provide strong evidence against it. This can be a serious problem in some applications, especially when the sample size is not large. A number of tests are available in the literature to deal with this problem. In this study, the Brown-Forsythe, Weerahandi's Generalized F, Parametric Bootstrap, Scott-Smith, One-Stage, One-Stage Range, Welch and Xu-Wang's Generalized F-tests are introduced, and a simulation study is performed to compare these tests according to type I errors and powers under different combinations of parameters and various sample sizes.

Received by the editors June 25, 2010, Accepted: October 26, 2010.
2000 Mathematics Subject Classification. Primary 05C38, 15A15; Secondary 05A15, 15A18.


INTRODUCTION
In applied statistics an experimenter wants to compare two or more populations measured using independent samples. The classical F (CF) test is used under the assumption that the populations have normal distributions with the same variances. In this paper we consider the problem of comparing the means of k populations with the assumption of heteroscedastic variances.
The CF test fails to reject the null hypothesis even for large samples when the population variances are unequal. This is a serious problem, especially for biomedical experiments, in which one does not usually have large samples. In such applications each data point can be vital and expensive. Alternative methods have been developed because of this problem. For some of these test statistics the distribution is not known, and the p-value can be found by simulation (Weerahandi, 1995; Weerahandi, 2004).

ESRA YİĞİT AND FIKRI GÖKPINAR

There are a large number of approximate tests (Chen and Chen, 1998; Chen, 2001; Tsui and Weerahandi, 1989; Krishnamoorthy et al., 2006; Xu and Wang, 2007a, 2007b) and exact tests (Bishop and Dudewicz, 1981; Welch, 1951; Scott and Smith, 1971; Brown and Forsythe, 1974) in the literature. In practice, some exact procedures such as the CF, Welch (W), Scott-Smith (SS) and Brown-Forsythe (BF) tests are widely used. Alternative tests have been applied to solve a number of problems when conventional methods are difficult to apply or fail to provide exact solutions.
In this paper we carry out a simulation study to compare the size performance of the CF, W, SS, BF, Chen-Chen's One-Stage (OS), Chen-Chen's One-Stage Range (OSR), Weerahandi's Generalized F (GF), Xu-Wang's Generalized F (XW) and Parametric Bootstrap (PB) tests when the population variances are unequal in one-way ANOVA problems. The type I error rates and powers of the tests are compared by Monte Carlo simulation for various sample sizes and parameter combinations.

TESTS FOR ONE-WAY ANOVA
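Let $X_{i1}, \ldots, X_{in_i}$ be a random sample from $N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, k$. The problem of interest involves testing

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \quad \text{against} \quad H_1: \text{not all } \mu_i \text{ are equal}, \quad i = 1, \ldots, k. \tag{2.1}$$

The standardized between-group sum of squares and the standardized error sum of squares are given in (2.2) and (2.3) when the $\sigma_i^2$ are unequal.

$$\tilde S_b = \tilde S_b(\sigma_1^2, \ldots, \sigma_k^2) = \sum_{i=1}^{k} \frac{n_i \bar X_i^2}{\sigma_i^2} - \frac{\left(\sum_{i=1}^{k} n_i \bar X_i / \sigma_i^2\right)^2}{\sum_{i=1}^{k} n_i / \sigma_i^2} \tag{2.2}$$

$$\tilde S_e = \tilde S_e(\sigma_1^2, \ldots, \sigma_k^2) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \frac{(X_{ij} - \bar X_i)^2}{\sigma_i^2} \tag{2.3}$$

Here $\bar X_i$ denotes the sample mean of the $i$th sample.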
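As a small numerical illustration of the between-group quantity, the sketch below assumes the standardized between-group sum of squares has the usual form $\sum n_i \bar x_i^2/\sigma_i^2 - (\sum n_i \bar x_i/\sigma_i^2)^2 / \sum n_i/\sigma_i^2$. That form and the function names are assumptions made for illustration, not code from the paper. The sketch suggests why the quantity measures disagreement among the group means: it equals a precision-weighted sum of squared deviations of the means around their weighted average, so it vanishes when all sample means coincide and is unchanged by a common shift.

```python
def between_ss_raw(means, ns, variances):
    """Raw form: sum(w*x^2) - (sum(w*x))^2 / sum(w), with w_i = n_i / sigma_i^2."""
    w = [n / v for n, v in zip(ns, variances)]
    swx2 = sum(wi * m * m for wi, m in zip(w, means))
    swx = sum(wi * m for wi, m in zip(w, means))
    return swx2 - swx * swx / sum(w)


def between_ss_centered(means, ns, variances):
    """Equivalent form: weighted squared deviations around the weighted mean."""
    w = [n / v for n, v in zip(ns, variances)]
    centre = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    return sum(wi * (m - centre) ** 2 for wi, m in zip(w, means))


if __name__ == "__main__":
    # Hypothetical group means, sample sizes and (known) variances.
    means, ns, variances = [1.0, 2.0, 4.0], [3, 5, 7], [1.0, 4.0, 9.0]
    print(between_ss_raw(means, ns, variances))
    print(between_ss_centered(means, ns, variances))
```

The centered form is numerically safer when the means are large relative to their spread, which is why it may be preferable in simulation code.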
Most of the test statistics for testing the equality of means under heteroscedasticity are based on the standardized between-group sum of squares and the standardized error sum of squares. In the rest of this section the test statistics are briefly introduced. First, the W, SS and BF tests, whose distributions can be obtained theoretically, are given. Next, the GF test and the XW test, which are based on the generalized F-test and whose p-values are obtained by simulation, are given. Then the OS and OSR tests, developed by Chen and Chen (1998) and Chen (2001) on the basis of Bishop and Dudewicz's (1981) two-stage procedure, are investigated. Finally, the PB test developed by Krishnamoorthy et al. (2006) is discussed.

The Welch Test
Welch (1951) gives a test statistic $W$ for this problem. For a given level $\alpha$ and an observed value $w_h$ of $W$, this test rejects $H_0$ in (2.1) whenever the p-value is less than $\alpha$.

The Scott-Smith Test
Scott and Smith (1971) give a test statistic $F_s$. For a given level $\alpha$ and an observed value $f_s$ of $F_s$, this test rejects $H_0$ in (2.1) whenever the p-value is less than $\alpha$.

The Brown-Forsythe Test
Brown and Forsythe (1974) give a test statistic $B$. For a given level $\alpha$ and an observed value $b_h$ of $B$, this test rejects $H_0$ in (2.1) whenever the p-value is less than $\alpha$.
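For readers who want to compute two of these statistics, the following sketch uses the forms of the Welch and Brown-Forsythe statistics as they are commonly stated in the literature, with the unbiased sample variances. The function names and return convention are assumptions for illustration, and the notation may differ from the paper's. Each function returns the statistic together with the numerator and (approximate) denominator degrees of freedom of the reference F distribution; the p-value is then the upper tail probability of that F distribution at the observed statistic.

```python
from statistics import mean, variance


def welch_statistic(samples):
    """Welch statistic with weights w_i = n_i / s_i^2 (s_i^2 unbiased)."""
    k = len(samples)
    ns = [len(s) for s in samples]
    means = [mean(s) for s in samples]
    w = [n / variance(s) for n, s in zip(ns, samples)]
    sw = sum(w)
    centre = sum(wi * m for wi, m in zip(w, means)) / sw
    lam = sum((1 - wi / sw) ** 2 / (n - 1) for wi, n in zip(w, ns))
    numerator = sum(wi * (m - centre) ** 2 for wi, m in zip(w, means)) / (k - 1)
    denominator = 1 + 2 * (k - 2) * lam / (k * k - 1)
    return numerator / denominator, k - 1, (k * k - 1) / (3 * lam)


def brown_forsythe_statistic(samples):
    """Brown-Forsythe statistic with Satterthwaite-type denominator df."""
    k = len(samples)
    ns = [len(s) for s in samples]
    total = sum(ns)
    means = [mean(s) for s in samples]
    grand = sum(n * m for n, m in zip(ns, means)) / total
    parts = [(1 - n / total) * variance(s) for n, s in zip(ns, samples)]
    denom = sum(parts)
    stat = sum(n * (m - grand) ** 2 for n, m in zip(ns, means)) / denom
    df2 = 1 / sum((p / denom) ** 2 / (n - 1) for p, n in zip(parts, ns))
    return stat, k - 1, df2


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]  # hypothetical samples
    print(welch_statistic(data))
    print(brown_forsythe_statistic(data))
```

With two groups both statistics reduce to the square of the unequal-variance t statistic, which gives a convenient sanity check.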

Weerahandi's Generalized F-test
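The sample variances (MLEs) of the k populations are denoted by $S_i^2$, where

$$S_i^2 = \frac{1}{n_i} \sum_{j=1}^{n_i} (X_{ij} - \bar X_i)^2, \quad i = 1, \ldots, k.$$

Define

$$B_j = \frac{\sum_{i=1}^{j} n_i S_i^2 / \sigma_i^2}{\sum_{i=1}^{j+1} n_i S_i^2 / \sigma_i^2}, \quad j = 1, \ldots, k-1,$$

so that $B_j \sim \mathrm{Beta}\left(\sum_{i=1}^{j} (n_i - 1)/2, \; (n_{j+1} - 1)/2\right)$, and note that $\tilde S_e$ and the $B_j$ are all independent random variables. Note also that the random variables $n_i S_i^2 / \sigma_i^2$ can be expressed as

$$\frac{n_1 S_1^2}{\sigma_1^2} = \tilde S_e B_1 B_2 \cdots B_{k-1}, \quad \frac{n_i S_i^2}{\sigma_i^2} = \tilde S_e (1 - B_{i-1}) B_i \cdots B_{k-1} \; (i = 2, \ldots, k-1), \quad \frac{n_k S_k^2}{\sigma_k^2} = \tilde S_e (1 - B_{k-1}).$$

Therefore, the generalized p-value can be expressed as

$$p = 1 - E\left[ H_{k-1, N-k}\left( \frac{N-k}{k-1} \, \tilde s_b\left[ \frac{n_1 s_1^2}{B_1 B_2 \cdots B_{k-1}}, \frac{n_2 s_2^2}{(1 - B_1) B_2 \cdots B_{k-1}}, \ldots, \frac{n_k s_k^2}{1 - B_{k-1}} \right] \right) \right], \tag{2.5}$$

where $N = n_1 + \cdots + n_k$, $H_{k-1, N-k}$ is the cumulative distribution function of the F-distribution with $k-1$ and $N-k$ degrees of freedom, $\tilde s_b$ is the observed value of $\tilde S_b$ evaluated at the indicated variances, and the expectation is taken with respect to the independent beta random variables $B_j$. This test rejects $H_0$ in (2.1) whenever $p < \alpha$ (Weerahandi, 1995a).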
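Because the generalized p-value has no closed form, it is estimated by simulation. The sketch below is one possible Monte Carlo estimate, written under stated assumptions rather than taken from the paper. Instead of drawing beta variates and evaluating the F distribution function, it uses an equivalent representation: draw independent chi-square variates $Y_i$ with $n_i - 1$ degrees of freedom in place of $n_i S_i^2/\sigma_i^2$, plug $n_i s_i^2 / Y_i$ into the between-group sum of squares, and compare the result with an independent chi-square variate with $k-1$ degrees of freedom. The function names, the default number of runs and the seed handling are hypothetical.

```python
import random


def between_ss(means, ns, variances):
    """Standardized between-group sum of squares in centered form."""
    w = [n / v for n, v in zip(ns, variances)]
    centre = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    return sum(wi * (m - centre) ** 2 for wi, m in zip(w, means))


def generalized_f_pvalue(samples, n_sim=5000, seed=0):
    """Monte Carlo estimate of the generalized p-value (needs n_i >= 2)."""
    rng = random.Random(seed)
    k = len(samples)
    ns = [len(s) for s in samples]
    means = [sum(s) / n for s, n in zip(samples, ns)]
    # n_i * s_i^2 with s_i^2 the MLE, i.e. the within-group sum of squares.
    ss = [sum((x - m) ** 2 for x in s) for s, m in zip(samples, means)]
    hits = 0
    for _ in range(n_sim):
        # Chi-square(df) is Gamma(shape=df/2, scale=2).
        ys = [rng.gammavariate((n - 1) / 2, 2) for n in ns]
        sigma2 = [q / y for q, y in zip(ss, ys)]
        chi_between = rng.gammavariate((k - 1) / 2, 2)
        if chi_between >= between_ss(means, ns, sigma2):
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    data = [[1.2, 2.1, 2.9, 3.3], [0.4, 2.2, 4.1, 5.0, 1.1], [2.0, 2.5, 1.0]]
    print(generalized_f_pvalue(data, n_sim=2000))
```

The estimate is compared with the nominal level $\alpha$; larger `n_sim` reduces the Monte Carlo error of the estimated p-value.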

Xu-Wang's Generalized F-test
For larger values of k, the type I error probability of the generalized F-test exceeds the nominal level. Xu and Wang (2007a) developed a test whose empirical type I error probability does not exceed its nominal level. Denote $v_a = (\mu_1, \mu_2, \ldots, \mu_{k-1})'$ and $v_b = 1_{k-1} \mu_k$, where $1_{k-1}$ is the $(k-1) \times 1$ vector whose elements are all ones. Then the null hypothesis in (2.1) is equivalent to the null hypothesis $H_0: v_a = v_b$. The sample variances (MLEs) of the k populations are again denoted by $S_i^2$. Let $y_a$, $y_b$, $s_a$ and $s_b$ denote the observed values of the statistics $Y_a$, $Y_b$, $S_a$ and $S_b$, respectively.
A generalized test variable $T$, with observed value $t$, is constructed from these quantities together with $Y \sim N(0, I_{k-1})$ and $U_i \sim \chi^2_{n_i - 1}$, $i = 1, \ldots, k$. Under the null hypothesis in (2.1), the generalized p-value $p$ is obtained from the distribution of $T$ and the observed value $t$, and $H_0$ is rejected if $p < \alpha$.

The One-Stage Test
Chen and Chen (1998) developed the OS procedure because the number of observations required at the second stage of the two-stage procedure of Bishop and Dudewicz (1981) can be large and impracticable. For each population, the first (or any randomly chosen) $n_0$ ($2 \le n_0 < n_i$) observations are used to calculate the usual sample mean and variance. Weights $U_i$ and $V_i$ are then defined for the observations in the $i$th sample, where $S^2_{[k]}$ is the maximum of $(S_1^2, \ldots, S_k^2)$, and the final weighted sample mean of each sample is formed with these weights. Chen and Chen (1998) give a test statistic $F^1$ based on the final weighted sample means.
Under the null hypothesis in (2.1), $F^1$ is distributed as a quadratic form in independent Student's t variates, each with $n_0 - 1$ degrees of freedom (Chen and Chen, 1998). For a given level $\alpha$ and an observed value $F^1_h$ of $F^1$, this test rejects $H_0$ in (2.1) whenever the p-value is less than $\alpha$.

The One-Stage Range Test
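In another procedure based on one stage, Chen (2001) gives a test statistic $T^1$ defined in terms of $\tilde X_{\max}$, $\tilde X_{\min}$ and $z^*$, where $\tilde X_{\max}$ ($\tilde X_{\min}$) is the maximum (minimum) of $\tilde X_1, \ldots, \tilde X_k$ and $z^*$ is the maximum of $S_1^2/n_1, \ldots, S_k^2/n_k$. Under the null hypothesis in (2.1), $T^1$ is distributed as $R_{k, n_0 - 1}$, which is the range of $k$ independent Student's t variates each with $n_0 - 1$ degrees of freedom. For a given level $\alpha$ and an observed value $t^1$ of $T^1$, this test rejects $H_0$ in (2.1) whenever the p-value satisfies $P(R_{k, n_0 - 1} > t^1) < \alpha$.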
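The tail probability of the range of $k$ independent Student's t variates is not available in the standard library, so one simple option is to estimate it by simulation. The sketch below is a minimal Monte Carlo estimate of $P(R_{k, n_0 - 1} > t^1)$ for a supplied observed value; the function name, the number of runs and the seed are assumptions for illustration, and the observed statistic itself must be computed separately.

```python
import math
import random


def t_range_pvalue(t_obs, k, df, n_sim=20000, seed=0):
    """Estimate P(range of k independent t(df) variates > t_obs)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        ts = []
        for _ in range(k):
            z = rng.gauss(0.0, 1.0)
            # Chi-square(df) is Gamma(shape=df/2, scale=2).
            chi2 = rng.gammavariate(df / 2, 2)
            ts.append(z / math.sqrt(chi2 / df))
        if max(ts) - min(ts) > t_obs:
            exceed += 1
    return exceed / n_sim


if __name__ == "__main__":
    # Hypothetical setting: k = 3 groups, n_0 = 5 first-stage observations.
    print(t_range_pvalue(4.0, k=3, df=4, n_sim=5000))
```

Reusing the same seed for different observed values keeps the estimated p-values monotone in the observed statistic, which can be convenient when tabulating critical values.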

A Parametric Bootstrap Approach
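When the population variances $\sigma_i^2$ are unknown, a test statistic can be obtained by replacing $\sigma_i^2$ in (2.2) by $S_i^2$ and is given by

$$T_N = \sum_{i=1}^{k} \frac{n_i \bar X_i^2}{S_i^2} - \frac{\left(\sum_{i=1}^{k} n_i \bar X_i / S_i^2\right)^2}{\sum_{i=1}^{k} n_i / S_i^2}. \tag{2.6}$$

As the test statistic in (2.6) is location invariant, without loss of generality we can take the common mean to be zero. Let $\bar X_{Bi} \sim N(0, S_i^2/n_i)$ and $S_{Bi}^2 \sim S_i^2 \chi^2_{n_i - 1}/(n_i - 1)$. Then the parametric bootstrap pivot variable can be obtained by replacing $\bar X_i$ and $S_i^2$ in (2.6) by $\bar X_{Bi}$ and $S_{Bi}^2$ and is given by

$$T_{NB} = \sum_{i=1}^{k} \frac{n_i \bar X_{Bi}^2}{S_{Bi}^2} - \frac{\left(\sum_{i=1}^{k} n_i \bar X_{Bi} / S_{Bi}^2\right)^2}{\sum_{i=1}^{k} n_i / S_{Bi}^2}. \tag{2.7}$$

Note that $\bar X_{Bi}$ is distributed as $Z_i S_i / \sqrt{n_i}$, where $Z_i$ is a standard normal random variable. So the PB pivot variable in (2.7) is distributed as

$$\sum_{i=1}^{k} \frac{Z_i^2 (n_i - 1)}{\chi^2_{n_i - 1}} - \frac{\left(\sum_{i=1}^{k} \sqrt{n_i}\, Z_i (n_i - 1) / (S_i \chi^2_{n_i - 1})\right)^2}{\sum_{i=1}^{k} n_i (n_i - 1) / (S_i^2 \chi^2_{n_i - 1})}.$$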
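A possible implementation of the parametric bootstrap test is sketched below. It follows the description above under two assumptions that should be checked against Krishnamoorthy et al. (2006): the $S_i^2$ are the unbiased sample variances, and the test rejects when the estimated probability that the bootstrap pivot exceeds the observed statistic falls below $\alpha$. The function names and defaults are hypothetical. The statistic is evaluated in its centered form, which equals the expression with the squared sum subtracted.

```python
import math
import random


def weighted_ss(means, weights):
    """sum(w*x^2) - (sum(w*x))^2 / sum(w), computed in centered form."""
    centre = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return sum(w * (m - centre) ** 2 for w, m in zip(weights, means))


def pb_pvalue(samples, n_boot=5000, seed=0):
    """Parametric bootstrap p-value estimate for equality of k normal means."""
    rng = random.Random(seed)
    ns = [len(s) for s in samples]
    means = [sum(s) / n for s, n in zip(samples, ns)]
    s2 = [sum((x - m) ** 2 for x in s) / (n - 1)
          for s, m, n in zip(samples, means, ns)]
    t_obs = weighted_ss(means, [n / v for n, v in zip(ns, s2)])
    exceed = 0
    for _ in range(n_boot):
        # Bootstrap means ~ N(0, s_i^2 / n_i); the common mean is set to zero.
        boot_means = [rng.gauss(0.0, math.sqrt(v / n)) for v, n in zip(s2, ns)]
        # Bootstrap variances ~ s_i^2 * chi-square(n_i - 1) / (n_i - 1).
        boot_s2 = [v * rng.gammavariate((n - 1) / 2, 2) / (n - 1)
                   for v, n in zip(s2, ns)]
        t_boot = weighted_ss(boot_means, [n / v for n, v in zip(ns, boot_s2)])
        if t_boot > t_obs:
            exceed += 1
    return exceed / n_boot


if __name__ == "__main__":
    data = [[1.2, 2.1, 2.9, 3.3], [0.4, 2.2, 4.1, 5.0, 1.1], [2.0, 2.5, 1.0]]
    print(pb_pvalue(data, n_boot=2000))
```

Only the sample sizes, means and variances enter the computation, so the same function could be driven by summary statistics instead of raw data with a small change to its signature.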

SIMULATION STUDY
In this section we compare the CF, W, SS, BF, GF, XW, OS, OSR and PB tests according to type I errors and powers under different combinations of parameters and sample sizes.
3.1. Comparison between the type I error rates of the tests
In this section we consider the balanced and unbalanced cases, from smaller to larger sample sizes, with k=3 and k=5 for comparing the tests. The variances vary over a wide range, with both $\sigma_1^2 < \cdots < \sigma_k^2$ and $\sigma_1^2 > \cdots > \sigma_k^2$ considered. For each combination of $n_i$ and $\sigma_i^2$, the rejection rate of each testing procedure is calculated and compared with the nominal level 0.05 when the means are all equal.

To estimate the type I error rates of the CF, W, SS and BF tests, we use a simulation consisting of 5000 runs for each of the sample sizes and parameter configurations. The CF, W, SS and BF test statistics are calculated from the generated data, and the type I errors are estimated by the proportion of test statistics that exceed the critical values calculated from the corresponding distributions.

To estimate the type I error rates of the GF, PB, OS, OSR and XW tests, we use a two-step simulation. For the GF test we generate 5000 observed vectors $(\bar x_1, \ldots, \bar x_k; s_1^2, \ldots, s_k^2)$ and use 5000 runs for each observed vector to estimate the p-value in (2.5). The type I error rate of the GF test is then estimated by the proportion of the 5000 p-values that are less than the nominal level $\alpha$. The type I error rates of the PB, OS, OSR and XW tests are estimated similarly. For both equal and unequal variances, with k=3 and k=5, the simulated type I error probabilities are given in Tables 1, 2, 3 and 4.
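The one-step estimate described above (generate data, compute the statistic, count exceedances of the critical value) can be sketched for the classical F-test as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, the critical value is passed in by the caller, and the value 3.354 used in the example is the upper 5% point of the F distribution with 2 and 27 degrees of freedom as read from standard tables for k=3 and $n_i$=10. Passing unequal means to the same function gives an estimate of power instead of the type I error rate.

```python
import random


def classical_f(samples):
    """Classical one-way ANOVA F statistic."""
    k = len(samples)
    ns = [len(s) for s in samples]
    total = sum(ns)
    means = [sum(s) / n for s, n in zip(samples, ns)]
    grand = sum(sum(s) for s in samples) / total
    between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (between / (k - 1)) / (within / (total - k))


def rejection_rate(means, sds, ns, critical_value, runs=5000, seed=0):
    """Proportion of simulated data sets whose F statistic exceeds the cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        samples = [[rng.gauss(mu, sd) for _ in range(n)]
                   for mu, sd, n in zip(means, sds, ns)]
        if classical_f(samples) > critical_value:
            rejections += 1
    return rejections / runs


if __name__ == "__main__":
    F_CRIT = 3.354  # assumed table value: upper 5% point of F(2, 27)
    # Equal means: the rejection rate estimates the type I error rate.
    print(rejection_rate([0, 0, 0], [1, 1, 1], [10, 10, 10], F_CRIT, runs=2000))
    # Same design with hypothetical unequal standard deviations.
    print(rejection_rate([0, 0, 0], [1, 3, 6], [10, 10, 10], F_CRIT, runs=2000))
```

The two-step simulation for the GF, PB, OS, OSR and XW tests has the same outer loop, with the comparison against a critical value replaced by an inner simulation that estimates the p-value for each generated data set.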
.1832 .0588 .5740 .0672 .0732 .0512 .0476 .0486 .0
.0486 .0564 .0304 .0494 .0590 .0564 .0512 .0564 .0
.0474 .0478 .0310 .0490 .0510 .0520 .0470 .0476 .0

Table 4. Simulated type I error rates when k=5 and sample sizes are unequal under nominal $\alpha$=0.05

We observe the following from the numerical results in Tables 1, 2, 3 and 4.

The CF and SS tests seem to have a type I error probability exceeding the nominal level for the balanced case and small sample sizes. In the case of extreme heteroscedasticity the W, BF, GF and PB tests exceed the nominal level. However, the OS, OSR and XW tests are superior to the other tests. The W, GF and PB tests also seem to be very conservative when the sample sizes are large.

The CF, SS and BF tests exceed the nominal level when the sample sizes are proportional to the variances and the sample sizes are small. The W, GF, OS, OSR and PB tests seem to be very conservative not only for small sample sizes but also for large samples. However, the XW test exceeds the nominal level for large sample sizes.

The CF, W, BF, SS and GF tests exceed the nominal level when the variances and sample sizes are inversely proportional. However, the OS, OSR and XW tests seem to be very conservative. The W, GF and PB tests have similar results when the sample sizes are large.

For a larger value of k, the CF, W, SS, BF and GF tests exceed the nominal level when the sample sizes are small. The SS, BF and XW tests have similar results when the sample sizes are large. The OS, OSR and PB tests seem to be very conservative not only for small sample sizes but also for large sample sizes.

Similar results were found for all cases. It appears that the PB, OS and OSR tests are superior to the other tests.

3.2. Comparison between the powers of the tests
For each combination of $n_i$ and $\sigma_i^2$, the rejection rate of each testing procedure is calculated at the nominal level 0.05 when the means are not all equal; this rejection rate estimates the power of the test. In this section we use 5000 runs for each of the sample sizes and parameter configurations to calculate the powers of the CF, W, SS, BF, OS, OSR, GF, PB and XW tests. We provide the powers of these tests for k=3 and k=5.

CONCLUSION
In this simulation study, for a range of choices of sample sizes and parameter configurations, we compared the performance of the above tests for testing the equality of means of one-way ANOVA models under heteroscedasticity. The CF test is not an appropriate test under heteroscedasticity because its type I error rates exceed the intended level 0.05. The same is true for the SS test. The OS and OSR tests appear to be less powerful than the other tests even though their type I error rates are close to the intended level 0.05, regardless of the sample sizes, the values of the error variances and the number of means being compared.
The W and PB and especially the GF tests appear to be more powerful than the other tests when k=3 and the sample sizes are small ($n_1, n_2, n_3 = 3, 5, 7$). The W and PB tests are superior to the other tests when k=5 and the sample sizes are small ($n_1, n_2, n_3 = 3, 5, 7$). When the sample sizes are large, the GF, W and PB tests are more powerful than the other tests for both k=3 and k=5. In this case the XW test is also powerful when the variances and sample sizes are inversely proportional.
Although the empirical type I errors of the tests based on the OS procedure are close to their nominal level, the powers of these tests are not as high as those of the GF, W and PB tests.For this reason, the GF, W and PB tests can be used instead of tests based on the OS procedure.

Table 1. Simulated type I error rates when k=3 and sample sizes are equal under nominal $\alpha$=0.05
Table 2. Simulated type I error rates when k=3 and sample sizes are unequal under nominal $\alpha$=0.05
Table 3. Simulated type I error rates when k=5 and sample sizes are equal under nominal $\alpha$=0.05