ON THE COMPARISON OF THE WELCH TEST AND THE SINGLE-STAGE TEST: A SIMULATION STUDY

It is known that using the F test to test the equality of means in a one-way ANOVA is misleading when the assumption of equal population variances is violated. For the case in which the variances are unknown and unequal, Welch [1] developed the so-called Welch test and Chen and Chen [2] developed the single-stage test for ANOVA. In this paper, the Welch and single-stage tests are compared in terms of their powers and their reject ratios of the null hypothesis when that hypothesis is true. Simulation results indicate that the single-stage test is more powerful than the Welch test, and performs well in terms of the reject ratios, when the number of populations and the sample sizes are small.


Introduction
The procedure for testing the equality of means in the conventional one-way ANOVA is based on the assumptions of independence, normality and equal variances. When the populations have different variances, it is well known that the F test can lead to wrong conclusions. Studies have shown that the F statistic is not robust to violation of the equal error variance assumption, especially when the sample sizes are unequal. For the case of unequal variances, Cochran suggested a test in which the reciprocals of the sample variances are used as weights in the explained sum of squares, and he provided a chi-squared test [3]. Weighting the terms of the explained sum of squares by the reciprocals of the estimated variances of the respective sample means was proposed by James [4] and Welch [1]. Brown and Forsythe developed a test for the one-way layout under heteroscedasticity [5]. The two-stage test suggested by Dudewicz and Dalal [6] was applied to analysis of variance problems by Bishop and Dudewicz [7]. Chen and Chen compared the single-stage test with the two-stage test for testing the null hypothesis of equal means [2].
In this study the commonly used Welch test and the single-stage test are compared by means of their powers and their reject ratios of a true null hypothesis, that is, the realizations of the specified $\alpha$'s. The comparisons are made by simulation. The Welch test is introduced in Section 2 and the single-stage test in Section 3. In Section 4, these two tests are compared.
MELTEM EKIZ, HAMZA GAMGAM

The Welch Test
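Let the independent $Y_{ij}$'s denote the $j$-th observation from the $i$-th population $N(\mu_i, \sigma_i^2)$, where $i = 1, \ldots, I$ and $j = 1, \ldots, n_i$. The sample mean and variance of the $i$-th sample are given, respectively, by
$$\bar{Y}_{i.} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}, \qquad S_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}\left(Y_{ij} - \bar{Y}_{i.}\right)^2 .$$
The overall sample size and sample mean are
$$n = \sum_{i=1}^{I} n_i, \qquad \bar{Y}_{..} = \frac{1}{n}\sum_{i=1}^{I} n_i \bar{Y}_{i.} .$$
When the population variances are equal, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_I^2$, the classical F-test with $I-1$ and $n-I$ degrees of freedom is appropriate for testing the null hypothesis of equal means, $\mu_1 = \mu_2 = \cdots = \mu_I$. When the population variances are unequal, the Welch test can be used [1]. This test is defined as follows. The weights are
$$\hat{w}_i = \frac{n_i}{S_i^2}, \qquad i = 1, \ldots, I .$$
Then the weighted sample mean is obtained as
$$\hat{\mu} = \frac{\sum_{i=1}^{I} \hat{w}_i \bar{Y}_{i.}}{\sum_{i=1}^{I} \hat{w}_i} .$$
The explained weighted sum of squares, which measures the variability from population to population, can be estimated by
$$\sum_{i=1}^{I} \hat{w}_i \left(\bar{Y}_{i.} - \hat{\mu}\right)^2 .$$
Let $f_i = n_i - 1$ be the degrees of freedom for the $i$-th sample and
$$\Lambda = \sum_{i=1}^{I} \frac{1}{f_i}\left(1 - \frac{\hat{w}_i}{\sum_{k=1}^{I} \hat{w}_k}\right)^2 .$$
Then the Welch test statistic is
$$W = \frac{\sum_{i=1}^{I} \hat{w}_i \left(\bar{Y}_{i.} - \hat{\mu}\right)^2 / (I-1)}{1 + \dfrac{2(I-2)}{I^2-1}\,\Lambda},$$
which has an approximate F distribution with $I-1$ and $\left(\dfrac{3\Lambda}{I^2-1}\right)^{-1}$ degrees of freedom.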
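The computation of the Welch statistic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab program: it assumes the usual form of the test, with weights $n_i / S_i^2$, a weighted mean of the group means, and the correction term $\Lambda$ built from $f_i = n_i - 1$. The function name `welch_statistic` is hypothetical. Only the statistic and its two degrees of freedom are returned, because the Python standard library has no F distribution; a p-value would need an external routine.

```python
from statistics import mean, variance


def welch_statistic(samples):
    """Return (W, df1, df2) for Welch's heteroscedastic one-way ANOVA.

    samples: list of groups, each a list of at least two numbers.
    Assumption of this sketch: weights are n_i / S_i^2.
    """
    k = len(samples)  # number of populations (I in the text)
    sizes = [len(g) for g in samples]
    means = [mean(g) for g in samples]
    # statistics.variance uses the n - 1 denominator (sample variance).
    weights = [n / variance(g) for n, g in zip(sizes, samples)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    # Explained weighted sum of squares.
    between = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means))
    # Lambda: squared complements of the relative weights, each divided by f_i.
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    w_stat = (between / (k - 1)) / (1 + 2 * (k - 2) * lam / (k * k - 1))
    df1 = k - 1
    df2 = (k * k - 1) / (3 * lam)
    return w_stat, df1, df2
```

For two groups the statistic reduces to the square of Welch's two-sample t statistic, and the second degrees of freedom reduce to the Welch-Satterthwaite value, which gives a quick sanity check of an implementation.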

The Single-Stage Test
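The single-stage test is an alternative to the Welch test. As before, let the independent $Y_{ij}$'s denote the $j$-th observation from the $i$-th population $N(\mu_i, \sigma_i^2)$, where $i = 1, \ldots, I$ and $j = 1, \ldots, n_i$. Employ the first (or randomly chosen) $n_i - 1$ observations to define the sample mean and sample variance, respectively, by
$$\bar{Y}_{i.} = \frac{1}{n_i - 1}\sum_{j=1}^{n_i - 1} Y_{ij}$$
and
$$S_i^2 = \frac{1}{n_i - 2}\sum_{j=1}^{n_i - 1}\left(Y_{ij} - \bar{Y}_{i.}\right)^2 .$$
The weighted sample mean is
$$\tilde{Y}_{i.} = \sum_{j=1}^{n_i} W_{ij} Y_{ij},$$
where the weights are random variables defined as
$$W_{ij} = \begin{cases} U_i, & 1 \le j \le n_i - 1, \\ V_i, & j = n_i . \end{cases} \tag{3.1}$$
In the literature the two different weights $U_i$ and $V_i$, given in Eq. (3.1), are formulated as
$$U_i = \frac{1}{n_i} + \frac{1}{n_i}\sqrt{\frac{1}{n_i - 1}\left(\frac{S_{[I]}^2}{S_i^2} - 1\right)}, \qquad V_i = \frac{1}{n_i} - \frac{1}{n_i}\sqrt{(n_i - 1)\left(\frac{S_{[I]}^2}{S_i^2} - 1\right)},$$
where $S_{[I]}^2$ is the maximum of $S_1^2, \ldots, S_I^2$. Also $U_i$ and $V_i$ satisfy the following conditions:
$$(n_i - 1)U_i + V_i = 1, \qquad (n_i - 1)U_i^2 + V_i^2 = \frac{S_{[I]}^2}{n_i S_i^2} . \tag{3.2}$$
Given the sample variances $S_i^2 = s_i^2$, the weighted sample mean $\tilde{Y}_{i.}$ has a conditional normal distribution with mean $\mu_i$ and variance
$$\frac{s_{[I]}^2}{n_i s_i^2}\,\sigma_i^2 .$$
Furthermore, given $S_i^2 = s_i^2$, the statistic $t_i$ defined as
$$t_i = \frac{\tilde{Y}_{i.} - \mu_i}{S_{[I]} / \sqrt{n_i}}$$
has a conditional normal distribution with mean zero and variance $\sigma_i^2 / s_i^2$. It was proved in [2] that, unconditionally, the $t_i$ are independent Student's t variables with $n_i - 2$ degrees of freedom. In other words, the joint p.d.f. of $t_1, \ldots, t_I$, obtained from the conditional distribution given $S_i^2 = s_i^2$, is the product of the p.d.f.'s of Student's t variables with $n_i - 2$ degrees of freedom. Using equation (3.2), $t_i$ can be written as
$$t_i = \sum_{j=1}^{n_i} W_{ij}\,\frac{Y_{ij} - \mu_i}{S_{[I]} / \sqrt{n_i}} .$$
These statistics are distributed as independent Student's t variables with $n_i - 2$ degrees of freedom. When the sample sizes are equal, i.e. $n_1 = \cdots = n_I = n$, the overall weighted mean is
$$\tilde{Y}_{..} = \frac{1}{I}\sum_{i=1}^{I} \tilde{Y}_{i.} .$$
In order to test the null hypothesis, the statistic
$$\tilde{F}^1 = \sum_{i=1}^{I}\left(\frac{\tilde{Y}_{i.} - \tilde{Y}_{..}}{S_{[I]} / \sqrt{n}}\right)^2$$
is recommended [2]. Under the null hypothesis it equals $\sum_{i=1}^{I}(t_i - \bar{t})^2$ with $\bar{t} = \frac{1}{I}\sum_{i=1}^{I} t_i$, so it is a sum of squares of independent Student's t variables centred at their mean. Under the null hypothesis, the distribution of the statistic $\tilde{F}^1$ was obtained by simulation, and the critical values were tabulated for the degrees of freedom $n - 2 = 2, 3, 4, 5, 6, 8$ and the numbers of populations $I = 3, 4, 5, 6, 8$. To implement the single-stage test, $\tilde{F}^1$ is calculated and its value is compared with the critical value $\tilde{F}_{\alpha; I, n-2}$. If $\tilde{F}^1 > \tilde{F}_{\alpha; I, n-2}$, the null hypothesis that the population means are all equal is rejected.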
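The single-stage procedure for equal sample sizes can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the explicit forms of $U_i$ and $V_i$ used in the code are assumed, chosen so that $(n-1)U_i + V_i = 1$ and $(n-1)U_i^2 + V_i^2 = S_{\max}^2 / (n S_i^2)$, where $S_{\max}^2$ is the largest of the sample variances computed from the first $n - 1$ observations. The function names are hypothetical. The second function approximates a critical value by simulating the null form of the statistic, a sum of squared deviations of independent Student's t variables from their mean.

```python
import math
import random
from statistics import mean, variance


def single_stage_statistic(samples):
    """Single-stage statistic for equally sized groups (n >= 3).

    Assumed weight forms (an assumption of this sketch):
      U_i = 1/n + (1/n) * sqrt((s2_max / s2_i - 1) / (n - 1))
      V_i = 1/n - (1/n) * sqrt((n - 1) * (s2_max / s2_i - 1))
    """
    n = len(samples[0])
    if n < 3 or any(len(g) != n for g in samples):
        raise ValueError("need equal group sizes of at least 3")
    # Sample variances use only the first n - 1 observations of each group.
    s2 = [variance(g[:-1]) for g in samples]
    s2_max = max(s2)
    weighted_means = []
    for g, s2_i in zip(samples, s2):
        ratio = s2_max / s2_i - 1.0
        u = 1.0 / n + math.sqrt(ratio / (n - 1)) / n
        v = 1.0 / n - math.sqrt((n - 1) * ratio) / n
        # U_i weights the first n - 1 observations, V_i the last one.
        weighted_means.append(u * sum(g[:-1]) + v * g[-1])
    grand = mean(weighted_means)
    return sum((m - grand) ** 2 for m in weighted_means) / (s2_max / n)


def simulate_critical_value(k, df, alpha, runs=20000, seed=0):
    """Approximate the upper-alpha point of sum((t_i - t_bar)^2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(runs):
        # Student's t: standard normal divided by sqrt(chi-square / df);
        # a chi-square with df degrees of freedom is Gamma(df / 2, scale 2).
        t = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
             for _ in range(k)]
        t_bar = sum(t) / k
        draws.append(sum((x - t_bar) ** 2 for x in t))
    draws.sort()
    index = min(runs - 1, max(0, math.ceil((1.0 - alpha) * runs) - 1))
    return draws[index]
```

With `k = I` and `df = n - 2`, the null hypothesis would be rejected when `single_stage_statistic(samples)` exceeds `simulate_critical_value(k, df, alpha)`. The simulated value is only an approximation of a tabulated critical value and varies with `runs` and `seed`.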

Comparison of the Welch Test and the Single-Stage Test
In this study, we compared the two tests in two respects. First we compared them in terms of power, in other words the reject ratios of the null hypothesis under unequal means. To do this, for given values of $\mu_i$ and $\sigma_i^2$ (as can be seen in Table 1), the data sets are generated from the distribution $Y_{ij} \sim N(\mu_i, \sigma_i^2)$, where $i = 1, \ldots, I$, $I = 3, 4, 5, 6, 8$, $j = 1, \ldots, n_i$, $n_1 = \cdots = n_I = n = 4, 5, 6, 7, 8, 10$ and $\alpha = 0.25, 0.10, 0.05, 0.025, 0.01$. The power of each test is then estimated from 10000 simulation runs. The average values of the powers for each test are summarized in Table 1. From Table 1 we see that the single-stage test is more powerful than the Welch test for every sample size and every level $\alpha$ when the number of populations is 3 or 4. In this case the single-stage test may be a suitable choice. On the other hand, the Welch test becomes more powerful as the number of populations and the sample sizes increase.
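The simulation scheme described above can be sketched as a small Monte Carlo harness. It is a generic illustration rather than the authors' Matlab code: the function `estimate_rejection_rate` and the toy decision rule `spread_rule` are hypothetical names, and the rule is only a placeholder for a real test such as the Welch or single-stage test. With unequal means the returned rate estimates the power; with equal means it estimates the reject ratio of a true null hypothesis.

```python
import math
import random


def estimate_rejection_rate(rejects, means, sds, n, runs=10000, seed=0):
    """Estimate how often `rejects` fires on simulated normal samples.

    rejects: callable taking a list of groups and returning True or False.
    means, sds: population means and standard deviations, one per group.
    Returns (rate, standard_error).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # One data set: n observations from each normal population.
        samples = [[rng.gauss(mu, sd) for _ in range(n)]
                   for mu, sd in zip(means, sds)]
        if rejects(samples):
            hits += 1
    rate = hits / runs
    # Binomial standard error of the estimated proportion.
    return rate, math.sqrt(rate * (1.0 - rate) / runs)


def spread_rule(samples, threshold=2.0):
    """Toy placeholder rule: reject when the group means are far apart."""
    group_means = [sum(g) / len(g) for g in samples]
    return max(group_means) - min(group_means) > threshold
```

The standard error indicates the precision of the simulation: with 10000 runs and a true rejection rate of 0.05 it is roughly 0.002, so differences between tests that are smaller than a few thousandths are within simulation noise.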
Secondly, the tests are compared with respect to the reject ratios of a true null hypothesis, that is, the realizations of the specified $\alpha$'s, under equal means. For this, the data sets are generated from the distributions $Y_{ij} \sim N(2, \sigma_i^2)$ in 10000 simulation runs for the same numbers of populations, sample sizes and variances given above. Then the average values of the reject ratios are calculated and tabulated in Table 2 for the levels of significance $\alpha = 0.25, 0.10, 0.05, 0.025, 0.01$. From the simulation results shown in Table 2 we see that the reject ratios of the single-stage test are closer to the level of significance than those of the Welch test only when the number of populations is 3 or 4 and the sample size is 4. Furthermore, for the Welch test the realizations of the specified $\alpha$'s increase as the number of populations and the sample sizes increase. On the other hand, the estimated values in Table 2

Discussion and Conclusions
Neither the single-stage test nor the Welch test requires the assumption of equal variances in one-way ANOVA. Therefore, in this study, these two tests are compared by means of their powers and their reject ratios of a true null hypothesis, that is, the realizations of the specified $\alpha$'s. For this, we used our own computer programs (written in Matlab) based on 10000 simulation runs. Results are given in Table 1 and Table 2 for different numbers of groups, sample sizes and levels of significance. The power of the single-stage test is better than that of the Welch test when $I = 3, 4$. If $I$ is 5 or larger, the Welch test gives better results. In terms of the reject ratios, the single-stage test gives values closer to the level of significance $\alpha$ when $I = 3, 4$ and the sample size is 4.
In conclusion, the single-stage test gives better results in terms of both power and the reject ratios of a true null hypothesis (the realizations of the specified $\alpha$'s) when the sample size is 4 and the number of groups is 3 or 4. So, by using the single-stage test for small sample sizes and small numbers of groups, an experimenter can save time and money. In such cases the single-stage test may be an appropriate choice.

The weights $w_1, \ldots, w_I$ are used to define the weighted mean $\mu = \sum_{i=1}^{I} w_i \mu_i \big/ \sum_{i=1}^{I} w_i$. Usually the weights are unknown and are estimated by $\hat{w}_i$.

Table 1. The powers of the single-stage test and the Welch test. *: Average value for the power of the single-stage test. **: Average value for the power of the Welch test.

Table 2. The reject ratios of the single-stage test and the Welch test. *: Average value for the reject ratios of the single-stage test. **: Average value for the reject ratios of the Welch test.