A Modified Jonckheere Test Statistic for Ordered Alternatives in Repeated Measures Design

In this article, a new test based on the Jonckheere test [1] is presented for randomized blocks with dependent observations within blocks. It uses a weighted sum of the block statistics rather than the unweighted sum proposed by Jonckheere. The main assumption of Jonckheere-type statistics is independence of the observations within a block; in a repeated measures design, this assumption is violated. The weighted Jonckheere-type statistic is applied under dependence with different variance-covariance structures, with an ordered alternative hypothesis for each block of the design. The proposed statistic is also compared with the existing Jonckheere-based test in terms of type I error rates by Monte Carlo simulation. For strong correlations, the circular bootstrap version of the proposed Jonckheere test provides lower type I error rates.


Introduction
In medical research, testing the direction of change or trend in response levels over time or treatment is an important problem. For ordered alternatives in a randomized block design, the Jonckheere and Page tests are well-known distribution-free tests [1,2]. Zhang and Cabilio [3] developed a generalized Jonckheere test against ordered alternatives for repeated measures in a randomized block design. Repeated measures designs are often just special cases of randomized block designs. This is also the case when a particular sequence is of interest and is used by all subjects, as in many learning experiments. Agresti and Pendergast [4] proposed a rank test to detect treatment effects in repeated measures designs where the observations within blocks are assumed to be correlated. Kepner and Robinson also suggested a rank transform (RT) type statistic similar to the one proposed by Page [2]. The RT technique was described in detail by Conover and Iman [5], and the efficiency of the RT and F (ANOVA) statistics for the same problem was compared on the basis of their asymptotic relative efficiency.
Observations in a repeated measures design tend to be serially correlated, so the usual assumptions of analysis of variance or linear regression analysis do not hold. The main difference between completely randomized and repeated measures designs is that in the former it is often reasonable to assume independence across all observations, while in the latter observations are likely to be dependent. If this dependence is not taken into account, inference about treatment effects may be biased or may fail to reveal the true significance of treatment contrasts. Each experimental unit is considered as a block of the design, and the model is given in Equation (1):
$$\varepsilon_{i,j} = \rho\,\varepsilon_{i,j-1} + u_{i,j}, \qquad u_{i,j} \sim N(0,1), \tag{1}$$
where $n$ is the length of the time period of repeated measures of the $i$th block or experimental unit, $b$ is the number of experimental units, $\tau_j$ is the time effect of the $j$th time period, $\beta_i$ is the effect of the $i$th block and the $u_{i,j}$ are independent white noise. Since the observations in each block are dependent, the error terms $\{\varepsilon_{i,j}\}$ are also assumed to be dependent, with zero mean. This dependence forms a weakly dependent stationary process, the first-order autoregressive AR(1) model. The AR(1) model assumes that repeated measurements have a first-order autoregressive relationship. The correlation between any two elements is equal to ρ for adjacent elements, ρ² for elements that are separated by a third, and so on. ρ is constrained so that −1 < ρ < 1, which keeps the process stationary. As the time lag between measurements increases, the correlation between error terms weakens [3]. Since parametric repeated measures analysis commonly uses a stationary autoregressive model, we focus on the AR(1) structure in our study.
A very common problem in applied statistics is to test for differences between the effects of the treatments. The null and alternative hypotheses are
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_n \quad \text{versus} \quad H_1: \tau_j \neq \tau_k \ \text{for at least one pair } j \neq k.$$
Under this alternative hypothesis, at least one treatment is assumed to differ from the others. Sometimes, however, we have prior knowledge that the treatment effects increase or decrease. Dose-response studies and psychological experiments, for example, are often suited to an ordered alternative hypothesis. A repeated measures design with an ordered alternative hypothesis usually takes the form of a longitudinal design. In longitudinal designs, the same experimental subjects are measured at different time intervals. The main focus of this kind of experiment is the study of change over time, development or growth. For instance, Pellegrini and Long [6] studied aggression in students who were transitioning from primary school into middle school. They elicited aggression ratings for each student from their peers and teachers at four different time points: the beginning and end of the last year of primary school, and the beginning and end of the first year of secondary school. The analysis focused on the mean change across time. The major hypothesis was confirmed, as mean aggression increased from primary to secondary school. This was attributed to students establishing (or reestablishing) the social dominance hierarchy through aggression after a major environmental change [6].
In this paper, our main aim is to examine studies with ordered alternatives. The null and alternative hypotheses for ordered alternatives are given as
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_n \quad \text{versus} \quad H_1: \tau_1 \le \tau_2 \le \cdots \le \tau_n, \tag{4}$$
where at least one of the inequalities is strict.
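The AR(1) error structure described in this section can be illustrated with a short simulation. This is a minimal sketch, not the authors' code: the function name `simulate_ar1_blocks`, the burn-in length and the use of NumPy are assumptions made for illustration. It generates the error terms of each block from standard normal white noise and checks empirically that the lag-1 and lag-2 correlations are close to ρ and ρ².

```python
import numpy as np

def simulate_ar1_blocks(b, n, rho, rng, burn_in=200):
    """Return a (b, n) array; each row holds the AR(1) errors of one block.

    Hypothetical helper: eps[i, j] = rho * eps[i, j-1] + u[i, j],
    with u[i, j] ~ N(0, 1) independent white noise.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    u = rng.standard_normal((b, n + burn_in))
    eps = np.zeros_like(u)
    for j in range(1, n + burn_in):
        eps[:, j] = rho * eps[:, j - 1] + u[:, j]
    # Drop the burn-in so the retained values are close to stationary.
    return eps[:, burn_in:]

rng = np.random.default_rng(0)
errors = simulate_ar1_blocks(b=2000, n=6, rho=0.7, rng=rng)

# Empirical correlation across blocks between time points 1 and 2 (lag 1)
# and between time points 1 and 3 (lag 2).
lag1 = np.corrcoef(errors[:, 0], errors[:, 1])[0, 1]
lag2 = np.corrcoef(errors[:, 0], errors[:, 2])[0, 1]
print(f"lag 1: {lag1:.3f} (theory 0.7), lag 2: {lag2:.3f} (theory 0.49)")
```

A large number of blocks is used here only to make the empirical correlations stable; the simulation study in the paper works with much smaller block sizes.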
In testing ordered hypotheses, the Page [2] and Jonckheere [1] tests are the standard choices. For the related Mann-Kendall (MK) trend test, [8] states that "positive correlation increases the possibility of the null hypothesis of no trend to be rejected when it is correct, with a probability larger than the assigned (nominal) level of significance". In other words, serial correlation inflates the variance of the MK statistic. Many authors have attempted to modify the MK test for serially dependent data so that the rejection rate when there is no trend is reduced to the nominal significance level [9]. Zhang and Cabilio [3] used a generalized version of the Jonckheere statistic based on MK statistics for testing the ordered alternative hypothesis. They also obtained the asymptotic distribution of this statistic and used different dependence structures for this testing procedure. As seen from their simulation study, the empirical type I error rates obtained are not sufficiently close to their nominal levels. For this reason, we modified this test statistic and investigated the circular block bootstrap and the independent and identically distributed (IID) bootstrap for both the Jonckheere test and the modified Jonckheere test.
This article is organized as follows. The Jonckheere test and the modified version of the Jonckheere test are introduced briefly. Bootstrap methods are then examined for dependent and independent series of a dataset. After this section, we compare the performance of these tests in a simulation study in terms of their type 1 error rates. Finally, we provide a summary of our findings.

Tests for Ordered Alternatives in Repeated Measures of Randomized Blocks
In this section, we examine the Jonckheere test and its modified version.

Jonckheere test
The Jonckheere test is based on the Mann-Kendall statistic [7], so we first present that statistic. For two series of observations $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_n)$, the Mann-Kendall statistic can be formulated as
$$S = \sum_{j<k} a_{jk}\, b_{jk}, \tag{5}$$
where
$$a_{jk} = \operatorname{sgn}(x_k - x_j) = \begin{cases} 1, & x_j < x_k \\ 0, & x_j = x_k \\ -1, & x_j > x_k \end{cases} \tag{6}$$
and $b_{jk}$ is defined similarly from the observations of $Y$. Under the null hypothesis that $X$ and $Y$ are independent and randomly ordered, the statistic $S$ tends to a normal distribution for large $n$, with mean and variance as follows [9]:
$$E(S) = 0, \qquad \operatorname{Var}(S) = \frac{n(n-1)(2n+5)}{18}.$$
If the values in $Y$ are replaced with the time order of a time series $X$, i.e. $1, 2, \dots, n$, as in a repeated measures design, the test can be used as a trend test [7]. In this case, the statistic $S$ reduces to
$$S = \sum_{j<k} a_{jk} = \sum_{j<k} \operatorname{sgn}(x_k - x_j).$$
The derivation of the mean and variance of $S$ is given in detail by Kendall [7].
Based on the Mann-Kendall statistic, the Jonckheere test [10] is defined as follows:
$$J = \sum_{i=1}^{b} S_i(n), \qquad S_i(n) = \sum_{j<k} \operatorname{sgn}\bigl(R_i(k) - R_i(j)\bigr).$$
In the Jonckheere test formula, $S_i(n)$ is the Mann-Kendall statistic for testing trend in a series of observations, that is, the non-standardized Kendall's tau correlation between a subject's repeated responses and the alternative ordering, where each subject is ranked within itself over time. $R_i(j)$ is the rank of the $i$th subject at time $j$, and $\operatorname{sgn}(R_i(k) - R_i(j))$ is either 1 or −1, depending on whether $R_i(k) > R_i(j)$ or $R_i(k) < R_i(j)$.
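The Mann-Kendall trend statistic and the unweighted Jonckheere sum over blocks can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names and the small example array are hypothetical, and the signs are computed from the raw observations, which gives the same result as using within-block ranks because ranking preserves the order of the values.

```python
import numpy as np

def mann_kendall_s(x):
    """Sum over time pairs j < k of sgn(x_k - x_j) for one series in time order."""
    x = np.asarray(x, dtype=float)
    j_idx, k_idx = np.triu_indices(len(x), k=1)  # all pairs with j < k
    return int(np.sign(x[k_idx] - x[j_idx]).sum())

def mann_kendall_variance(n):
    """Null variance n(n-1)(2n+5)/18 of S for n independent observations."""
    return n * (n - 1) * (2 * n + 5) / 18.0

def jonckheere_j(blocks):
    """Unweighted sum of the per-block Mann-Kendall statistics."""
    return sum(mann_kendall_s(row) for row in np.asarray(blocks, dtype=float))

# Hypothetical data: three blocks (rows) measured at four time points (columns).
example = [
    [1.0, 2.0, 3.0, 4.0],   # perfectly increasing: S = 6
    [2.0, 1.0, 3.0, 4.0],   # one inversion:        S = 4
    [4.0, 3.0, 2.0, 1.0],   # perfectly decreasing: S = -6
]
print(jonckheere_j(example))     # 6 + 4 - 6 = 4
print(mann_kendall_variance(4))  # 4 * 3 * 13 / 18
```

The variance formula applies to a single series of independent observations; under the within-block dependence discussed in this paper it is not expected to hold, which is the motivation for the bootstrap versions of the tests.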

Modified Jonckheere test
In order to test $H_0$ versus $H_1$ in Eq. (4), Skillings and Wolfe [10] used weighted Jonckheere test statistics in a randomized block design. In this paper, we modify the Jonckheere test statistic for repeated measures designs by using the weighting given in Skillings and Wolfe [10], so that the modified Jonckheere test statistic is a weighted sum of the block statistics.
As seen in Figure 1, the estimated type 1 error rates of the four tests are lower for negative values of ρ than for positive values of ρ. When the block size is 5, MJI generates the lowest type 1 error rates among the four tests for negative values of ρ. For the independent case, MJI generates the value closest to the nominal level. For positive values of ρ, MJI also provides values closer to the nominal level. When the block size is increased to 10, 20 and 30, MJI gives the lowest type 1 error rates even for ρ = 0.9.
As seen in Figure 1, there is only a slight difference between the bootstrap methods for the combinations of b and ρ, except for ρ = 0.9. Furthermore, the estimated type 1 error rates of all test statistics decrease as the block size increases.
For block size 5, the lowest type 1 error rates are generated by MJI for negative values of ρ and for weak positive correlations (ρ = 0.1, 0.3). For the higher positive correlations (ρ = 0.5, 0.7, 0.9), MJC gives the higher type 1 error rate for block size 5. For block sizes b = 10, 20, 30, the simulation results show a similar trend to block size 5. The test statistics based on the circular bootstrap (JC and MJC) give type 1 error rates that are close to each other. Beyond moderate positive values of ρ, MJC and JC generate lower type 1 error rates.
As shown in Figure 3(a), for the smallest block size the lowest type 1 error rates are given by MJI when the correlation term is negative. MJC and JC also generate type 1 error rates very close to the nominal level for negative values of ρ. For the lower positive values of ρ, MJI has lower type 1 error rates. For the higher positive values of ρ, MJC generates better type 1 error rates than JC does.
For block sizes b = 10, 20, 30, the trend of the four tests stays the same as for b = 5, but the type 1 error rates decrease. For example, for block size 5 and ρ = 0.9, MJC gives a type 1 error rate of 0.1615. When the block size is increased to 30 with ρ = 0.9, the type 1 error rate of MJC decreases to 0.1284. For n = 12, ρ = 0.3 is the intersection point of all tests, and the circular bootstrap-based tests provide lower type 1 error rates for ρ = 0.5, 0.7, 0.9, as seen in Figure 3. As the positive correlation terms increase, MJC and JC give lower type 1 error rates than MJI and JI.
As shown in Figure 4, the lowest type 1 error rates are generated by MJI when the correlation terms are negative. MJC generates type 1 error rates very close to the nominal level for negative values. When the observations are independent, MJI gives the type 1 error rate closest to the nominal alpha. MJI gives lower actual type 1 error rates when the correlation terms are moderate (ρ = 0.1, 0.3). For the higher positive correlation terms (ρ = 0.5, 0.7, 0.9), MJC generates more reasonable type 1 error rates. These results stay almost the same as for b = 5 when the block size is increased to 10, 20 and 30, although the type 1 error rates decrease by about 2%. For example, when the block size is 5 and ρ = 0.9, MJC gives a type 1 error rate of 0.1585. When the block size is increased to 30 with ρ = 0.9, the type 1 error rate of MJC decreases to 0.1375. For n = 24, the intersection point of all test statistics is ρ = 0.1, which means that it is hard to distinguish the four tests at this point. The circular bootstrap-based tests also show lower type 1 error rates for almost all positive correlation terms, except ρ = 0.1, for each block size.

Numerical Example
A numerical example is given in order to apply the test statistics to the generated dataset in Table 1. Suppose that six depression patients are given a drug that increases the level of a 'happy chemical' in the brain.
At baseline, all six patients have similar levels of this chemical. The researcher measures brain-chemical levels at three subsequent time points after the drug is given: 2 months, 3 months and 6 months post-baseline.
Since there is natural variability between individuals, we can take each subject as a block. Because the experimental subject stays the same across time periods, we need to take the dependence among the repeated observations into consideration. Both Jonckheere test statistics are appropriate for such a situation. For this dataset, the null and ordered alternative hypotheses take the form given in Eq. (4). Based on 3000 bootstrap samples, the bootstrap p-values of the four statistics are obtained as shown in Table 2.
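A bootstrap p-value of the kind reported for this example can be sketched as follows. This is a hedged illustration only: the data below are invented for the sketch and are not the values of Table 1, the function names are hypothetical, the unweighted Jonckheere statistic is used rather than the modified one, and both the within-block IID resampling and the (count + 1) / (B + 1) p-value convention are assumptions rather than the authors' documented procedure.

```python
import numpy as np

def jonckheere_j(blocks):
    """Unweighted Jonckheere statistic: sum over blocks of sgn(x_k - x_j), j < k."""
    blocks = np.asarray(blocks, dtype=float)
    j_idx, k_idx = np.triu_indices(blocks.shape[1], k=1)
    return int(np.sign(blocks[:, k_idx] - blocks[:, j_idx]).sum())

def iid_bootstrap_p_value(blocks, n_boot, rng):
    """One-sided bootstrap p-value for an increasing trend (assumed convention)."""
    blocks = np.asarray(blocks, dtype=float)
    b, n = blocks.shape
    observed = jonckheere_j(blocks)
    exceed = 0
    for _ in range(n_boot):
        # Resample time points with replacement, independently in each block,
        # which destroys any trend present in the original data.
        idx = rng.integers(0, n, size=(b, n))
        resampled = np.take_along_axis(blocks, idx, axis=1)
        if jonckheere_j(resampled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)

# Invented values: six subjects (rows), three time points (columns).
levels = [
    [10.0, 12.0, 15.0],
    [9.0, 11.0, 14.0],
    [11.0, 12.0, 16.0],
    [8.0, 10.0, 13.0],
    [10.0, 13.0, 17.0],
    [9.0, 12.0, 15.0],
]
rng = np.random.default_rng(2024)
j_obs, p_value = iid_bootstrap_p_value(levels, n_boot=3000, rng=rng)
print(j_obs, p_value)
```

Because every invented subject increases at every time point, the observed statistic takes its maximum value and the resulting p-value is small; real data with weaker trends would give larger values.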

Discussion and Conclusion
In clinical or medical research, data are usually gathered serially or longitudinally on subjects. The group of repeated measures can be analyzed by parametric methods to uncover the mean profile change with ordered alternatives. The modified Jonckheere tests were developed in order to make use of the information contained in blocks that have dependent observations. In the literature, some authors investigate large time intervals (n = 25, 50, 100) [3], [14]. As a result, the importance of the IID bootstrap for small time intervals (n = 6, 9) is overlooked. Therefore, we consider small time intervals in this study. As seen from the tables, strong correlation increases the type 1 error rates. For negative values of ρ, the type 1 error rates are much smaller than the nominal level, which indicates a very conservative test. Conversely, for positive values of ρ the rates are much larger than the nominal level, which indicates a very liberal test. These problems are partly alleviated for larger block sizes. The accuracy of the estimated type 1 error rates improves when the block size is increased to 20 or 30. These tests are mostly liberal for small block sizes and small values of n. As the time intervals increase, MJC provides lower type 1 error rates for higher positive correlation terms. In general, for all the time intervals (n = 6, 9, 12, 24), the type 1 error rates become lower as the block size becomes larger.
Beyond moderate values of ρ, the bootstrap methods quickly become distinguishable by their type 1 error rates. Whereas the circular bootstrap method is reasonable for larger correlation terms (ρ > 0.5), the IID bootstrap can be preferred for smaller values (ρ < 0.5). The type 1 error rates of these tests are affected by the value of the correlation term. In general, the type 1 error rates of these tests are somewhat higher than the nominal level. There appears to be a considerable difference between the two bootstrap-based methods. As far as we can tell, similar behavior has been observed for unweighted trend statistics in [3].
Since shuffling eliminates any trend, the circular bootstrap method is preferred for datasets with a strong dependence structure, as explained in detail in Section 2. As seen in the figures, the circular bootstrap-based tests provide lower type 1 error rates for a larger number of positive correlation terms as n increases. For instance, MJC and JC generate lower type 1 error rates for the combination of n = 9 and ρ = 0.7, 0.9. For n = 24, MJC and JC provide lower type 1 error rates for ρ = 0.3, 0.5, 0.7, 0.9. We should also note that the tests will mostly be liberal for small block sizes. These tests should be applied with caution when both n and b are small. In particular, for high positive correlation terms, the estimated type 1 error rates increase for large time intervals.
1, 0.3, 0.5, 0.7, 0.9) and standard normal white noise. The simulation study also makes it possible to investigate the effect of negative and positive correlation terms and of the independent case (ρ = 0). Figures 1, 2, 3 and 4 display the type 1 error rates of all tests at the nominal significance level of 0.05, based on the IID bootstrap and circular bootstrap methods.
Type 1 error rates for the IID bootstrap-based tests are in the range of [0.000; 0.0447] for negative correlation terms and [0.043; 0.3] for positive correlation terms.
Bootstrap methods are among the most powerful nonparametric tools for estimating statistical properties of interest. In bootstrap methods, new samples of the same size as the original dataset are obtained by shuffling the original data many times with replacement. The test statistic is computed from each bootstrap sample. Since shuffling the original data destroys any trend, the distribution of the bootstrap test statistics is trend-free. By comparing the original test statistic with the trend-free bootstrap test statistics, type 1 error rates can be estimated. The main function of the bootstrap method is to resample, with or without replacement, from a given dataset $X_1, X_2, \dots, X_n$ in order to obtain the distribution of an estimator.
One bootstrap method is the independent and identically distributed (IID) bootstrap, which picks indices from {1, …, n} with equal probability and with replacement. The IID bootstrap is based on the notion that the data $X_1, X_2, \dots, X_n$ represent only a single realization of all possible combinations of a sample. Therefore, the distribution of an estimate $\hat{\theta}$ obtained by the IID bootstrap is an estimate of the distribution of $\hat{\theta}$ when $X_1, X_2, \dots, X_n$ are derived from a common distribution function $F(X \le x)$.
Another bootstrap method is the block bootstrap proposed by Künsch [11] for stationary dependent data. One of the most commonly used block bootstrap methods is the circular block bootstrap introduced by Politis and Romano [12]. When the dataset is serially dependent, the bootstrap is performed in circular blocks, so that the correlation pattern of the original dataset is preserved. Dehling and Wendler [13] gave the properties of the circular block bootstrap for U-statistics when the observations are dependent. Let $X_1, X_2, \dots, X_n$ be the sample in the $i$th block, indexed by the time interval $j$. From now on, the circular block length is denoted by $l$ and the number of circular blocks by $k = \lfloor n/l \rfloor$, where the brackets denote the largest integer that is equal to or smaller than $n/l$. In the simulation study, we choose $l = \sqrt{n}$, which balances the variance ratio for positive and negative correlation terms of the model. We therefore use it as the optimal block length in the following simulation study. Each sampled circular block includes $l$ consecutive observations. The $l$ consecutive observations are taken from $X_1, X_2, \dots, X_n$, beginning at a starting point picked at random from {1, …, n}.
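The two resampling schemes described above can be sketched as index generators. This is a minimal sketch under stated assumptions: the function names are hypothetical, indices are zero-based (0, …, n − 1 instead of 1, …, n), and √n is rounded down to an integer block length, a rounding rule the text does not specify.

```python
import math
import numpy as np

def iid_bootstrap_indices(n, rng):
    """Draw n time indices independently, with equal probability and replacement."""
    return rng.integers(0, n, size=n)

def circular_block_indices(n, block_length, rng):
    """Draw k = floor(n / l) blocks of l consecutive indices, wrapping around
    the end of the series so that every starting point is equally likely."""
    k = n // block_length
    starts = rng.integers(0, n, size=k)      # random starting point per block
    offsets = np.arange(block_length)        # 0, 1, ..., l - 1
    return ((starts[:, None] + offsets[None, :]) % n).ravel()

rng = np.random.default_rng(1)
n = 9
l = math.isqrt(n)                            # assumed rounding: floor(sqrt(n))
idx = circular_block_indices(n, l, rng)
iid_idx = iid_bootstrap_indices(n, rng)

series = np.arange(10, 10 + n)               # placeholder series for illustration
print(series[idx])                           # runs of l consecutive values, possibly wrapped
print(series[iid_idx])                       # values drawn with no ordering preserved
```

For the time lengths used in the paper (n = 6, 9, 12, 24), the assumed rounding gives block lengths 2, 3, 3 and 4, each of which divides n, so the resampled series has the same length as the original. For other values of n the resampled series would contain k·l ≤ n points under this sketch.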
Simulation Study
In this section, the performance of the Jonckheere test statistic and the modified Jonckheere test statistic based on two different bootstrap methods is investigated in terms of type 1 error rates. The four tests are the Jonckheere test statistic based on the IID bootstrap (JI), the Jonckheere test statistic based on the circular bootstrap (JC), the modified Jonckheere test statistic based on the IID bootstrap (MJI) and the modified Jonckheere test statistic based on the circular bootstrap (MJC).

Table 1. Hypothetical dataset for repeated measures of blocks.