The Comparison of the Estimators for the Parameters of the General Linear Regression Model via Simulation and Two Real Life Data Examples

In this study, we compared, via simulation, the efficiency and robustness of several estimators for the parameters of the general linear regression model, namely the least squares (LS) estimators, the Huber and Tukey M-estimators, the S-estimators and the MM-estimators. First, a program was written in Matlab for each method. Then, an extensive simulation study was conducted under several models. Although the results agree with the literature, some important points worth emphasizing were also found. As suggested in the literature, the MM-estimators are in general the most efficient estimators, and among the robust estimators considered here the S-estimators have the lowest efficiency. Naturally, the LS estimators, because of their sensitive nature, are badly affected by deviations from the assumed model. In addition, while the LS estimator of the variance of the error term is unbiased, the robust estimators considered here were generally found to be biased. Moreover, the MM-estimator of the variance of the error term is less biased than the other robust estimators, and its bias decreases faster than the others as the sample size increases. At the end of the study, two real life data examples with the related comments are given to be more illustrative.


Introduction
The general linear regression (GLR) model covers many particular cases, and for this reason it can be used for general purposes [1]. The GLR model can be given as follows:

Y = Xβ + ε (1)

Here, Y is the vector of the response variable, X is the matrix of the independent variables, β is the vector of the parameters, ε is the vector of the error term, n is the sample size and p is the number of slope parameters. The assumptions related to Eq. (1) are

E(ε) = 0, Var(ε) = σ²I

where I is the identity matrix. The second assumption implies constant variance and the independence of the error terms (under normality). In some situations, instead of the second assumption, the following assumption can be made for flexibility about the variance of the error terms and their independence, but at the expense of a much more complicated estimation process.

Var(ε) = Σ
In real life, there are very rare situations where the parameters are known. Thus, the sample should be used effectively to estimate the parameters of interest. The least squares (LS) method is generally used for the estimation of the parameters since it is very easy and straightforward, but it is also known to be very sensitive to deviations from the assumed models and distributions. Quite many estimators, called robust estimators, have been proposed so far to compensate for the sensitive nature of the LS estimators, but none of them is fully efficient under normality, although some of them possess satisfactory efficiencies. In this study we intended to compare the efficiency and robustness of the most popular estimators related to the GLR model via simulation. In the literature their efficiencies are already known, but there are more points to be revealed about their properties. For this purpose, we first wrote programs for each method by using Matlab and conducted an extensive simulation study by using our own programs. Using our own programs in the simulations is an important contribution in the area. We also found some interesting features of the robust estimators of the variance of the error term. Conducting the regression analysis of the real life data sets using several graphical and numerical tools is another merit of the study. This paper is organized as follows. In Section 2 we give more detailed information about the literature and the methods used in this study. Section 3 presents the simulation results and the related comments. In Section 4 two real life data examples are given to illustrate the usage of the mentioned methods in the estimation of the parameters of interest, with the related graphics and comments. The final section includes the discussion and some concluding remarks.

The Literature Review and the Methods Included in the Study
In this section we will introduce the methods used in this study in detail.

The least squares method
The LS method is based on finding the estimators which minimize the error sum of squares, and it is used in many areas because of the easiness of its estimation process. Specifically, it is defined as follows:

min Σ_{i=1}^{n} e_i² (2)

It can also be defined by using the matrix format as

min ε′ε (3)

Depending on Eq. (1), taking the derivative w.r.t. β gives the following LS estimator of β:

β̂ = (X′X)⁻¹X′Y (4)

The LS estimator of σ² is the minimized error sum of squares divided by the degrees of freedom of the residuals, which is given below for our case as

σ̂² = e′e / (n − p − 1) (5)
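As a quick illustration, the LS estimators above can be computed in closed form for the simple linear regression special case of Eq. (1). Our programs were written in Matlab; the following Python sketch is only an illustrative reimplementation:

```python
def ls_fit(x, y):
    """LS estimates (b0, b1, sigma2) for the model y_i = b0 + b1*x_i + e_i."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope: minimizes the error sum of squares
    b0 = ybar - b1 * xbar        # intercept
    # unbiased estimator of sigma^2: SSE over residual degrees of freedom
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse / (n - 2)
```

For p slope parameters the divisor becomes n − p − 1, as in Eq. (5).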

The weighted least squares method
The weighted least squares (WLS) method is based on finding the estimators minimizing the weighted error sum of squares. In fact, both the WLS and the LS are special cases of the generalized least squares (GLS), which is based on finding the estimators minimizing the Mahalanobis distance of the errors. The WLS method with the given weights w_i is defined as

min Σ_{i=1}^{n} w_i e_i² (6)

With W denoting the diagonal matrix of the weights w_i, we can also give its definition by using the matrix format as follows:

min ε′Wε (7)

Taking the derivative w.r.t. β gives the following WLS estimator:

β̂ = (X′WX)⁻¹X′WY (8)
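For the simple linear regression special case, the WLS estimator of Eq. (8) reduces to weighted means and weighted sums of squares. A minimal Python sketch (illustrative, not the paper's Matlab code):

```python
def wls_fit(x, y, w):
    """WLS estimates (b0, b1) minimizing sum_i w_i * e_i**2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1
```

With all weights equal to 1, W = I and the WLS estimator coincides with the LS estimator; giving an observation weight 0 removes it from the fit entirely.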

The iteratively reweighted least squares algorithm
Most of the robust estimation methods require iteration since they cannot be obtained explicitly. In order to solve them by iteration, generally, the iteratively reweighted least squares (IRWLS) algorithm is utilized. Andersen [2] gave the definition of its steps as follows:
Step 1: In the beginning, let the iteration number q be zero, q = 0. β̂^(q) is obtained as an initial estimate by using the LS method.
Step 2: At iteration q, the residuals e_i^(q) are obtained by using the estimate β̂^(q). By using the residuals e_i^(q), we calculate the estimate σ̂^(q) of the standard deviation of the error term with the median absolute deviation (MAD), from the formula given below, as proposed by Hampel et al. [3]:

σ̂^(q) = 1.4826 · median_i |e_i^(q) − median_j(e_j^(q))| (9)
The coefficient 1.4826 in Eq. (9) is used to make the estimator of the standard deviation unbiased under normality (see Hampel et al. [3] for details).
Step 3: The residuals are standardized as

u_i^(q) = e_i^(q) / σ̂^(q) (10)

and the weights w_i^(q) are obtained by applying the selected weight function to u_i^(q).
Step 4: The WLS estimator is obtained which minimizes the weighted error sum of squares Σ_{i=1}^{n} w_i^(q) e_i². When expressed in the matrix format, it can be given as below:

β̂^(q+1) = (X′W^(q)X)⁻¹X′W^(q)Y (11)

Step 5: A condition of the following form is checked for convergence at iteration q for a prespecified small value δ:

max_j |β̂_j^(q+1) − β̂_j^(q)| < δ (12)

The iteration stops at convergence. Otherwise, we continue to Step 6.
Step 6: The iteration number q is increased by 1, q = q + 1. Then, we go to Step 2.
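The six steps above can be sketched compactly for simple linear regression. The sketch below is in Python rather than the paper's Matlab, and uses the Huber weight function with c = 1.345 for concreteness; the weighted LS closed form plays the role of Eq. (11):

```python
import statistics

def irwls(x, y, c=1.345, delta=1e-5, max_iter=100):
    """IRWLS for y_i = b0 + b1*x_i + e_i with Huber weights."""
    n = len(x)

    def wfit(w):                       # weighted LS closed form (Step 4)
        sw = sum(w)
        xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
        yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
        b1 = sxy / sxx
        return yb - b1 * xb, b1

    b0, b1 = wfit([1.0] * n)           # Step 1: LS initial estimate
    for _ in range(max_iter):
        e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]       # Step 2: residuals
        med = statistics.median(e)
        sigma = 1.4826 * statistics.median([abs(ei - med) for ei in e])  # MAD, Eq. (9)
        u = [ei / sigma for ei in e]                          # Step 3: standardize
        w = [1.0 if abs(ui) <= c else c / abs(ui) for ui in u]  # Huber weights
        b0_new, b1_new = wfit(w)                              # Step 4: WLS update
        if max(abs(b0_new - b0), abs(b1_new - b1)) < delta:   # Step 5: convergence
            return b0_new, b1_new
        b0, b1 = b0_new, b1_new                               # Step 6: next iteration
    return b0, b1
```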

The least median of squares
The least median of squares (LMS) method was proposed by Rousseeuw [4]. It is based on minimizing the median of the squares of the errors:

min median_i (e_i²) (13)

The LMS method is known to have a high breakdown point (BDP) but low efficiency under normality [5].
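For simple linear regression, the LMS criterion can be approximated by a brute-force search over lines through pairs of observations. This is only an illustrative Python sketch; practical LMS algorithms such as PROGRESS are more sophisticated:

```python
import itertools
import statistics

def lms_fit(x, y):
    """Approximate LMS fit: minimize the median of the squared residuals
    over all lines passing through two data points."""
    best = (float("inf"), 0.0, 0.0)    # (criterion, b0, b1)
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                   # vertical line, skip
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        m = statistics.median([(yk - b0 - b1 * xk) ** 2 for xk, yk in zip(x, y)])
        if m < best[0]:
            best = (m, b0, b1)
    return best[1], best[2]
```

Because the criterion is a median, a minority of arbitrarily wild observations cannot inflate it, which is the source of the high BDP.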

The M-estimators
The M-estimators were introduced by Huber [6] as a result of a search for a robust alternative to the LS method, which is known to be very sensitive to possible shifts from the assumed model. The principle of M-estimation is minimizing the sum of a selected ρ function of the errors instead of the sum of their squares. Thus, in this sense, the LS method is a special case of the M-estimation method. More specifically, the definition of the M-estimators can be given by the following expression:

min Σ_{i=1}^{n} ρ(e_i) (14)

There are many alternatives for the ρ function serving different purposes, which can be found in Türkay [7]. Though it depends on the selection of the ρ function, in general, the M-estimators are robust with low BDP and high efficiency w.r.t. the LMS estimators. In general, ρ functions are not linear, and for this reason the estimation process requires iteration. It is quite common to use the IRWLS algorithm to obtain the estimators. Susanti et al. [8] gave an algorithm which can be used to obtain the M-estimators. Its only difference from the IRWLS algorithm is that it includes some tests about the validity of the regression model, the existence of outliers and the significance of the independent variables in the model. For the M-estimators, we used two weight functions, namely the Huber and Tukey bisquare weight functions. The weight function of Huber is [7]

w(u) = 1 if |u| ≤ c, and w(u) = c/|u| otherwise (15)

The weight function of Tukey bisquare is [7]

w(u) = (1 − (u/c)²)² if |u| ≤ c, and w(u) = 0 otherwise (16)

We used c = 1.345 and c = 4.685 in our study, respectively, for the cases of the Huber and Tukey M-estimation, as suggested by Holland and Welsch [9], to maintain 95% asymptotic efficiency w.r.t. the LS estimators under normality.
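The two weight functions are straightforward to code (illustrative Python):

```python
def huber_weight(u, c=1.345):
    """Huber weight, Eq. (15): full weight inside [-c, c], decaying outside."""
    return 1.0 if abs(u) <= c else c / abs(u)

def tukey_weight(u, c=4.685):
    """Tukey bisquare weight, Eq. (16): redescends to exactly 0 beyond c."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0
```

Note the qualitative difference: the Huber weight never reaches zero, so extreme residuals always retain some influence, while the Tukey bisquare weight cuts them off completely, which is why it appears in the high-BDP S- and MM-estimation stages below.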

The S-estimators
The S-estimators, which possess high BDP, were proposed by Rousseeuw and Yohai [10]. They are called the S-estimators because they are based on the scale estimation of the errors. The method is a generalized form of the LMS method [11]. The S-estimation method minimizes a ρ-based scale estimate of the errors, with ρ satisfying the conditions defined in Rousseeuw and Yohai [10]. The aim is to increase the efficiency of the LMS method by using a robust but more efficient scale estimator than the median [8]. In this sense, it is defined by the following expression:

min_β ŝ(e_1(β), …, e_n(β)) (17)

For a specific sample, according to Rousseeuw and Yohai [10], the following equation is solved to obtain the scale estimate:

(1/n) Σ_{i=1}^{n} ρ(e_i / ŝ) = K (18)

Here, K is the expected value of ρ under the standard normal distribution, ŝ is the S-estimator of scale for the error term and the e_i are the residuals calculated by using the S-estimator of the β vector [5]. Susanti et al. [8] gave an algorithm to obtain the S-estimators as follows:
Step 1: β̂ is obtained as an initial estimate by using the LS method.
Step 2: The residuals are obtained by using the latest estimate β̂.
Step 3: By using the residuals obtained in the previous step, we calculate the estimate ŝ of the standard deviation of the error term by solving Eq. (18).
Step 4: The residuals are standardized as u_i = e_i / ŝ.
Step 5: The weights w_i are obtained from the Tukey bisquare weight function.
Step 6: β̂ is obtained from Eq. (8) by using the weights w_i, i.e., by utilizing the WLS method.
Step 7: The steps 2-6 are repeated till convergence between the latter and former estimates is established.
Rousseeuw and Yohai [10] suggested using c = 1.547 so that the BDP of the S-estimators is 0.5 (50%). Stuart [12] stated that the following objective function, which is associated with the Tukey bisquare weight function, can be used in obtaining the S-estimators:

ρ(u) = u²/2 − u⁴/(2c²) + u⁶/(6c⁴) if |u| ≤ c, and ρ(u) = c²/6 otherwise (19)

By taking numerical integration in Matlab, we obtained the corresponding BDP values for some specific values of c and K, where K is directly related to the value of c. Rousseeuw and Yohai [10] also provided the asymptotic relative efficiencies of the S-estimators for some selected values of c w.r.t. the LS estimators under normality. Table 1 given here is in exact conformity with the corresponding tables in Rousseeuw and Yohai [10] and Stuart [12].
It was noted by Rousseeuw and Yohai [10] that the S-estimators with the tuning constant c = 1.547 can hardly be used as a final estimate because of their very low asymptotic efficiency of 28.7% w.r.t. the LS estimators under normality, but they can be used as an initial estimate because of their high BDP of 50%.
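The numerical integration mentioned above can be reproduced outside Matlab as well. The Python sketch below, assuming the Tukey bisquare ρ, computes K = E[ρ(Z)] under the standard normal by a simple trapezoidal rule and then the BDP as K/ρ(c) for c = 1.547:

```python
import math

def rho_tukey(u, c):
    """Tukey bisquare objective function (rho(c) = c^2/6 at the boundary)."""
    if abs(u) <= c:
        return u * u / 2 - u ** 4 / (2 * c * c) + u ** 6 / (6 * c ** 4)
    return c * c / 6

def expected_rho(c, lim=8.0, steps=20_000):
    """K = E[rho(Z)], Z ~ N(0,1), by the trapezoidal rule on [-lim, lim]."""
    h = 2 * lim / steps
    total = 0.0
    for k in range(steps + 1):
        z = -lim + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * rho_tukey(z, c) * math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return total * h

c = 1.547
K = expected_rho(c)
bdp = K / rho_tukey(c, c)   # breakdown point of the S-estimator
print(round(bdp, 3))        # approximately 0.5, consistent with Table 1
```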

The MM-estimators
The MM-estimators were proposed by Yohai [13] to maintain both high efficiency under normality and robustness with high BDP at the same time. He suggested using the S-estimators with the tuning constant c = 1.547 in the early stage to maintain a BDP of 50%, and using the S-estimator for the standard deviation and the M-estimators for β, both with the tuning constant c = 4.685, in the remaining stages to maintain an asymptotic relative efficiency of 95% w.r.t. the LS estimators under normality, using the Tukey bisquare function in all stages. The steps of the algorithm to obtain the MM-estimators are [8]:
Step 1: β̂ is obtained as an initial estimate by using the S-estimation method with the tuning constant c = 1.547.
Step 2: The residuals are obtained by using the latest estimate β̂.
Step 3: By using the residuals obtained in the previous step, we calculate the estimate σ̂_s of the standard deviation by using the S-estimation method but with the tuning constant c = 4.685.
Step 4: The residuals are standardized as u_i = e_i / σ̂_s.
Step 5: The weights w_i are obtained by using the Tukey bisquare function as in Eq. (16) with c = 4.685.
Step 6: β̂ is obtained from Eq. (8) by using the weights w_i, i.e., by utilizing the WLS method.
Step 7: The steps 2-6 are repeated till convergence between the latter and former estimates is established.
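The MM idea can be illustrated for simple linear regression with the Python sketch below. Two simplifications are made and should be kept in mind: the high-BDP initial estimate is taken as the Theil-Sen fit (median of pairwise slopes) instead of a full S-estimate, and the fixed robust scale is the MAD of the initial residuals instead of the S-scale; the final stage is the Tukey IRWLS with c = 4.685, as in Steps 2-7:

```python
import itertools
import statistics

def mm_fit(x, y, c=4.685, delta=1e-5, max_iter=100):
    """MM-style fit: robust initial estimate + fixed robust scale +
    Tukey bisquare IRWLS. Illustrative simplification of the S-based
    algorithm described in the text."""
    # robust initial estimate (stand-in for the S-estimator, Step 1)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in itertools.combinations(range(len(x)), 2)
              if x[i] != x[j]]
    b1 = statistics.median(slopes)
    b0 = statistics.median([yi - b1 * xi for xi, yi in zip(x, y)])

    # fixed robust scale (stand-in for the S-scale, Step 3)
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    med = statistics.median(e)
    sigma = 1.4826 * statistics.median([abs(ei - med) for ei in e])

    for _ in range(max_iter):          # Steps 2 and 4-7
        u = [(yi - b0 - b1 * xi) / sigma for xi, yi in zip(x, y)]
        w = [(1 - (ui / c) ** 2) ** 2 if abs(ui) <= c else 0.0 for ui in u]
        sw = sum(w)
        xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
        yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
        b1_new = sum(wi * (xi - xb) * (yi - yb)
                     for wi, xi, yi in zip(w, x, y)) / sxx
        b0_new = yb - b1_new * xb
        if max(abs(b0_new - b0), abs(b1_new - b1)) < delta:
            break
        b0, b1 = b0_new, b1_new
    return b0, b1
```

Keeping the scale fixed during the M-step is the key design point: it preserves the BDP inherited from the initial stage while the Tukey step with c = 4.685 recovers the high efficiency.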

The Simulation Results
In order to compare the efficiency and robustness of the estimators mentioned in this paper, namely the LS estimators, the Huber and Tukey M-estimators, the S-estimators and the MM-estimators, an extensive simulation study was conducted including several models. Although all the programs in this study were written in Matlab for the GLR model given in Eq. (1), for easy interpretation and commentary the simulations were conducted for the simple linear regression model, which is a special case of the GLR model (see Mutlu [14] for the details of the models 1-4).
Remark: For the models 2-4, X is a design variable instead of a normally distributed (or a stochastic) variate. Although this is not realistic in most real life situations, X is generally assumed to be a design variable in regression analysis for easiness in theoretical inferences. See Sazak et al. [15] for a detailed discussion on this topic. Since this is quite common in the literature, we took X as a design variable for these models.
For all the iterative algorithms, we used δ = 0.00001 in Eq. (12) to check whether convergence is established. For easy interpretation, the errors were scaled for the models 3 and 4. Tables 2-4 include the simulation results for the model 1 for n = 30, 50 and 100, respectively.
Depending on the simulation results, in general, we see quite good efficiency of the MM-estimators even for the bivariate normal distribution. We see that all the estimators of β₀ and β₁ produce unbiased estimates. When we investigate the estimation of σ², all the estimators other than the LS estimator have downward bias. In general, the biases of the robust estimators tend to decrease as the sample size increases, but the bias of the S-estimator of σ² is the largest of all. For this model, as expected, the best performance was shown by the LS estimators, but the MM-estimators have quite high efficiencies compared to the other robust estimators. The MM-estimator of σ² has higher efficiency than its LS counterpart, which it achieves by producing a biased estimate. Although this may sound strange, some methods, such as ridge regression, deliberately introduce bias to attain smaller variance and MSE. As the sample size increases, the bias of the MM-estimator of σ² becomes almost zero, which makes it slightly less efficient than the LS estimator, but its efficiency is still quite impressive, such as 99.71%. The worst performance for the model 1 was shown by the S-estimators. This is not a surprising result since the S-estimators are generally used for the initial stages in iterations. We give the simulation results for the model 2 in Tables 5-7. The results are quite similar to those for the model 1. In the remaining part of the study we investigate the robustness of the estimators mentioned before by using the models 3 and 4.
The simulation results belonging to the model 3 are given in Tables 8-10. This model represents the mixture type outlier model. For β₀ and β₁, although there are no big differences between the M-estimators and the MM-estimators, in general we can say that the MM-estimators outperform the M-estimators. The LS estimators are the least efficient ones because of their sensitive nature. The S-estimators show the worst performance among the robust estimators mentioned here. For σ², we see very interesting results. The only unbiased estimates are produced by the LS estimator; all the robust estimators have downward bias, and this bias does not get smaller as the sample size increases. This fact makes the LS estimator of σ² asymptotically the most efficient one. Although the robust estimators take advantage of their bias compared to the LS estimator for small sample sizes (such as 30), as the sample size increases their relative efficiencies drop steadily and dramatically. Most of the robust estimators of σ² become worse than the LS estimator beyond the sample size of 30. The only robust estimator of σ² which survives beyond 30 is the MM-estimator. Even at the sample size of 100, it is 113.61% efficient w.r.t. the LS estimator of σ², but surely this will not last much longer since the bias stays the same. Tables 11-13 include the simulation results for the model 4, which represents Dixon's outlier model. For this model, the effect of the outliers on the LS estimators is more devastating than in the model 3.
Here, for β₀ and β₁, the M-estimators perform better than the MM-estimators. Among the M-estimators there are no big differences. Again for β₀ and β₁, compared to the model 3, the S-estimators do better, or at least not as badly as in the model 3. When we investigate the estimation of σ², we see that, similar to the model 3, all the robust estimators have downward bias which stays the same regardless of the sample size, but the effect of this bias is much more devastating than in the model 3. Even at the sample size of 30, only the MM-estimator survives, being the only robust estimator better than the LS estimator of σ². For the large sample size (n=100), even the efficiency of the MM-estimator drops below 100% (91.89%) w.r.t. the LS estimator. The M-estimators and the S-estimator of σ² are very badly affected by the bias, similar to the model 3, but here the situation is much worse. This simulation result also shows the asymptotic superiority of the LS estimator of σ² because of its unbiasedness, surely for distributions with finite mean and variance.

Real Life Data Examples
In this section we illustrate the methods explained in this study with two real life data examples.For both examples we will explore the data sets in detail using several graphical and numerical tools and give the results with the related comments.

The real life data example 1
For the first example we will work on the R air quality data set containing 153 daily readings of air quality values between May 1 and September 30, 1973, in New York, including 6 variables. Because of the missing values in the data set, only 111 observations will be used in the regression analysis [12]. The response variable Y is the mean ozone concentration (in parts per billion) from 13:00 to 15:00 hours at Roosevelt Island. There are three independent variables: X₁, the solar radiation in Langleys in the frequency band 4000-7700 from 08:00 to 12:00 hours at Central Park; X₂, the average wind speed (in miles per hour) between 07:00 and 10:00 hours at LaGuardia Airport; and X₃, the maximum daily temperature in degrees Fahrenheit, again at LaGuardia Airport.
First we assume that the GLR model given in Eq. (1) and the very general conditions related to this model given just after it hold, which we will check in the following stages. Then, we obtain the parameter estimates produced by the estimators mentioned in this study and give them in Table 14. It is quite surprising to see that the S-estimators produced the smallest MSE and the largest R² values, with great differences from the corresponding values produced by the other estimators, despite the known low efficiency of the S-estimators in the literature. While the Huber and Tukey M-estimators and the MM-estimators produced close results, the Huber M-estimators are slightly worse. The LS estimators produced the poorest result regarding the MSE and R². Now we will investigate the validity of the model, the assumptions and possible outliers, if any. Please note that the estimate of σ² and the MSE value are the same for the LS method, while for the other methods they are different. In order to verify the current model, we conduct residual analysis based on the LS estimates. The plot of Ŷ vs. the studentized deleted residuals t(i) is given in Figure 1 (see Neter et al. [1] for the definitions). The quantile-quantile (Q-Q) plot of t(i) under normality is given in Figure 3, which is in fact needed just for the hypothesis testing process and the possible full efficiency of the LS estimators. It also shows the existence of the outliers, one of which is very extreme, while also showing the approximate normality of the remaining residuals. From all the plots obtained here, we can observe that there are in total 3 outliers, one of which is very extreme. These are, from the largest magnitude to the smallest, the 77th (t(i) = 5.1440), the 34th (t(i) = 2.8963) and the 23rd (t(i) = 2.6813) observations. One should discard these observations unless a robust method is utilized. The existence of the outliers may be the reason for the high performance of the S-estimators for this real life data set.
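The studentized deleted residuals used in this diagnosis can be computed from a single fit. For the simple linear regression case, a Python sketch of the formula given in Neter et al. [1], with h_ii denoting the leverage, is:

```python
import math

def studentized_deleted_residuals(x, y):
    """t(i) for the simple linear regression fit y = b0 + b1*x
    (two estimated parameters, hence the n - 3 degrees of freedom)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sse = sum(ei ** 2 for ei in e)
    t = []
    for xi, ei in zip(x, e):
        h = 1 / n + (xi - xbar) ** 2 / sxx             # leverage h_ii
        t.append(ei * math.sqrt((n - 3) / (sse * (1 - h) - ei ** 2)))
    return t
```

Because each t(i) is based on the fit with observation i deleted, a gross outlier cannot mask itself by inflating the residual variance, which is why it is the preferred diagnostic here.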

The real life data example 2
For the second example we will work on the data set obtained from 22 patients who applied to Hacettepe University Hospital in Ankara, the capital city of Turkey [16]. The data set contains 3 independent variables: X₁, the amount of the osteocalcin hormone; X₂, the amount of the parathyroid hormone; and X₃, the age. The response variable Y is the bone mineral density. Again, we assumed the GLR model and the related conditions in the first place and obtained the estimates depending on them; the estimates are given in Table 15. Now we will investigate the validity of the model and the assumptions accepted in the beginning. The plot of Ŷ vs. the studentized deleted residuals t(i) is given in Figure 4. This plot verifies that the model is right and the assumptions are valid, since there is no systematic behavior or any change in the variability of the residuals. We just observe one outlier on the negative side of the residuals, possibly showing an observation which is much smaller than the others. This outlier is also detected by the boxplot given in Figure 5. The Q-Q plot of t(i) under normality is given in Figure 6. The Q-Q plot shows that there is one outlier and the rest of the residuals have an almost perfect normal distribution. The outlier is the 7th observation with t(i) = -2.7939. Again for this real life data set, we can say that the S-estimators may be better because of the existing outlier, since they are known for their extreme robustness despite their inefficiency compared to the other robust estimators.

Discussion and Conclusion
In this study we compared several estimators for the parameters of the GLR model, including the LS estimators, the Huber and Tukey M-estimators, the S-estimators and the MM-estimators, via simulation by using our own programs written in Matlab. We obtained results consistent with the literature but also found some interesting results to be remarked. The MM-estimators are, in general, the most efficient ones, as expected. The S-estimators are the least efficient ones among the robust estimators studied in this paper. The LS estimators are naturally the most efficient ones under normality but too sensitive to deviations from the assumed models. We have also found that the robust estimators of the variance of the error term are generally biased, and in some situations they stay biased despite the increase in the sample size, whereas the LS estimator of the variance of the error term is always unbiased. Among the robust estimators of the variance of the error term, the MM-estimator is less biased than the others and its bias tends to get smaller faster compared to the others in most of the situations. In order to be more illustrative, we gave two real life data examples using some extra statistical measures and graphics to check the validity of the GLR model and the assumptions made. We also gave the values of the MSE and R² for the comparison of the estimators mentioned here. Surprisingly, the S-estimators showed brilliant performance in both examples despite their known low efficiency compared to the other robust estimators; the existence of the outliers may be the reason for this performance. This fact also made the LS estimators the worst of all. The case studies are beneficial for illustration, but they are also useful to see that all real life data sets are original and have to be considered on their own.

We produced the estimates of β₀, β₁ and σ² for all the estimators we mentioned before, obtained their simulated means, biases, variances and mean square errors (MSE), and calculated the relative efficiency (REff) of the estimators w.r.t. the LS estimators. The REff of θ̂₁ w.r.t. θ̂₂, both being estimators of θ, can be given by the following formula:

REff(θ̂₁, θ̂₂) = MSE(θ̂₂) / MSE(θ̂₁)
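To make the REff formula concrete, the toy Monte Carlo below (Python; not one of the simulation models of this paper) compares the sample median with the sample mean as estimators of a normal location parameter, for which the well-known asymptotic REff is 2/π ≈ 0.64:

```python
import random
import statistics

random.seed(7)
reps, n, theta = 2000, 50, 0.0
mse_mean = mse_median = 0.0
for _ in range(reps):
    sample = [random.gauss(theta, 1.0) for _ in range(n)]
    mse_mean += (statistics.fmean(sample) - theta) ** 2
    mse_median += (statistics.median(sample) - theta) ** 2
mse_mean /= reps
mse_median /= reps

# REff of the median w.r.t. the mean: MSE(mean) / MSE(median)
reff = mse_mean / mse_median
print(round(reff, 2))   # roughly 0.6-0.7 for this sample size
```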

Figure 1 .
Figure 1. The plot of Ŷ vs. the studentized deleted residuals t(i) for the example 1

Figure 1
Figure 1 shows no sign of a misspecified model or invalidity of the assumptions, since the residuals are randomly distributed without any systematic band, while showing a couple of positive outliers, one of which is very extreme and tagged on the plot. The boxplot of t(i) given in Figure 2 also supports the existence of the outliers.

Figure 2 .
Figure 2. The boxplot of t(i) for the example 1

Figure 3 .
Figure 3. The Q-Q plot of t(i) under normality for the example 1

X₁: the amount of the osteocalcin hormone; X₂: the amount of the parathyroid hormone; X₃: the age; Y (the response variable): the bone mineral density.

Figure 4 .
Figure 4. The plot of Ŷ vs. t(i) for the example 2

Figure 5 .
Figure 5. The boxplot of t(i) for the example 2

Figure 6 .
Figure 6. The Q-Q plot of t(i) (sample data versus standard normal) for the example 2
N. Mutlu, H.S. Sazak / The Comparison of the Estimators for the Parameters of the General Linear Regression Model via Simulation and Two Real Life Data Examples

Table 1 .
The asymptotic relative efficiency and BDP of the S-estimators corresponding to some selected values of c and K for the Tukey bisquare function

Table 2 .
The simulated values for the model 1 with n=30

Table 3 .
The simulated values for the model 1 with n=50

Table 4 .
The simulated values for the model 1 with n=100

Table 5 .
The simulated values for the model 2 with n=30

Table 6 .
The simulated values for the model 2 with n=50

Table 7 .
The simulated values for the model 2 with n=100

Table 8 .
The simulated values for the model 3 with n=30

Table 9 .
The simulated values for the model 3 with n=50

Table 10 .
The simulated values for the model 3 with n=100

Table 11 .
The simulated values for the model 4 with n=30

Table 12 .
The simulated values for the model 4 with n=50

Table 13 .
The simulated values for the model 4 with n=100

Table 14 .
The regression estimates for the example 1

It is again surprising to see that the best result is obtained by using the S-estimators regarding the MSE and R², although the S-estimators are known to be inefficient compared to the other robust estimators. The MSE and R² values of the other robust estimators are very close to each other. The LS estimators show the poorest performance, producing the largest MSE and the lowest R² values.

Table 15 .
The regression estimates for the example 2