A Comparison of Different Ridge Parameters under Both Multicollinearity and Heteroscedasticity

Abstract
One of the major problems in fitting an appropriate linear regression model is multicollinearity, which occurs when the regressors are highly correlated. To overcome this problem, the ridge regression estimator, an alternative to the ordinary least squares (OLS) estimator, has been used. Heteroscedasticity, which violates the assumption of constant error variance, is another major problem in regression estimation. To address this violation, weighted least squares estimation is used to fit a more robust linear regression equation. However, when both multicollinearity and heteroscedasticity are present, weighted ridge regression estimation should be employed. Ridge regression depends on the ridge parameter, which has no explicit form of calculation, and various ridge parameters have been proposed in the literature. A simulation study was conducted to compare the performances of these ridge parameters for data that are both multicollinear and heteroscedastic. The following factors were varied: the number of regressors, the sample size and the degree of multicollinearity. The performances of the parameters were compared using the mean square error. The study also shows that when the data are both heteroscedastic and multicollinear, the estimation performances of the ridge parameters differ from the case of only multicollinear data.


Introduction
One of the assumptions of the classical linear regression model is the nonexistence of heteroscedasticity. Heteroscedasticity is the situation that occurs when the variances of the error terms are not constant across observations. When this assumption is violated, the Gauss-Markov theorem does not apply; in this case, the ordinary least squares (OLS) estimator is no longer the best linear unbiased estimator (BLUE), i.e., the one having the minimum variance among all linear unbiased estimators. The linear regression model is given as

$$y = X\beta + \varepsilon, \qquad (1)$$

where $y$ is an $n \times 1$ vector of observations, $\beta$ is a $p \times 1$ vector of unknown regression coefficients, $X$ is an $n \times p$ known design matrix of rank $p$, and $\varepsilon$ is an $n \times 1$ random vector having a multivariate normal distribution with mean vector $0$ and variance-covariance matrix $\sigma^2 I_n$, where $I_n$ is an identity matrix of order $n$.
The OLS estimator minimizes the residual sum of squares (RSS) in a linear regression model, calculated as

$$RSS(\beta) = (y - X\beta)'(y - X\beta) = \sum_{i=1}^{n} \left( y_i - x_i'\beta \right)^2.$$
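As an illustration of this criterion, the following minimal sketch (in Python with NumPy; the function names are ours) computes the OLS estimate by solving the normal equations and evaluates the RSS for a candidate coefficient vector:

```python
import numpy as np

def ols_estimate(X, y):
    """OLS estimate: the beta minimizing RSS, from the normal equations X'X b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def rss(X, y, beta):
    """Residual sum of squares (y - X beta)'(y - X beta)."""
    resid = y - X @ beta
    return float(resid @ resid)
```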
The weighted least squares estimator is the alternative to OLS when the errors are heteroscedastic. In weighted least squares estimation, the weighted sum of squares given below is minimized:

$$\sum_{i=1}^{n} w_i \left( y_i - x_i'\beta \right)^2,$$

where $w_i = 1/\sigma_i^2$, so that the maximum likelihood estimation is recovered. Another violation of the assumptions of the classical linear regression model is the problem of collinearity. Multicollinearity exists when the regressors are strongly related to each other. Some techniques have been proposed to overcome this problem, such as ridge regression, a biased estimation technique introduced by Hoerl and Kennard [1]. For the linear regression model given in Equation (1), the usual least squares estimate (LSE) or the maximum likelihood estimate (MLE) of $\beta$ is given by

$$\hat{\beta} = (X'X)^{-1} X'y.$$

This estimate depends on the characteristics of the matrix $X'X$. If there are dependencies among the columns of the matrix $X$, this is a problem called multicollinearity, and the least squares estimators do not give correct estimates. Hoerl and Kennard [1] suggested a method called ridge regression to solve that problem. They use a modified $X'X$ and take $X'X + kI$, $k \ge 0$. The resulting estimators,

$$\hat{\beta}_R = (X'X + kI)^{-1} X'y,$$

are known as ridge regression estimators. The constant $k > 0$ is called the ridge parameter.
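A minimal sketch of these two estimators, assuming the per-observation error variances are known; the function names and NumPy implementation are ours:

```python
import numpy as np

def wls_estimate(X, y, sigma2):
    """Weighted least squares with weights w_i = 1 / sigma_i^2: (X'WX)^{-1} X'Wy."""
    w = 1.0 / sigma2                  # one weight per observation
    Xw = X * w[:, None]               # rows of X scaled by w, so Xw.T @ X == X'WX
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def ridge_estimate(X, y, k):
    """Ridge regression estimator (X'X + kI)^{-1} X'y with ridge parameter k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```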
There is no explicit way of calculating $k$; however, many different formulas for estimating $k$ have been proposed in the literature, such as those of Hoerl and Kennard [1,2]:

$$\hat{k} = \frac{\hat{\sigma}^2}{\hat{\beta}_{\max}^2},$$

where $\hat{\sigma}^2$ is the estimated error variance from the ordinary least squares (OLS) regression and $\hat{\beta}_{\max}^2$ is the square of the maximum regression coefficient estimate, and

$$\hat{k} = \frac{p\,\hat{\sigma}^2}{\sum_{i=1}^{p} \hat{\beta}_i^2},$$

where $\hat{\beta}_i$ is the OLS estimate of the $i$th unknown regression coefficient.
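As a sketch, the two estimators of $k$ above can be computed from the OLS fit as follows (Python/NumPy; the names are ours, and $\hat{\sigma}^2$ is taken as RSS/(n - p)):

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients and estimated error variance RSS / (n - p)."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    return beta, float(resid @ resid) / (n - p)

def k_hk(X, y):
    """Hoerl-Kennard ridge parameter: sigma2_hat / max_i(beta_i_hat^2)."""
    beta, s2 = ols_fit(X, y)
    return s2 / np.max(beta ** 2)

def k_hkb(X, y):
    """Sum-based variant: p * sigma2_hat / sum_i(beta_i_hat^2)."""
    beta, s2 = ols_fit(X, y)
    return beta.size * s2 / np.sum(beta ** 2)
```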

Literature review
There is a considerable number of studies on ridge regression dealing with the estimation of the ridge parameter; hence, in this section, we present only some recently published studies. Macedo et al. [17] present a new method to estimate the ridge parameter, based on the ridge trace and an analytical method borrowed from maximum entropy. Based on a simulation study, Mansson et al. [18] have found that increasing the correlation between the independent variables has a negative effect on the mean square error (MSE) and the prediction sum of squares (PRESS) of some of the considered ridge parameters. Mansson and Shukur [19] have investigated some logistic ridge regression parameters and have shown that there is at least one ridge regression estimator with a lower mean square error than the maximum likelihood method in all situations. Salam [20] has introduced an alternative procedure with a smaller mean square error for determining the ridge parameter. Khalaf [21] proposes two ridge regression parameters and demonstrates that the proposed estimators outperform the OLS and other estimators. Hamed et al. [22] propose a technique for ridge parameter selection which depends on a mathematical programming model. Mansson et al. [23] introduce a new Ridge Regression Granger Causality (RRGC) test and compare it to the GC test employing Monte Carlo simulations. Dorugade [24] introduces some new ridge parameter estimators based on the correlation between the response and the regressors and tests their optimality through simulation. Wong and Chiu [25] compare the mean squared errors of 26 different ridge parameter estimators; they also propose a new approach which minimizes the empirical mean squared error iteratively. Al Somahi et al. [26] propose some new methods for choosing a suitable ridge parameter for logistic regression. Duzan and Shariff [27] investigate the robustness of the ridge regression method; they show that the system stabilizes in a region of k, where k is a positive quantity less than one, that the values of k depend on the degree of correlation between the independent variables, and that k is a linear function of that correlation. Kibria and Banik [28] compare 28 different ridge regression estimators, propose five new ones, and conduct a simulation study to evaluate their performances. Alibuhtto [29] has generated simulated data with different levels of the correlation coefficient by Monte Carlo techniques, with the level of multicollinearity determined by the correlation matrix, the variance inflation factor (VIF) and the condition number; it was found that the ridge parameter k and the sample size are negatively correlated at a significance level of 5%. Lukman and Ayinde [30] classify the estimators based on those of Hoerl and Kennard [1,2] into different forms and various types, and propose some modifications to improve those estimators. Bhat and Raju [31] present some popular ridge estimators and provide a generalized class of ridge estimators as well as a modified ridge estimator, evaluating their performance through a Monte Carlo simulation technique. Uzuke et al. [32] consider some ridge estimators and propose some new methods as a solution for skewed eigenvalues of the matrix of explanatory variables.
They have found that as the sample size increases, the prediction sum of squares (PRESS) value decreases as the correlation coefficient becomes large. Macedo [33] has improved the ridge-GME parameter estimator, which combines ridge regression and generalized maximum entropy to eliminate the subjectivity in the analysis of the ridge trace. Lukman et al. [34] classify the estimators based on Dorugade [14] into different forms and provide some new ridge estimators. Giacalone et al. [35] review various proposed ridge estimators and then introduce a method based on $L_p$-norm estimation; their method is an adaptive robust procedure to be used when the residual distribution deviates from normality. They state that their new approach produces more efficient estimates for different levels of multicollinearity.

Simulation process
At the stage of generating collinear data having heteroscedasticity, the linear regression model considered is

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i,$$

where the coefficients are set identically to 1. The regressors, which have a certain degree of multicollinearity within the linear regression model, have been generated by the following equality:

$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho\, z_{i(p+1)}, \qquad i = 1, \ldots, n, \; j = 1, \ldots, p,$$

where $\rho$ is the degree of multicollinearity, i.e., the assumed correlation between the regressors, and the $z_{ij}$ are independent standard normal random variables. The simulation design is summarized in Table 1. Each type of generation has been replicated 10,000 times. The comparison has been made according to the following mean square error (MSE) criterion:

$$MSE(\hat{\beta}) = \frac{1}{10000} \sum_{r=1}^{10000} (\hat{\beta}_{(r)} - \beta)'(\hat{\beta}_{(r)} - \beta),$$

where $\hat{\beta}_{(r)}$ is the estimate obtained in the $r$th replication.
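For concreteness, here is a hedged sketch of one replication loop under the design above (Python/NumPy). The heteroscedasticity scheme `sigma = 1 + |x_i1|`, the omission of the intercept, and all names are our illustrative assumptions, since the paper's exact variance structure is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n, p, rho):
    """Collinear regressors via x_ij = sqrt(1 - rho^2) z_ij + rho z_i(p+1),
    with heteroscedastic errors (illustrative variance scheme)."""
    Z = rng.standard_normal((n, p + 1))
    X = np.sqrt(1.0 - rho**2) * Z[:, :p] + rho * Z[:, [p]]  # shared column induces collinearity
    beta = np.ones(p)                                       # coefficients identically 1
    sigma = 1.0 + np.abs(X[:, 0])    # error s.d. grows with a regressor: heteroscedasticity
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta

def mc_mse(k, n=50, p=3, rho=0.9, reps=10_000):
    """Monte Carlo MSE of the ridge estimator for a fixed ridge parameter k."""
    total = 0.0
    for _ in range(reps):
        X, y, beta = make_data(n, p, rho)
        b = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
        total += float((b - beta) @ (b - beta))
    return total / reps
```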

Results and Discussion
The MSE values provided in Tables 2 to 8 lead to the following results. As the degree of collinearity increases, the MSE values of some of the ridge parameters tend to decrease in some cases and to increase in others. However, for a large number of regressors, the MSE values of the ridge parameters increase regardless of the sample size. The ridge parameter K5 performs well for weak degrees of multicollinearity in most cases. In addition, the ridge parameters that performed well in the previous study did not perform well in this study; instead, if the data set is heteroscedastic, only the K5, K8, K19 and K20 estimators should be considered for use. For moderate or fairly strong degrees of multicollinearity at any sample size, when the number of regressors is less than or equal to 5, the K20 estimator usually performs the best. For a large number of regressors, K8 appears to be the best ridge parameter in small sample sizes, while K19 takes the lead for larger sample sizes. On the other hand, in a study by Göktaş and Sevinç [16], it was shown that when the degree of multicollinearity is as large as 0.5, K12 is the best for multicollinear data. When the degree of multicollinearity is as low as 0.3 and there are three regressors, K21 appears to be the best estimator for any sample size; however, when the number of regressors increases to 7, K15 appears to be the best for sample sizes less than 250. Moreover, for large sample sizes, K25 and K21 produce the best results.
Briefly, this study and the previous study of Göktaş and Sevinç [16] show that, whether the data are only multicollinear or both multicollinear and heteroscedastic, no single ridge parameter has the best estimation performance in all cases.
We believe this study will be helpful for researchers who must use the weighted ridge regression method with data involving both multicollinearity and heteroscedasticity, by guiding them in the selection of an appropriate ridge parameter while taking the number of regressors, the sample size and the degree of multicollinearity into consideration.