Genetic Algorithm Based Outlier Detection Using Bayesian Information Criterion in Multiple Regression Models Having Multicollinearity Problems
Abstract
Multiple linear regression (MLR) models are among the most widely used techniques in applied statistics and are valuable tools for extracting and understanding the essential features of a dataset. Problems arise, however, when serious outliers or multicollinearity are present in the data. In regression the situation is somewhat more complex, because some outlying points influence the fit more than others. Outliers can strongly distort the estimated model, especially when the least squares method is used, yet in many practical situations the outlying observations are themselves the points of greatest interest. A second problem in MLR models is multicollinearity, defined as linear dependencies among the independent variables. The purpose of this study is to define multicollinearity and to present an outlier detection method for multiple regression models based on a Genetic Algorithm (GA) and the Bayesian Information Criterion (BIC). The GA with BIC is illustrated on real and simulated data for outlier detection in MLR models having multicollinearity problems.
Key Words: Bayesian Information Criterion, Genetic Algorithms, Multicollinearity, Multiple Linear Regression, Outlier Detection.
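As a rough illustration of the kind of search the abstract describes, the Python sketch below scores candidate outlier sets with a BIC and explores them with a small genetic algorithm. It is not the authors' implementation. It assumes a mean-shift outlier model (each flagged observation gets its own dummy parameter, which is equivalent to leaving it out of the least squares fit), and the function names, GA settings, and demo data are all hypothetical choices made for this example.

```python
import numpy as np

def bic_mean_shift(X, y, mask):
    """BIC of an OLS fit in which every observation flagged in `mask`
    is absorbed by its own dummy variable (assumed mean-shift model)."""
    n, p = X.shape
    keep = ~mask
    m = int(mask.sum())
    if keep.sum() <= p:                      # too few points left to fit
        return np.inf
    # lstsq uses an SVD-based solver, so a nearly collinear X is tolerated
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    rss = max(float(np.sum((y[keep] - X[keep] @ beta) ** 2)), 1e-12)
    return n * np.log(rss / n) + (p + m) * np.log(n)

def ga_outlier_search(X, y, pop_size=40, generations=60, p_mut=0.02, seed=0):
    """Minimise the BIC over binary chromosomes (True = flagged as outlier)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pop = rng.random((pop_size, n)) < 0.05   # sparse random initial flags
    pop[0] = False                           # always include "no outliers"
    for _ in range(generations):
        fitness = np.array([bic_mean_shift(X, y, c) for c in pop])
        pop = pop[np.argsort(fitness)]       # best chromosome first
        children = [pop[0].copy()]           # elitism: keep the best as is
        while len(children) < pop_size:
            # truncation selection: both parents come from the better half
            a, b = pop[rng.integers(0, pop_size // 2, size=2)]
            cut = rng.integers(1, n)         # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut   # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    fitness = np.array([bic_mean_shift(X, y, c) for c in pop])
    return pop[np.argmin(fitness)], float(fitness.min())

def make_demo_data(seed=1, n=40):
    """Synthetic data: two nearly collinear predictors, two shifted responses."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = x1 + 0.01 * rng.normal(size=n)      # near-duplicate of x1
    X = np.column_stack([np.ones(n), x1, x2])
    y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)
    y[[3, 17]] += 8.0                        # planted outliers
    return X, y

if __name__ == "__main__":
    X, y = make_demo_data()
    print("condition number of X:", np.linalg.cond(X))
    flags, score = ga_outlier_search(X, y)
    print("flagged observations:", np.flatnonzero(flags), "BIC:", score)
```

The penalty term grows by log(n) for every flagged observation, so the search only flags a point when removing it reduces the residual sum of squares enough to pay for the extra parameter. The condition number printed in the demo is one common diagnostic for multicollinearity; values far above 30 are often read as a warning sign.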
Details
Primary Language
English
Publication Date
March 26, 2010
Submission Date
March 26, 2010
Published in Issue
Year 2009 Volume: 22 Number: 3