Research Article

Outlier Detection in Multiple Regression Models Using Genetic Algorithms and Bayesian Information Criteria

Volume: 6 Number: 1 July 15, 2008
  • Özlem Gürünlü Alma *
  • Serdar Kurt
  • Aybars Uğur

Abstract

Statistical models, particularly regression models, are among the most useful tools for extracting and understanding the essential features of data sets. However, most real-world databases contain some abnormal values, generally termed outliers. Accurate identification of outliers plays a significant role in statistical analysis, especially in regression modelling. When classical statistical models are applied blindly to data sets containing outliers, the results can be misleading at best. The presence of outliers can adversely affect the fit of multiple regression models. The aim of this study is to define an outlier detection method that uses Genetic Algorithms (GA) with the Bayesian Information Criterion (BIC) and to illustrate the algorithm with real and simulated data. The algorithm uses a fitness function based on BIC. A smaller value of the criterion indicates a model that fits the data better; the presence of one or more outliers degrades the regression model and results in larger BIC values.
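The role of BIC as a fitness value can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes a Gaussian linear model, the common form BIC = n ln(RSS/n) + k ln(n), and a mean-shift treatment in which each flagged observation receives its own dummy column. The function name `bic_with_flags`, the synthetic data, and the single-flag search that stands in for the genetic search are all assumptions made for illustration.

```python
import numpy as np


def bic_with_flags(X, y, flags):
    """BIC of an OLS fit where each flagged row gets a mean-shift dummy.

    Assumed form (not taken from the article): n*ln(RSS/n) + k*ln(n),
    with k counting the intercept, the slopes, and one dummy per flag.
    """
    n = len(y)
    flags = np.asarray(flags, dtype=bool)
    dummies = np.eye(n)[:, flags]  # one indicator column per flagged row
    design = np.column_stack([np.ones(n), X, dummies])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]
    return n * np.log(rss / n) + k * np.log(n)


# Synthetic data (hypothetical): two predictors, one planted outlier at row 7.
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.normal(size=n)
y[7] += 15.0

no_flags = np.zeros(n, dtype=bool)
flag7 = no_flags.copy()
flag7[7] = True

bic_none = bic_with_flags(X, y, no_flags)
bic_flag7 = bic_with_flags(X, y, flag7)

# Exhaustive single-flag search, standing in for the GA's search over
# binary flag vectors; lower BIC means a fitter candidate.
candidates = []
for i in range(n):
    flags = np.zeros(n, dtype=bool)
    flags[i] = True
    candidates.append(bic_with_flags(X, y, flags))
best = int(np.argmin(candidates))

print("BIC, nothing flagged:", bic_none)
print("BIC, row 7 flagged:  ", bic_flag7)
print("best single flag:    ", best)
```

In a GA setting, each chromosome would be a binary vector of flags and this value, to be minimised, would serve as its fitness. The exhaustive loop is feasible here only because a single outlier was planted; with several possible outliers the number of flag vectors grows as 2^n, which is the case a genetic search is meant to handle.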


Details

Primary Language

English

Subjects

Statistics

Journal Section

Research Article

Authors

Özlem Gürünlü Alma *
Türkiye

Serdar Kurt
Türkiye

Aybars Uğur
Türkiye

Publication Date

July 15, 2008

Submission Date

January 4, 2008

Acceptance Date

-

Published in Issue

Year 2008 Volume: 6 Number: 1

APA
Gürünlü Alma, Ö., Kurt, S., & Uğur, A. (2008). Outlier Detection in Multiple Regression Models Using Genetic Algorithms and Bayesian Information Criteria. İstatistik Araştırma Dergisi, 6(1), 38-51. https://izlik.org/JA82FB63DX