Research Article

Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach

Year 2023, Volume: 7 Issue: 2, 333 - 347, 29.12.2023
https://doi.org/10.26650/acin.1327370

Abstract

This article focuses on conducting a series of simulation studies to evaluate well-known and contemporary outlier detection methods in linear regression. The simulations were run under different settings, including the number of observations, the number of parameters, and the direction and rate of contamination. The estimators were ranked using the recorded final parameter estimates and multiple-criteria decision-making (MCDM) tools. The study shows that the success of the estimators varies with the simulation settings. The MCDM results indicate that only a limited number of estimators are applicable when the direction and rate of contamination are unknown. Moreover, the most successful methods require more computation time, whereas some alternatives achieve mid-range rankings within short run times. These findings offer valuable insights for researchers who use regression analysis in scenarios where the underlying model is known and potential outliers may be present.



Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach


Abstract

This paper focuses on the application of a suite of simulation studies to assess well-known and contemporary outlier detection methods in linear regression. These simulations vary across different settings, including the number of observations, the number of parameters, and the level and direction of contamination. The recorded final parameter estimates are used to rank the methods using multiple-criteria decision-making (MCDM) tools. The study reveals that the success of a method varies with the simulation settings. The MCDM results indicate that only a limited set of methods is applicable when the contamination structure and level are unknown. Additionally, the most successful methods demand more computation time, while some alternatives achieve mid-range rankings within shorter run times. These findings offer valuable insights for researchers employing regression analysis in scenarios where the underlying model is known and potential outliers may be present.
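To make the ranking step more concrete, the following is a minimal sketch of TOPSIS, one widely used MCDM method, written in plain Python. The abstract does not state which MCDM tools, criteria, or weights the study uses, so everything here is an assumption for illustration: the method names (`method_a`, `method_b`, `method_c`), the two criteria (estimation error and run time, both treated as costs), the equal weights, and the numbers in the decision matrix are hypothetical and are not results from the article.

```python
import math


def topsis(matrix, weights, is_benefit):
    """Return one TOPSIS closeness score per row of `matrix` (higher is better).

    matrix     : list of rows, one row per alternative, one column per criterion
    weights    : one weight per criterion
    is_benefit : True if larger values are better for that criterion, False if smaller
    Assumes no column is all zeros and the rows are not all identical.
    """
    n_cols = len(matrix[0])
    # Vector-normalize each column, then scale it by its criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]

    # Ideal and anti-ideal points, taken column by column.
    columns = list(zip(*weighted))
    ideal = [max(c) if b else min(c) for c, b in zip(columns, is_benefit)]
    anti_ideal = [min(c) if b else max(c) for c, b in zip(columns, is_benefit)]

    # Closeness: distance to the anti-ideal relative to the total distance.
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti_ideal)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical decision matrix (illustrative values only, not from the article):
# columns are (estimation error, run time in seconds); both are cost criteria.
methods = ["method_a", "method_b", "method_c"]
matrix = [
    [0.10, 9.0],  # accurate but slow
    [0.30, 1.0],  # moderately accurate, fast
    [0.90, 0.5],  # inaccurate, very fast
]
weights = [0.5, 0.5]         # assumed equal importance
is_benefit = [False, False]  # smaller is better for both criteria

scores = topsis(matrix, weights, is_benefit)
ranking = [name for _, name in sorted(zip(scores, methods), reverse=True)]
print(ranking)
```

In a study of this kind, each row would presumably hold the criteria recorded for one outlier detection method under a given simulation setting, and changing the weights shifts the trade-off between accuracy and computation time.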

There are 35 citations in total.

Details

Primary Language: English
Subjects: Data Mining and Knowledge Discovery, Statistical Data Science
Journal Section: Research Article
Authors

Mehmet Hakan Satman (ORCID: 0000-0002-9402-1982)

Publication Date: December 29, 2023
Submission Date: July 14, 2023
Published in Issue: Year 2023, Volume 7, Issue 2

Cite

APA Satman, M. H. (2023). Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach. Acta Infologica, 7(2), 333-347. https://doi.org/10.26650/acin.1327370
AMA Satman MH. Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach. ACIN. December 2023;7(2):333-347. doi:10.26650/acin.1327370
Chicago Satman, Mehmet Hakan. “Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach”. Acta Infologica 7, no. 2 (December 2023): 333-47. https://doi.org/10.26650/acin.1327370.
EndNote Satman MH (December 1, 2023) Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach. Acta Infologica 7 2 333–347.
IEEE M. H. Satman, “Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach”, ACIN, vol. 7, no. 2, pp. 333–347, 2023, doi: 10.26650/acin.1327370.
ISNAD Satman, Mehmet Hakan. “Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach”. Acta Infologica 7/2 (December 2023), 333-347. https://doi.org/10.26650/acin.1327370.
JAMA Satman MH. Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach. ACIN. 2023;7:333–347.
MLA Satman, Mehmet Hakan. “Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach”. Acta Infologica, vol. 7, no. 2, 2023, pp. 333-47, doi:10.26650/acin.1327370.
Vancouver Satman MH. Comparison of Outlier Detection Methods in Linear Regression: A Multiple-Criteria Decision-Making Approach. ACIN. 2023;7(2):333-47.