Research Article

CLASSICAL AND ROBUST MAHALANOBIS DISTANCE MEASURES FOR MULTIVARIATE OUTLIER DETECTION: AN APPLICATION WITH FINANCIAL DATA

Year 2019, Issue: 25, 267 - 282, 25.10.2019
https://doi.org/10.18092/ulikidince.579570

Abstract

The presence of outliers in multivariate data sets makes it harder to
estimate population parameters and, by inflating the error variance, reduces
the power of the statistical test being used. This causes deviations from the
assumptions that the variables have equal variance and follow a multivariate
normal distribution. The Mahalanobis distance, one of the techniques used for
multivariate outlier detection, is computed from the multivariate means and
the covariance matrix, both of which are sensitive to outliers; it is
therefore also used with robust measures, to counter the problems of outlying
observations going undetected or of normal observations being flagged as
outliers. This study aims to compare the outlier detections of the classical
and robust Mahalanobis measures used in multivariate outlier detection. The
application data consist of 1,239,507 stock buy and sell transactions executed
by investors on the New York and NASDAQ exchanges between January 2013 and
December 2017. To detect outlying transactions, the quantity and volume
variables were considered, distance scores based on the classical and robust
measures were computed for each transaction, and the techniques were compared.
The study concludes that masked outliers that could not be detected by the
classical Mahalanobis measure or by the Minimum Volume Ellipsoid were detected
by the Fast Minimum Covariance Determinant method, and that this method is an
effective way to detect outliers in multivariate data sets in financial
applications.
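
For orientation, the distance the abstract refers to is usually written as below. The notation ($x_i$ for an observation, $\bar{x}$ for the mean vector, $S$ for the covariance matrix, $p$ for the number of variables) and the 0.975 chi-square cutoff are conventional textbook choices assumed here; the abstract itself does not state them.

```latex
\[
MD_i = \sqrt{(x_i - \bar{x})^{\top} S^{-1} (x_i - \bar{x})},
\qquad
\text{flag } x_i \text{ as outlying if } MD_i^{2} > \chi^{2}_{p,\,0.975}
\]
```

A robust variant keeps the same form but replaces $\bar{x}$ and $S$ with robust location and scatter estimates, such as those obtained from the Minimum Volume Ellipsoid or the Minimum Covariance Determinant.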

References

  • Aggarwal, C.C. (2013). Outlier Analysis, Springer.
  • Arteaga, T.G., Alcantud, J.C.R., Calle, R.A. (2016). A cardinal dissensus measure based on the Mahalanobis distance, European Journal of Operational Research, 251(2), 575-585.
  • Carminati, M., Caron, R., Maggi, F., Epifani, I., Zanero, S. (2015). BankSealer: A decision support system for online banking fraud analysis and investigation, Computers & Security, 53, 175-186.
  • Carrato, R.G.H. (2018). Wind farm monitoring using Mahalanobis distance and fuzzy clustering, Renewable Energy, 123(C), 526-540.
  • Chang, C.C. (2012). A boosting approach for supervised Mahalanobis distance metric learning, Pattern Recognition, 45(2), 844-862.
  • Cheng, T. C. & Victoria-Feser, M. P. (2002). High breakdown estimation of multivariate mean and covariance with missing observations, British Journal of Mathematical and Statistical Psychology, 55, 317-335.
  • Cho, S., Hong, H., Ha, B.C. (2010). A hybrid approach based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance: For bankruptcy prediction, Expert Systems with Applications, 37(4), 3482-3488.
  • Coakley, C. W., Hettmansperger, T. P. (1993). A Bounded Influence, High Breakdown, Efficient Regression Estimator, Journal of the American Statistical Association, 88, 872-880.
  • Daszykowski, M., Kaczmarek, K., Vander Heyden, Y., & Walczak, B. (2007). Robust statistics in data analysis – a review: basic concepts. Chemometrics and Intelligent Laboratory Systems, 85(2), 203–219.
  • Domingues, R., Filippone, M., Michiardi, P., Zouaoui, J. (2018). A comparative evaluation of outlier detection algorithms: Experiments and analyses, Pattern Recognition, 74, 406-421.
  • Fauvel, M., Chanussot, J., Benediktsson, J.A., Villa, A. (2013). Parsimonious Mahalanobis kernel for the classification of high dimensional data, Pattern Recognition, 46(3), 845-854.
  • Haldar, N., Khan, F., Ali, A., Abbas, H. (2016). Arrhythmia Classification using Mahalanobis Distance based Improved Fuzzy C-Means Clustering for Mobile Health Monitoring Systems. Neurocomputing, 220, 221-235.
  • Hardin, J. & Rocke, D.M. (2005). The Distributions of Robust Distances, Journal of Computational and Graphical Statistics, 14(4), 1-19.
  • Hawkins, D. (1980). Identification of Outliers, Chapman and Hall.
  • Hawkins, D.M., & Olive, D.J. (1999). Improved feasible solution algorithm for high breakdown estimation. Computational Statistics and Data Analysis, 30, 1-11.
  • Hodge, V.J., Austin, J. (2004). A Survey of Outlier Detection Methodologies, Artificial Intelligence Review, 22(2), 85-126.
  • Hubert, M. & Debruyne, M. (2010). Minimum Covariance Determinant, Computational Statistics, 2(1), 36-43.
  • Jaffel, I., Taouali, O., Faouzi Harkat, M., Messaoud, H. (2015). A Fault Detection Index Using Principal Component Analysis And Mahalanobis Distance, IFAC-PapersOnLine, 48(21), 1397-1401.
  • Johnson, R. A. & Wichern, D. W. (2002). Applied Multivariate Statistical Analysis (5th ed.). Prentice Hall, Upper Saddle River, NJ.
  • Ke, T., Lv, H., Sun, M., Zhang, L. (2018). A biased least squares support vector machine based on Mahalanobis distance for PU learning, Physica A: Statistical Mechanics and its Applications, 509, 422-438.
  • Leys, C., Klein, O., Dominicy, Y., Ley, C. (2018). Detecting multivariate outliers: Use a robust variant of the Mahalanobis distance, Journal of Experimental Social Psychology, 74, 150-156.
  • Melnykov, I. & Melnykov, V. (2014). On K-means algorithm with the use of Mahalanobis distances, Statistics & Probability Letters, 84, 88-95.
  • Nguyen, B., Morell, C., Baets, B.D. (2018). Distance metric learning for ordinal classification based on triplet constraints, Knowledge-Based Systems, 142, 17-28.
  • Pompella, M. & Dicanio, A. (2017). Ratings based Inference and Credit Risk: Detecting likely-to-fail Banks with the PC-Mahalanobis Method, Economic Modelling, 67, 34-44.
  • Pozzolo, A.D., Caelen, O., Borgne, Y.L., Waterschoot, S., Bontempi, G. (2014). Learned lessons in credit card fraud detection from a practitioner perspective, Expert Systems with Applications, 41(10), 4915-4928.
  • Qiu, Z., Zhou, B., Yuan, J. (2017). Protein–protein interaction site predictions with minimum covariance determinant and Mahalanobis distance, Journal of Theoretical Biology, 433, 57-63.
  • Rocke, D. M., Woodruff, D. L. (1996). Identification of Outliers in Multivariate Data, Journal of the American Statistical Association, 91, 1047-1061.
  • Rousseeuw, P.J. (1985). Multivariate Estimation With High Breakdown Point, Mathematical Statistics and Applications, 1, 283-297.
  • Rousseeuw, P.J. & Leroy, A.M. (1987). Robust Regression & Outlier Detection, Wiley&Sons, New Jersey.
  • Rousseeuw, P. J. & Zomeren, B. C. V. (1990). Unmasking Multivariate Outliers and Leverage Points, Journal of the American Statistical Association, 85(411), 633-639.
  • Rousseeuw, P.J. & Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator, Technometrics, 41(3), 212-223.
  • Shang, J., Chen, M.Y., Zhang, H. (2018). Fault detection based on augmented kernel Mahalanobis distance for nonlinear dynamic processes, Computers & Chemical Engineering, 109, 311-312.
  • Shulgin, S., Zinkina, J., Korotayev, A., Andreev, A. (2017). “Neighbors in values”: A new dataset of cultural distances between countries based on individuals’ values, and its application to the study of global trade, Research in International Business and Finance, 42, 966-985.
  • Stöckl, S. & Hanke, M. (2014). Financial Applications of the Mahalanobis Distance, Applied Economics and Finance, 1(2), 78-84.
  • Suo, M., Zhu, B., Zhang, Y., An, R., Li, S. (2018). Fuzzy Bayes risk based on Mahalanobis distance and Gaussian kernel for weight assignment in labeled multiple attribute decision making, Journal of Knowledge-Based Systems, 152(C), 26-39.
  • Thode, H.C. (2002). Testing for Normality, Marcel Dekker, New York.
  • Wang, P.C., Su, C.T., Chen, K.H., Chen, N.H. (2011). The application of rough set and Mahalanobis distance to enhance the quality of OSA diagnosis, Expert Systems with Applications, 38(6), 7828-7836.
  • Wang, Q., Wan, J., Yuan, Y. (2018). Locality constraint distance metric learning for traffic congestion detection, Pattern Recognition, 75, 272-281.
  • Warren, R., Smith, R., Cybenko, A. (2011). Use Of Mahalanobis Distance For Detecting Outliers And Outlier Clusters In Markedly Non-Normal Data: A Vehicular Traffic Example, Air Force Research Laboratory Human Effectiveness Directorate Report, 1-52.
  • Willems, G., Joe, H., Zamar, R. (2009). Diagnosing Multivariate Outliers Detected by Robust Estimators, Journal of Computational and Graphical Statistics, 18(1), 73-91.
  • Xiang, S., Nie, F., Zhang, C. (2008). Learning a Mahalanobis distance metric for data clustering and classification, Pattern Recognition, 41(12), 3600-3612.
  • Vukovic, O. (2015). Analysing Bank Real Estate Portfolio Management by Using Impulse Response Function, Mahalanobis Distance and Financial Turbulence, Procedia Economics and Finance, 30, 932-938.

CLASSICAL AND ROBUST MAHALANOBIS DISTANCE MEASURES FOR OUTLIER DETECTION: AN APPLICATION IN STOCK EXCHANGES

Abstract

The presence of outliers in multivariate data sets contaminates parameter
estimates and, by increasing the error variance, reduces the power of the
statistical test. It also leads to deviations from the assumptions that the
variables have equal variance and follow a multivariate normal distribution.
The Mahalanobis distance is one of the techniques most frequently used for
multivariate outlier detection. It is calculated from the multivariate
location and the covariance matrix, both of which are themselves sensitive to
outliers. Robust measures have therefore been used as well, to address
problems such as a normal observation being misidentified as an outlier and
an outlier being masked. This study aims to compare the performance of the
classical and robust Mahalanobis measures. The analysis uses 1,239,507 stock
transactions executed by investors on the New York Stock Exchange and NASDAQ
between January 2013 and December 2017. To identify outlying transactions,
the volume and value of each trade were analysed. Mahalanobis distances based
on the classical and robust measures were calculated for each transaction,
and the measures were compared. The masked observations that could not be
detected by the classical measure or by the robust Minimum Volume Ellipsoid
measure were detected as outlying by the Fast Minimum Covariance Determinant
(Fast MCD) measure. The study concludes that Fast MCD can be used as an
efficient estimator of multivariate location and scatter in the presence of
masked observations in multivariate financial data sets.
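
The masking effect described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the data are synthetic (two standardised variables standing in for trade volume and value, with a planted cluster of outliers), the function names `mahalanobis_sq` and `mcd_sketch` are hypothetical, and the robust estimator is a simplified concentration-step search for a minimum covariance determinant subset, not the full Fast MCD procedure (no nested subsampling, consistency factor, or reweighting) and not the authors' implementation.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from `center` under `cov`."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def mcd_sketch(X, h, n_starts=50, n_steps=20, seed=0):
    """Simplified MCD search: repeated concentration steps from random starts.

    Returns the mean and covariance of the h-point subset with the smallest
    covariance determinant among the subsets visited. This is a sketch only.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best = None
    for _ in range(n_starts):
        # Start from a random (p + 1)-point subset.
        idx = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_steps):
            center = X[idx].mean(axis=0)
            cov = np.cov(X[idx], rowvar=False)
            # Concentration step: keep the h points closest to the current fit.
            idx = np.argsort(mahalanobis_sq(X, center, cov))[:h]
        center = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False)
        det = np.linalg.det(cov)
        if best is None or det < best[0]:
            best = (det, center, cov)
    return best[1], best[2]

# Synthetic data (assumed, not the study's transactions): 200 regular points
# with strongly correlated variables plus a tight cluster of 50 outliers.
rng = np.random.default_rng(42)
clean = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=200)
outliers = rng.normal(loc=[3.0, -3.0], scale=0.2, size=(50, 2))
X = np.vstack([clean, outliers])
n_out = len(outliers)

# 0.975 chi-square quantile; with 2 degrees of freedom it has the closed
# form -2 * ln(1 - 0.975), so no statistics library is needed.
cutoff = -2.0 * np.log(0.025)

# Classical distances use the ordinary mean and covariance of all points.
d_classical = mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False))

# Robust distances use the location and scatter of the best h-subset.
center_r, cov_r = mcd_sketch(X, h=int(0.75 * len(X)))
d_robust = mahalanobis_sq(X, center_r, cov_r)

# Count how many of the planted outliers (the last n_out rows) are flagged.
classical_hits = int((d_classical[-n_out:] > cutoff).sum())
robust_hits = int((d_robust[-n_out:] > cutoff).sum())
print("planted outliers flagged (classical):", classical_hits)
print("planted outliers flagged (robust):   ", robust_hits)
```

In this setup the outlier cluster pulls the classical mean towards itself and inflates the classical covariance, so its members can look unremarkable under the classical distance, while the subset-based estimate is fitted to the regular points only. Because the raw subset covariance is not rescaled, the robust rule in this sketch also tends to flag more regular points than a calibrated estimator would.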

There are 40 citations in total.

Details

Primary Language Turkish
Journal Section Articles
Authors

M. Fevzi Esen

Mehpare Tımor

Publication Date October 25, 2019
Published in Issue Year 2019 Issue: 25

Cite

APA Esen, M., & Tımor, M. (2019). ÇOK DEĞİŞKENLİ AYKIRI DEĞER TESPİTİ İÇİN KLASİK VE DAYANIKLI MAHALANOBİS UZAKLIK ÖLÇÜTLERİ: FİNANSAL VERİ İLE BİR UYGULAMA. Uluslararası İktisadi Ve İdari İncelemeler Dergisi(25), 267-282. https://doi.org/10.18092/ulikidince.579570

______________________________________________________

Address: Karadeniz Technical University Department of Economics Room Number 213  

61080 Trabzon / Turkey

e-mail : uiiidergisi@gmail.com