Research Article

Year 2022, Volume: 14 Issue: 2, 87 - 96, 31.12.2022

A New Computational Approach Based on Density Clustering for Outlier Problems in Linear Models

Abstract

In recent years, collecting and analyzing very large amounts of data has become vital to human activities in many application areas. Advanced statistical methods play a crucial role in modeling such data when the data contain outliers. Although a number of methods exist for revealing outlying observations, most of them may not be appropriate for prediction purposes because of the structure and requirements of the modeling task. In this study, the density-based clustering algorithm Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used to locate outlying observations effectively with respect to the form of the model for a given data set. Based on the results obtained, the Mean Shift Outlier Model (MSOM) is constructed as a robust linear model. The proposed computational approach thus uses the power of data clustering and minimizes the impact of the outlying observations through MSOM. Numerical examples are presented to demonstrate the performance of the proposed approach.
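The two stages described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation and rests on several assumptions: clustering is done on the joint (x, y) points, the model has a single predictor, and the MSOM is fitted by ordinary least squares, using the standard result that adding one mean-shift indicator per flagged observation gives the same coefficients as refitting without those observations. The function names `dbscan_noise` and `fit_msom`, the toy data, and the values of `eps` and `min_pts` are hypothetical.

```python
import math


def dbscan_noise(points, eps, min_pts):
    """Indices that DBSCAN would label as noise for the given eps and min_pts."""
    n = len(points)
    # eps-neighbourhood of every point (a point counts as its own neighbour).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [i for i in range(n) if len(neighbours[i]) >= min_pts]
    # Core points and every point within eps of a core point belong to a cluster.
    clustered = set()
    for i in core:
        clustered.update(neighbours[i])
    return [i for i in range(n) if i not in clustered]


def fit_msom(x, y, flagged):
    """Least-squares fit of y_i = b0 + b1*x_i + gamma_i*[i is flagged] + e_i.

    Assumption of this sketch: plain least squares with one shift term per
    flagged observation, so b0 and b1 come from the unflagged observations
    and each gamma_i is the flagged observation's prediction residual.
    """
    flagged = set(flagged)
    keep = [i for i in range(len(x)) if i not in flagged]
    mean_x = sum(x[i] for i in keep) / len(keep)
    mean_y = sum(y[i] for i in keep) / len(keep)
    sxx = sum((x[i] - mean_x) ** 2 for i in keep)
    sxy = sum((x[i] - mean_x) * (y[i] - mean_y) for i in keep)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    gamma = {i: y[i] - (b0 + b1 * x[i]) for i in sorted(flagged)}
    return b0, b1, gamma


# Hypothetical toy data: an exact line y = 2 + 0.5x with one distorted response.
x = [float(i) for i in range(10)]
y = [2.0 + 0.5 * xi for xi in x]
y[4] = 20.0

noise = dbscan_noise(list(zip(x, y)), eps=1.5, min_pts=3)
b0, b1, gamma = fit_msom(x, y, noise)
print(noise, b0, b1, gamma)
```

On this toy data only the distorted observation (index 4) is flagged as noise, the fitted line recovers the intercept and slope of the undistorted points, and the estimated shift for the flagged observation is about 16. In practice `eps` and `min_pts` would need tuning, and the variables would usually be scaled before clustering.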

References

  • Bakar, Z.A., Mohemad, R., Ahmad, A. and Deris, M.M. (2006). A comparative study for outlier detection techniques in data mining. In 2006 IEEE Conference on Cybernetics and Intelligent Systems IEEE, 1-6.
  • Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data. Wiley, Great Britain.
  • Campulova, M., Michalek, J. and Moucka, J. (2019). Generalised linear model-based algorithm for detection of outliers in environmental data and comparison with semi-parametric outlier detection methods. Atmospheric Pollution Research, 10(4), 1015-1023.
  • Cook, R.D. (1979). Influential observations in linear regression. Journal of the American Statistical Association, 74, 169-174.
  • Cook, R.D. and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, New York.
  • Daneshgar, A., Javadi, R. and Razavi, S.S. (2013). Clustering and outlier detection using isoperimetric number of trees. Pattern Recognition, 46(12), 3371-3382.
  • Ester, M., Kriegel, H.P., Sander, J. and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD’96 Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 96, 226–231.
  • Gan, G., Ma, C. and Wu, J. (2020). Data Clustering: Theory, Algorithms, and Applications. SIAM Press, Philadelphia, PA, USA.
  • Hadi, A.S. and Simonoff, J.S. (1993). Procedures for the identification of multiple outliers in linear models. Journal of the American Statistical Association, 88, 1264-1272.
  • Hahsler, M., Piekenbrock, M. and Doran, D. (2019). dbscan: Fast density-based clustering with R. Journal of Statistical Software, 91(1), 1-30.
  • Hastie, T., Tibshirani, R. and Friedman, J.H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.
  • Huang, J., Zhu, Q., Yang, L., Cheng, D. and Wu, Q. (2017). A novel outlier cluster detection algorithm without top-n parameter. Knowledge-Based Systems, 121, 32-40.
  • Huber, P.J. (1977). Robust covariances. In Statistical Decision Theory and Related Topics, 165-191.
  • Kim, S.S., Park, S.H. and Krzanowski, W.J. (2008). Simultaneous variable selection and outlier identification in linear regression using the mean-shift outlier model. Journal of Applied Statistics, 35(3), 283–291.
  • Montgomery, D.C. and Peck, E.A. (1992). Introduction to Linear Regression Analysis. John Wiley & Sons, New York.
  • Rao, C.R., Toutenburg, H. and Fieger, A. (1999). Linear Models: Least Squares and Alternatives. Second edition, Springer.
  • Rencher, A.C. (2000). Linear Models in Statistics. John Wiley & Sons, New York.
  • Rousseeuw, P.J. and Leroy, A.M. (1987). Robust Regression and Outlier Detection. John Wiley & Sons, New York.
  • SAS Customer Support. http://support.sas.com/
  • Taylan, P., Yerlikaya-Özkurt, F. and Weber, G.W. (2014). An approach to the mean shift outlier model by Tikhonov regularization and conic programming. Intelligent Data Analysis, 18(1), 79-94.
  • Wang, Y.F., Jiong, Y., Su, G.P. and Qian, Y.R. (2019). A new outlier detection method based on OPTICS. Sustainable Cities and Society, 45, 197-212.
  • Xia, J., Gao, L., Kong, K., Zhao, Y., Chen, Y., Kui, X. and Liang, Y. (2018). Exploring linear projections for revealing clusters, outliers, and trends in subsets of multi-dimensional datasets. Journal of Visual Languages and Computing, 48, 52-60.
  • Xu, X., Liu, H., Li, L. and Yao, M. (2018). A comparison of outlier detection techniques for high-dimensional data. International Journal of Computational Intelligence Systems, 11(1), 652-662.
  • Xu, R. and Wunsch, D. (2008). Clustering. John Wiley & Sons, New Jersey.
There are 24 citations in total.

Details

Primary Language: English
Subjects: Mathematical Sciences
Journal Section: Research Article
Authors: Fatma Yerlikaya Özkurt
Publication Date: December 31, 2022
Acceptance Date: July 27, 2022
Published in Issue: Year 2022, Volume: 14, Issue: 2

Cite

APA: Yerlikaya Özkurt, F. (2022). A New Computational Approach Based on Density Clustering for Outlier Problems in Linear Models. Istatistik Journal of The Turkish Statistical Association, 14(2), 87-96.
AMA: Yerlikaya Özkurt F. A New Computational Approach Based on Density Clustering for Outlier Problems in Linear Models. IJTSA. December 2022;14(2):87-96.
Chicago: Yerlikaya Özkurt, Fatma. “A New Computational Approach Based on Density Clustering for Outlier Problems in Linear Models”. Istatistik Journal of The Turkish Statistical Association 14, no. 2 (December 2022): 87-96.
EndNote: Yerlikaya Özkurt F (December 1, 2022) A New Computational Approach Based on Density Clustering for Outlier Problems in Linear Models. Istatistik Journal of The Turkish Statistical Association 14 2 87–96.
IEEE: F. Yerlikaya Özkurt, “A New Computational Approach Based on Density Clustering for Outlier Problems in Linear Models”, IJTSA, vol. 14, no. 2, pp. 87–96, 2022.
ISNAD: Yerlikaya Özkurt, Fatma. “A New Computational Approach Based on Density Clustering for Outlier Problems in Linear Models”. Istatistik Journal of The Turkish Statistical Association 14/2 (December 2022), 87-96.
JAMA: Yerlikaya Özkurt F. A New Computational Approach Based on Density Clustering for Outlier Problems in Linear Models. IJTSA. 2022;14:87–96.
MLA: Yerlikaya Özkurt, Fatma. “A New Computational Approach Based on Density Clustering for Outlier Problems in Linear Models”. Istatistik Journal of The Turkish Statistical Association, vol. 14, no. 2, 2022, pp. 87-96.
Vancouver: Yerlikaya Özkurt F. A New Computational Approach Based on Density Clustering for Outlier Problems in Linear Models. IJTSA. 2022;14(2):87-96.