A new binary classifier robust on noisy domains based on kNN algorithm

Müge Acar

doi:10.17671/gazibtd.1534334

Research Article

Year 2024, Volume 17, Issue 4, 309 - 321, 31.10.2024

Abstract

Classification is an effective and widely used data analysis technique that systematically assigns data to groups or categories according to established criteria. A classifier's success depends on both the classifier itself and the quality of the data. In real-world applications, however, datasets inevitably contain mislabeled instances, which cause misclassification problems that classifiers have to handle. This study aims to assess quantitatively how noisy data are classified by a new kNN-based classification algorithm, and to improve on the performance of classical kNN by classifying the data efficiently. We run numerical experiments on real-world datasets to evaluate the new algorithm, and we obtain high accuracy on various noisy datasets. We propose that this new technique can provide high accuracy in binary classification problems. We compare the new kNN and classical kNN algorithms in terms of test accuracy at noise levels of 10%, 20%, 30%, and 40% on distinct datasets. We also compare the new algorithm with popular classification algorithms; in the vast majority of cases, it achieves better test accuracy.
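The evaluation protocol described above (flip a fraction of the training labels, then measure test accuracy) can be illustrated with a minimal sketch. This is not the proposed algorithm, which the abstract does not specify; it only shows a classical kNN baseline under injected label noise. The synthetic two-cluster data, the 70/30 split, k = 5, and all function names (`make_blobs`, `flip_labels`, `knn_predict`, `accuracy_under_noise`) are assumptions made for illustration.

```python
import random
from collections import Counter

def make_blobs(n_per_class, rng):
    """Two 2-D Gaussian clusters labelled 0 and 1 (synthetic stand-in data)."""
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (4.0, 4.0))):
        for _ in range(n_per_class):
            point = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((point, label))
    return data

def flip_labels(labels, noise_rate, rng):
    """Flip round(noise_rate * n) binary labels chosen uniformly at random."""
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy

def knn_predict(train_x, train_y, query, k):
    """Classical kNN: majority vote among the k nearest training points."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy_under_noise(noise_rate, k=5, seed=0):
    """Test accuracy of classical kNN when only the training labels are noisy."""
    rng = random.Random(seed)
    data = make_blobs(100, rng)
    rng.shuffle(data)
    split = int(0.7 * len(data))
    train, test = data[:split], data[split:]
    train_x = [x for x, _ in train]
    train_y = flip_labels([y for _, y in train], noise_rate, rng)
    correct = sum(knn_predict(train_x, train_y, x, k) == y for x, y in test)
    return correct / len(test)

if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.2, 0.3, 0.4):
        acc = accuracy_under_noise(rate)
        print(f"noise={rate:.0%}  test accuracy={acc:.3f}")
```

Keeping the test labels clean while corrupting only the training labels is one common convention for this kind of experiment; whether the study follows it is not stated in the abstract.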

Keywords

Binary classification, Noisy data, Data Mining


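Turkish title: Gürültülü Verilere Dayanıklı kNN Algoritması Temelli Yeni Bir İkili Sınıflandırma Algoritması (A New Binary Classification Algorithm Based on the kNN Algorithm, Robust to Noisy Data)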

Abstract

Classification is an effective technique used in data analysis to arrange data systematically into categories according to established criteria. A classifier's success depends on the classifier itself and on the quality of the data. In real-world applications, however, it is inevitable that datasets contain mislabeled instances. Real-world data may contain such mislabeled instances, known as noise, which can lead to misclassification. This study aims to provide a quantitative assessment of the classification of noisy data with a new kNN (k-nearest neighbors) based classification algorithm, and to improve the performance of classical kNN by classifying the data efficiently. We propose that this new technique can achieve high accuracy in binary classification problems with noisy data. By taking into account the detection of noise points before classification, this study can improve the performance of the kNN technique in binary classification problems. The new kNN and classical kNN algorithms were compared in terms of test accuracy at different noise levels (10%, 20%, 30%, and 40%) on different datasets, and successful results were obtained. The new algorithm was also compared with popular classification algorithms and achieved better test accuracy.
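The idea of detecting noise points before classification can be illustrated with a generic, well-known filter: drop every training point whose label disagrees with the majority of its k nearest neighbours, then run kNN on what remains (often called edited nearest neighbours). This is a swapped-in textbook technique, not the algorithm proposed in the article, whose details the abstract does not give. The toy data, k values, and function names below are assumptions for illustration only.

```python
from collections import Counter

def nearest_labels(points, labels, query, k, skip=None):
    """Labels of the k points nearest to `query`, optionally skipping one index."""
    candidates = [i for i in range(len(points)) if i != skip]
    candidates.sort(
        key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], query))
    )
    return [labels[i] for i in candidates[:k]]

def majority(values):
    """Most frequent value in a non-empty list."""
    return Counter(values).most_common(1)[0][0]

def filter_suspected_noise(points, labels, k=3):
    """Keep only points whose label matches the majority of their k neighbours."""
    kept = [
        i for i in range(len(points))
        if majority(nearest_labels(points, labels, points[i], k, skip=i)) == labels[i]
    ]
    return [points[i] for i in kept], [labels[i] for i in kept]

def knn_predict(points, labels, query, k=3):
    """Classical kNN majority vote."""
    return majority(nearest_labels(points, labels, query, k))

# Toy data: two clusters; the point (0.3, 0.2) carries a flipped label.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.0), (5.2, 5.1), (5.1, 5.3), (5.3, 5.2)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

clean_points, clean_labels = filter_suspected_noise(points, labels, k=3)

query = (0.25, 0.2)
print(knn_predict(points, labels, query, k=1))              # 1: misled by the flipped label
print(knn_predict(clean_points, clean_labels, query, k=1))  # 0: after filtering
```

A filter like this trades a smaller training set for cleaner neighbourhoods; at high noise rates it may also discard correctly labelled points, so it should be read as a baseline idea rather than a robust method.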

Keywords

Binary Classification, Noisy Data, Data Mining


There are 33 citations in total.

Details

Primary Language	English
Subjects	Data Mining and Knowledge Discovery
Journal Section	Articles
Authors	Müge Acar 0000-0001-6937-1211
Publication Date	October 31, 2024
Submission Date	August 16, 2024
Acceptance Date	October 15, 2024
Published in Issue	Year 2024, Volume 17, Issue 4

Cite

APA	Acar, M. (2024). A new binary classifier robust on noisy domains based on kNN algorithm. Bilişim Teknolojileri Dergisi, 17(4), 309-321. https://doi.org/10.17671/gazibtd.1534334
