Research Article

A Hybrid Technique for Detection and Handling Noise in Binary Classification

Year 2025, Volume: 30 Issue: 2, 584 - 595, 31.08.2025
https://doi.org/10.53433/yyufbed.1632877

Abstract

Binary classification is a widely used method in data mining. However, noise in the training dataset can significantly reduce classification accuracy. This study aims to identify such noisy data using polyhedral conic functions. The training dataset is then reconstructed with the necessary changes, improving its quality and thereby the effectiveness of binary classification.
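
The abstract does not give the details of the procedure, so the following is only a minimal sketch of the general idea, not the author's algorithm. It assumes the standard polyhedral conic function of Gasimov and Ozturk (2006), g(x) = w·(x − c) + ξ‖x − c‖₁ − γ, and a convention (an assumption here) that class 1 points should satisfy g(x) ≤ 0 and class 0 points g(x) > 0. The parameter values, the function names `pcf` and `flag_suspects`, and the toy data are all hypothetical; in practice the parameters would be obtained by solving an optimization problem rather than set by hand.

```python
import numpy as np

def pcf(X, w, xi, gamma, center):
    """Evaluate g(x) = w.(x - c) + xi * ||x - c||_1 - gamma for each row of X."""
    D = np.asarray(X, dtype=float) - center
    return D @ w + xi * np.abs(D).sum(axis=1) - gamma

def flag_suspects(X, y, w, xi, gamma, center):
    """Mark points whose label disagrees with the sign of g.

    Assumed convention: label 1 <-> g(x) <= 0, label 0 <-> g(x) > 0.
    """
    predicted = (pcf(X, w, xi, gamma, center) <= 0).astype(int)
    return predicted != np.asarray(y)

# Toy data (hypothetical): the fourth point lies inside the class 1 region
# but carries label 0, so it plays the role of a noisy instance.
X = np.array([[0.0, 0.0], [0.5, -0.5], [3.0, 3.0], [0.2, 0.1], [-4.0, 2.0]])
y = np.array([1, 1, 0, 0, 0])

# Hand-picked parameters (hypothetical), giving the region ||x||_1 <= 2.
w, xi, gamma, center = np.zeros(2), 1.0, 2.0, np.zeros(2)

suspects = flag_suspects(X, y, w, xi, gamma, center)

# Two possible ways of "handling" the flagged points:
y_relabelled = np.where(suspects, 1 - y, y)    # flip the suspect labels
X_clean, y_clean = X[~suspects], y[~suspects]  # or drop the suspect points
```

Whether flagged points are relabelled or removed is a design choice; the sketch shows both so that either cleaned training set can be passed to a downstream classifier.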

References

  • Astorino, A., Gaudioso, M., & Seeger, A. (2014). Conic separation of finite sets I. The homogeneous case. Journal of Convex Analysis, 21(1), 1-28.
  • Bagirov, A. M., & Mardaneh, K. (2006). Modified global k-means algorithm for clustering in gene expression data sets. In Proceedings of The 2006 Workshop on Intelligent Systems for Bioinformatics, 73-82. https://dl.acm.org/doi/10.5555/1274172.1274176
  • Blaszczynski, J., & Stefanowski, J. (2015). Neighbourhood sampling in bagging for imbalanced data. Neurocomputing, 150, 529-542. https://doi.org/10.1016/j.neucom.2014.07.064
  • Bramer, M. (2020). Principles of data mining. Springer. https://doi.org/10.1007/978-1-4471-7493-6
  • Brodley, C. E., & Friedl, M. A. (1999). Identifying mislabeled training data. Journal of Artificial Intelligence Research, 11, 131-167. https://doi.org/10.1613/jair.606
  • Gasimov, R. N., & Ozturk, G. (2006). Separation via polyhedral conic functions. Optimization Methods and Software, 21(4), 527-540. https://doi.org/10.1080/10556780600723252
  • Ikotun, A. M., Ezugwu, A. E., Abualigah, L., Abuhaija, B., & Heming, J. (2023). K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Information Sciences, 622, 178-210. https://doi.org/10.1016/j.ins.2022.11.139
  • Karimi, D., Dou, H., Warfield, S. K., & Gholi, A. (2020). Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis. Medical Image Analysis, 65, 101759. https://doi.org/10.1016/j.media.2020.101759
  • Kasimbeyli, R. (2010). A nonlinear cone separation theorem and scalarization in nonconvex vector optimization. SIAM Journal on Optimization, 20(3), 1591-1619. https://doi.org/10.1137/070694089
  • Kelly, M., Longjohn, R., & Nottingham, K. (2023). UCI machine learning repository, University of California, School of Information and Computer Science. Accessed: 28.01.2025. http://archive.ics.uci.edu/ml
  • Kumar, R., Jayaraman, V. K., & Kulkarni, B. D. (2005). A SVM classifier incorporating simultaneous noise reduction and feature selection: Illustrative case examples. Pattern Recognition, 38(1), 41-49. https://doi.org/10.1016/j.patcog.2004.06.002
  • Liu, J., Huang, L-W., Shao, Y-H., Chen, W-J., & Li, C-N. (2024). A nonlinear kernel SVM classifier via L0/1 soft-margin loss with classification performance. Journal of Computational and Applied Mathematics, 437, 115471. https://doi.org/10.1016/j.cam.2023.115471
  • Mangasarian, O. L. (2002). A finite Newton method for classification. Journal of Machine Learning Research, 3, 887–911. https://doi.org/10.1080/1055678021000028375
  • Ozturk, G., & Ciftci, M. (2015). Clustering based polyhedral conic functions algorithm in classification. Journal of Industrial and Management Optimization, 11(3), 921-932. https://doi.org/10.3934/jimo.2015.11.921
  • Patil, S. B., & Deshmukh, S. R. (2011, November). Use of support vector machine, decision tree and naive Bayesian techniques for wind speed classification. In International Conference on Power and Energy Systems (ICPES). https://doi.org/10.1109/ICPES.2011.6156687
  • Saez, J. A., Galar, M., Luengo, J., & Herrera, F. (2016). INFFC: An iterative class noise filter based on the fusion of classifiers with noise sensitivity control. Information Fusion, 27, 19-32. https://doi.org/10.1016/j.inffus.2015.04.002
  • Segata, N., Blanzieri, E., Delany, S. J., & Cunningham, P. (2010). Noise reduction for instance-based learning with a local maximal margin approach. Journal of Intelligent Information Systems, 35(2), 245-264. https://doi.org/10.1007/s10844-009-0101-z
  • Szeghalmy, S., & Fazekas, A. (2024). A comparative study on noise filtering of imbalanced data sets. Knowledge-Based Systems, 301, 112236. https://doi.org/10.1016/j.knosys.2024.112236
  • Uylas Sati, N. (2015). A binary classification algorithm based on polyhedral conic functions. Düzce University Journal of Science and Technology, 3(1), 152-161.
  • Uylas, N. (2013). Methods based on mathematical optimization for data classification. (PhD), Ege University, Institute of Natural and Applied Science, Izmir, Turkey.
  • Wang, R. Y., Storey, V. C., & Firth, C. P. (1995). A framework for analysis of data quality research. IEEE Transactions on Knowledge and Data Engineering, 7(4), 623-640. https://doi.org/10.1109/69.404034
  • Zhu, X., & Wu, X. (2004). Class noise vs. attribute noise: A quantitative study. Artificial Intelligence Review, 22(3), 177-210. https://doi.org/10.1007/s10462-004-0751-8

İkili Sınıflandırmada Gürültünün Algılanması ve Ele Alınması için Hibrit Bir Teknik

Abstract

Binary classification is one of the methods most frequently used by data mining researchers today. In this method, noise hidden in the training set significantly affects accuracy. Our aim in this study is to identify these noisy data with the help of polyhedral conic functions and then, using the training dataset rebuilt after the necessary changes, to obtain more effective results in binary classification studies.

References

There are 22 citations in total.

Details

Primary Language: English
Subjects: Mathematical Optimisation, Numerical and Computational Mathematics (Other)
Journal Section: Natural Sciences and Mathematics
Authors

Nur Uylaş Satı (ORCID: 0000-0003-1553-9466)

Publication Date: August 31, 2025
Submission Date: February 4, 2025
Acceptance Date: July 14, 2025
Published in Issue: Year 2025, Volume: 30, Issue: 2

Cite

APA: Uylaş Satı, N. (2025). A Hybrid Technique for Detection and Handling Noise in Binary Classification. Yüzüncü Yıl Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 30(2), 584-595. https://doi.org/10.53433/yyufbed.1632877