Research Article

A Hybrid Technique for Detection and Handling Noise in Binary Classification

Year 2025, Volume: 30 Issue: 2, 584 - 595, 31.08.2025
https://doi.org/10.53433/yyufbed.1632877

Abstract

Binary classification is a widely used method in data mining. However, noise in the training dataset can significantly reduce classification accuracy. This study aims to identify such noisy data using polyhedral conic functions. The dataset is then reconstructed with the necessary changes, improving the quality of the training data and thereby the effectiveness of binary classification.
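The abstract does not describe the procedure in detail, so the following is only a minimal sketch of the general idea, not the author's algorithm. It assumes the usual polyhedral conic function form g(x) = w·(x − c) + ξ‖x − c‖₁ − γ (as introduced by Gasimov and Ozturk), takes the parameters w, ξ, γ and c as already given rather than fitted, and assumes the convention that class +1 lies in the region g(x) ≤ 0. The function names `pcf`, `flag_noise` and `relabel`, and the choice to flip flagged labels instead of removing the points, are hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def pcf(x: Point, w: Point, xi: float, gamma: float, center: Point) -> float:
    """Evaluate g(x) = w.(x - c) + xi * ||x - c||_1 - gamma."""
    diff = [xj - cj for xj, cj in zip(x, center)]
    linear = sum(wj * dj for wj, dj in zip(w, diff))
    l1_norm = sum(abs(dj) for dj in diff)
    return linear + xi * l1_norm - gamma


def flag_noise(points: Sequence[Point], labels: Sequence[int],
               w: Point, xi: float, gamma: float, center: Point) -> List[int]:
    """Return indices whose label disagrees with the sign of g.

    Assumed convention: g(x) <= 0 means class +1, g(x) > 0 means class -1.
    """
    flagged = []
    for i, (x, y) in enumerate(zip(points, labels)):
        predicted = 1 if pcf(x, w, xi, gamma, center) <= 0 else -1
        if predicted != y:
            flagged.append(i)
    return flagged


def relabel(labels: Sequence[int], flagged: Sequence[int]) -> List[int]:
    """One possible handling step (an assumption): flip the flagged labels."""
    flagged_set = set(flagged)
    return [-y if i in flagged_set else y for i, y in enumerate(labels)]


# Toy data: with w = 0, xi = 1, gamma = 1, c = 0 the function is ||x||_1 - 1,
# so class +1 is the unit L1 ball around the origin.
points = [(0.2, 0.1), (2.0, 2.0), (0.1, -0.3), (3.0, 0.0)]
labels = [1, -1, -1, -1]  # the third label contradicts the function
params = dict(w=(0.0, 0.0), xi=1.0, gamma=1.0, center=(0.0, 0.0))

suspect = flag_noise(points, labels, **params)
cleaned = relabel(labels, suspect)
```

In practice the parameters would be obtained by solving an optimization problem on the training data, and flagged points could equally be discarded rather than relabeled; the sketch fixes both choices only to keep the example short.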


Turkish title: İkili Sınıflandırmada Gürültünün Algılanması ve Ele Alınması için Hibrit Bir Teknik




Details

Primary Language English
Subjects Optimization in Mathematics, Numerical and Computational Mathematics (Other)
Section Natural Sciences and Mathematics
Authors

Nur Uylaş Satı 0000-0003-1553-9466

Publication Date 31 August 2025
Submission Date 4 February 2025
Acceptance Date 14 July 2025
Published in Issue Year 2025 Volume: 30 Issue: 2

Cite

APA Uylaş Satı, N. (2025). A Hybrid Technique for Detection and Handling Noise in Binary Classification. Yüzüncü Yıl Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 30(2), 584-595. https://doi.org/10.53433/yyufbed.1632877