Research Article

An Improved Version of the Edited Nearest Neighbor Data Reduction Method Based on the kNN Approach

Year 2025, Volume: 27 Issue: 81, 376 - 381, 29.09.2025
https://doi.org/10.21205/deufmd.2025278105

Abstract

In machine learning, dealing with imbalanced datasets remains a significant challenge and is usually addressed with various sampling techniques. Among these, the Edited Nearest Neighbor (ENN) undersampling method is widely recognized for its ability to improve classifier performance and reduce class imbalance. However, the traditional ENN method has limitations, such as removing potentially informative instances and performing poorly on complex datasets. This paper presents an improved version of the ENN undersampling method that refines the instance removal process using the k-Nearest Neighbors (kNN) approach. The proposed method improves on traditional ENN by adding a more advanced neighbor evaluation criterion based on the kNN algorithm, which better preserves informative instances while effectively reducing noise. Through extensive experiments on several benchmark datasets, we show that our improved ENN method outperforms traditional ENN and other state-of-the-art undersampling techniques in terms of classification accuracy, F1-score, and AUC. The results show that the improved ENN method not only mitigates the class imbalance problem more effectively but also preserves data integrity to a higher degree, thereby increasing the robustness and reliability of machine learning models. This advance offers a valuable tool for practitioners working with imbalanced datasets and contributes to the development of more accurate and efficient predictive models.

An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach

Abstract

In machine learning, handling imbalanced datasets remains a critical challenge, often addressed through various sampling techniques. Among these techniques, the Edited Nearest Neighbor (ENN) undersampling method is widely recognized for its ability to enhance classifier performance by reducing class imbalance. However, the traditional ENN method has limitations, such as the removal of potentially informative instances and suboptimal performance on complex datasets. This paper presents an improved version of the ENN undersampling method that leverages the k-Nearest Neighbors (kNN) approach to refine the selection of instances for removal. The proposed method improves upon traditional ENN by incorporating a more sophisticated neighbor evaluation criterion based on the kNN algorithm, which better preserves informative instances while effectively reducing noise. Through extensive experiments on multiple benchmark datasets, we demonstrate that our improved ENN method achieves superior performance in terms of classification accuracy, F1-score, and AUC compared to traditional ENN and other state-of-the-art undersampling techniques. The results indicate that the improved ENN method not only mitigates the class imbalance problem more effectively but also maintains a higher level of data integrity, thereby enhancing the robustness and reliability of machine learning models. This advancement provides a valuable tool for practitioners dealing with imbalanced datasets, contributing to the development of more accurate and efficient predictive models.
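
For orientation, the baseline that the abstract refers to as traditional ENN can be sketched as follows. This is a minimal illustration of the standard editing rule applied to the majority class only, not the improved criterion proposed in the paper, whose details the abstract does not give. The function name `enn_undersample`, its parameters, the Euclidean distance, and the tie handling are all assumptions made for this example.

```python
import numpy as np


def enn_undersample(X, y, k=3, majority_label=None):
    """Baseline ENN sketch (hypothetical helper, not the paper's method).

    A majority-class instance is removed when fewer than half of its
    k nearest neighbours share its label. Ties are kept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Assumption: if no label is given, the most frequent class is the majority.
    if majority_label is None:
        labels, counts = np.unique(y, return_counts=True)
        majority_label = labels[np.argmax(counts)]

    # Pairwise squared Euclidean distances between all instances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbour

    # Indices of the k nearest neighbours of every instance.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    keep = np.ones(len(y), dtype=bool)
    for i in np.flatnonzero(y == majority_label):
        agree = np.sum(y[neighbours[i]] == y[i])
        if 2 * agree < k:  # most neighbours disagree with the label
            keep[i] = False

    return X[keep], y[keep]


# Toy data: five majority points near the origin, one majority point
# placed inside the minority cluster, and three minority points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
              [5, 5],
              [5, 6], [6, 5], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])

X_res, y_res = enn_undersample(X, y, k=3)
```

On this toy data, the majority point at (5, 5) has only minority neighbours and is removed, while the clustered majority points and all minority points are kept. An improved variant would replace the `2 * agree < k` test with a different neighbour evaluation criterion.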

Details

Primary Language English
Subjects Performance Evaluation
Journal Section Research Article
Authors

Alican Doğan 0000-0002-0553-2888

Early Pub Date September 25, 2025
Publication Date September 29, 2025
Submission Date October 2, 2024
Acceptance Date November 16, 2024
Published in Issue Year 2025 Volume: 27 Issue: 81

Cite

APA Doğan, A. (2025). An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, 27(81), 376-381. https://doi.org/10.21205/deufmd.2025278105
AMA Doğan A. An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach. DEUFMD. September 2025;27(81):376-381. doi:10.21205/deufmd.2025278105
Chicago Doğan, Alican. “An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi 27, no. 81 (September 2025): 376-81. https://doi.org/10.21205/deufmd.2025278105.
EndNote Doğan A (September 1, 2025) An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 27 81 376–381.
IEEE A. Doğan, “An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach”, DEUFMD, vol. 27, no. 81, pp. 376–381, 2025, doi: 10.21205/deufmd.2025278105.
ISNAD Doğan, Alican. “An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 27/81 (September 2025), 376-381. https://doi.org/10.21205/deufmd.2025278105.
JAMA Doğan A. An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach. DEUFMD. 2025;27:376–381.
MLA Doğan, Alican. “An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, vol. 27, no. 81, 2025, pp. 376-81, doi:10.21205/deufmd.2025278105.
Vancouver Doğan A. An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach. DEUFMD. 2025;27(81):376-81.