An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach

Alican Doğan

doi:10.21205/deufmd.2025278105

Research Article

Turkish title: kNN Yaklaşımını Temel Alan Düzenlenmiş En Yakın Komşu Veri İndirgeme Yönteminin İyileştirilmiş Bir Versiyonu

Year 2025, Volume: 27, Issue: 81, pp. 376 - 381, 29.09.2025

Abstract

In machine learning, handling imbalanced datasets remains a critical challenge and is often addressed through sampling techniques. Among these, the Edited Nearest Neighbor (ENN) undersampling method is widely recognized for its ability to enhance classifier performance by reducing class imbalance. However, the traditional ENN method has limitations, such as the removal of potentially informative instances and suboptimal performance on complex datasets. This paper presents an improved version of the ENN undersampling method that uses the k-Nearest Neighbors (kNN) approach to refine the selection of instances for removal. The proposed method incorporates a more sophisticated kNN-based neighbor evaluation criterion, which better preserves informative instances while still reducing noise effectively. Through extensive experiments on multiple benchmark datasets, we demonstrate that the improved ENN method outperforms the traditional ENN and other state-of-the-art undersampling techniques in terms of classification accuracy, F1-score, and AUC. The results indicate that the improved ENN method not only mitigates the class imbalance problem more effectively but also maintains a higher level of data integrity, thereby enhancing the robustness and reliability of machine learning models. This advancement provides a valuable tool for practitioners dealing with imbalanced datasets and contributes to the development of more accurate and efficient predictive models.

Keywords

ENN, CxKNN, Undersampling, Classification
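
For readers unfamiliar with the baseline, the following is a minimal sketch of the traditional ENN editing rule that the abstract takes as its starting point: a majority-class sample is removed when most of its k nearest neighbors carry a different label. It is not the improved method proposed in the paper, whose neighbor evaluation criterion is not described in the abstract. The function name `edited_nearest_neighbors`, the default `k=3`, the choice to edit only the majority class, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def edited_nearest_neighbors(X, y, majority_label, k=3):
    """Baseline ENN undersampling (illustrative sketch, not the paper's method).

    A majority-class sample is dropped when the majority vote of its k nearest
    neighbours disagrees with its own label. Minority samples are always kept.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # assumption: minority samples are never removed
            continue
        # Indices of the k closest other points (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(xi, X[j]),
        )[:k]
        votes = Counter(y[j] for j in neighbours)
        predicted = votes.most_common(1)[0][0]
        if predicted == yi:
            keep.append(i)  # neighbourhood agrees with the label: keep it
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: class 0 is the majority. The point (5.1, 5.0) is a
# class-0 sample located inside the class-1 cluster.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4),
     (5.1, 5.0),
     (5.0, 5.0), (5.2, 5.1), (4.9, 5.2)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_res, y_res = edited_nearest_neighbors(X, y, majority_label=0, k=3)
print(len(X), "->", len(X_res), "samples; class counts:", Counter(y_res))
```

In this toy case the only sample removed is the class-0 point embedded in the class-1 cluster, which shows both the noise-cleaning effect of ENN and why the rule can discard borderline samples that may still be informative.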

Details

Primary Language	English
Subjects	Performance Evaluation
Section	Research Article
Authors	Alican Doğan 0000-0002-0553-2888
Early View Date	25 September 2025
Publication Date	29 September 2025
Submission Date	2 October 2024
Acceptance Date	16 November 2024
Published in Issue	Year 2025 Volume: 27 Issue: 81

Cite

APA	Doğan, A. (2025). An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi, 27(81), 376-381. https://doi.org/10.21205/deufmd.2025278105
AMA	Doğan A. An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach. DEUFMD. September 2025;27(81):376-381. doi:10.21205/deufmd.2025278105
Chicago	Doğan, Alican. “An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 27, no. 81 (September 2025): 376-81. https://doi.org/10.21205/deufmd.2025278105.
EndNote	Doğan A (01 September 2025) An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 27 81 376–381.
IEEE	A. Doğan, “An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach”, DEUFMD, vol. 27, no. 81, pp. 376–381, 2025, doi: 10.21205/deufmd.2025278105.
ISNAD	Doğan, Alican. “An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 27/81 (September 2025), 376-381. https://doi.org/10.21205/deufmd.2025278105.
JAMA	Doğan A. An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach. DEUFMD. 2025;27:376–381.
MLA	Doğan, Alican. “An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi, vol. 27, no. 81, 2025, pp. 376-81, doi:10.21205/deufmd.2025278105.
Vancouver	Doğan A. An Improved Version of Edited Nearest Neighbor Undersampling Method Based on the kNN Approach. DEUFMD. 2025;27(81):376-81.

Dokuz Eylül Üniversitesi, Mühendislik Fakültesi Dekanlığı Tınaztepe Yerleşkesi, Adatepe Mah. Doğuş Cad. No: 207-I / 35390 Buca-İZMİR.