Research Article

A New Instance Selection Method for Enlarging Margins Between Classes

Year 2022, Volume: 5 Issue: 2, 119 - 126, 21.09.2022
https://doi.org/10.38016/jista.1033354

Abstract

Discarding superfluous instances in a data set not only shortens the learning process but also improves learning performance by eliminating noisy data. Instance selection methods are commonly employed to accomplish these tasks. In this paper, we propose a new supervised instance selection algorithm called Border Instances Reduction using Classes Handily (BIRCH). BIRCH considers the k-nearest neighbors of each instance and selects only those instances whose neighbors all belong to the same class, that is, instances with no neighbors from a different class. BIRCH has been compared with one traditional and four state-of-the-art instance selection algorithms on fifteen data sets from various domains. The empirical results show that BIRCH delivers a good trade-off between accuracy rate and reduction rate, controlled by tuning the number of neighbors. Furthermore, the proposed method consistently yields high classification accuracy. The source code of the proposed algorithm is available at https://github.com/fatihaydin1/BIRCH.
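The selection rule stated in the abstract is straightforward to sketch. The following Python snippet is a minimal illustration of that rule only, not the authors' reference implementation (which is available at the GitHub link above and may differ in details such as the distance metric or tie handling); the function name birch_select and the scikit-learn neighbor search are our own assumed choices.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.neighbors import NearestNeighbors

    def birch_select(X, y, k=3):
        # Fit on the training set itself and request k + 1 neighbors,
        # because each point is returned as its own nearest neighbor.
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)

        # Labels of the k true neighbors (column 0 is the point itself).
        neighbor_labels = y[idx[:, 1:]]

        # Keep an instance only if all k neighbors share its class label,
        # i.e. discard border instances with neighbors from other classes.
        keep = np.all(neighbor_labels == y[:, None], axis=1)
        return X[keep], y[keep]

    # Example: reduce the Iris data set.
    X, y = load_iris(return_X_y=True)
    X_sel, y_sel = birch_select(X, y, k=3)
    print(len(X), "->", len(X_sel), "instances kept")

Note that increasing k makes the all-same-class condition harder to satisfy, so more instances are discarded; this appears to be the knob behind the accuracy/reduction trade-off mentioned in the abstract.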

References

  • Akinyelu, A. A. and Adewumi, A. O. (2017) ‘Improved Instance Selection Methods for Support Vector Machine Speed Optimization’, Security and Communication Networks, 2017, pp. 1–11. doi: 10.1155/2017/6790975.
  • Akinyelu, A. A. and Ezugwu, A. E. (2019) ‘Nature Inspired Instance Selection Techniques for Support Vector Machine Speed Optimization’, IEEE Access, 7, pp. 154581–154599. doi: 10.1109/ACCESS.2019.2949238.
  • Alpaydin, E. (1997) ‘Voting over Multiple Condensed Nearest Neighbors’, Artificial Intelligence Review, 11(1/5), pp. 115–132. doi: 10.1023/A:1006563312922.
  • Arnaiz-González, Á. et al. (2016) ‘Instance selection of linear complexity for big data’, Knowledge-Based Systems, 107, pp. 83–95. doi: 10.1016/j.knosys.2016.05.056.
  • Aslani, M. and Seipel, S. (2020) ‘A fast instance selection method for support vector machines in building extraction’, Applied Soft Computing, 97, p. 106716. doi: 10.1016/j.asoc.2020.106716.
  • Aslani, M. and Seipel, S. (2021) ‘Efficient and decision boundary aware instance selection for support vector machines’, Information Sciences, 577, pp. 579–598. doi: 10.1016/j.ins.2021.07.015.
  • Cover, T. and Hart, P. (1967) ‘Nearest neighbor pattern classification’, IEEE Transactions on Information Theory, 13(1), pp. 21–27. doi: 10.1109/TIT.1967.1053964.
  • García-Pedrajas, N. (2011) ‘Evolutionary computation for training set selection’, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(6), pp. 512–523. doi: 10.1002/widm.44.
  • García, S. et al. (2012) ‘Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), pp. 417–435. doi: 10.1109/TPAMI.2011.142.
  • Hart, P. (1968) ‘The condensed nearest neighbor rule (Corresp.)’, IEEE Transactions on Information Theory, 14(3), pp. 515–516. doi: 10.1109/TIT.1968.1054155.
  • Liu, C. et al. (2017) ‘An efficient instance selection algorithm to reconstruct training set for support vector machine’, Knowledge-Based Systems, 116, pp. 58–73. doi: 10.1016/j.knosys.2016.10.031.
  • Olvera-López, J. A. et al. (2010) ‘A review of instance selection methods’, Artificial Intelligence Review, 34(2), pp. 133–143. doi: 10.1007/s10462-010-9165-y.
  • Rico-Juan, J. R., Valero-Mas, J. J. and Calvo-Zaragoza, J. (2019) ‘Extensions to rank-based prototype selection in k-Nearest Neighbour classification’, Applied Soft Computing, 85, p. 105803. doi: 10.1016/j.asoc.2019.105803.
  • Ruiz, I. L. and Gómez-Nieto, M. Á. (2020) ‘Prototype Selection Method Based on the Rivality and Reliability Indexes for the Improvement of the Classification Models and External Predictions’, Journal of Chemical Information and Modeling, 60(6), pp. 3009–3021. doi: 10.1021/acs.jcim.0c00176.
  • Sun, X. et al. (2019) ‘Fast Data Reduction With Granulation-Based Instances Importance Labeling’, IEEE Access, 7, pp. 33587–33597. doi: 10.1109/ACCESS.2018.2889122.
  • Susheela Devi, V. and Murty, M. N. (2002) ‘An incremental prototype set building technique’, Pattern Recognition, 35(2), pp. 505–513. doi: 10.1016/S0031-3203(00)00184-9.
  • Wang, Z., Tsai, C.-F. and Lin, W.-C. (2021) ‘Data cleaning issues in class imbalanced datasets: instance selection and missing values imputation for one-class classifiers’, Data Technologies and Applications, ahead-of-print. doi: 10.1108/DTA-01-2021-0027.
  • Wilson, D. L. (1972) ‘Asymptotic Properties of Nearest Neighbor Rules Using Edited Data’, IEEE Transactions on Systems, Man, and Cybernetics, SMC-2(3), pp. 408–421. doi: 10.1109/TSMC.1972.4309137.
  • Wilson, D. R. and Martinez, T. R. (2000) ‘Reduction techniques for instance-based learning algorithms’, Machine Learning, 38(3), pp. 257–286. doi: 10.1023/A:1007626913721.
  • Yang, L. et al. (2019) ‘Constraint nearest neighbor for instance reduction’, Soft Computing, 23(24), pp. 13235–13245. doi: 10.1007/s00500-019-03865-z.
There are 20 citations in total.

Details

Primary Language English
Subjects Artificial Intelligence
Journal Section Research Articles
Authors

Fatih Aydın (ORCID: 0000-0001-9679-0403)

Early Pub Date June 14, 2022
Publication Date September 21, 2022
Submission Date December 6, 2021
Published in Issue Year 2022 Volume: 5 Issue: 2

Cite

APA Aydın, F. (2022). A New Instance Selection Method for Enlarging Margins Between Classes. Journal of Intelligent Systems: Theory and Applications, 5(2), 119-126. https://doi.org/10.38016/jista.1033354
AMA Aydın F. A New Instance Selection Method for Enlarging Margins Between Classes. JISTA. September 2022;5(2):119-126. doi:10.38016/jista.1033354
Chicago Aydın, Fatih. “A New Instance Selection Method for Enlarging Margins Between Classes”. Journal of Intelligent Systems: Theory and Applications 5, no. 2 (September 2022): 119-26. https://doi.org/10.38016/jista.1033354.
EndNote Aydın F (September 1, 2022) A New Instance Selection Method for Enlarging Margins Between Classes. Journal of Intelligent Systems: Theory and Applications 5 2 119–126.
IEEE F. Aydın, “A New Instance Selection Method for Enlarging Margins Between Classes”, JISTA, vol. 5, no. 2, pp. 119–126, 2022, doi: 10.38016/jista.1033354.
ISNAD Aydın, Fatih. “A New Instance Selection Method for Enlarging Margins Between Classes”. Journal of Intelligent Systems: Theory and Applications 5/2 (September 2022), 119-126. https://doi.org/10.38016/jista.1033354.
JAMA Aydın F. A New Instance Selection Method for Enlarging Margins Between Classes. JISTA. 2022;5:119–126.
MLA Aydın, Fatih. “A New Instance Selection Method for Enlarging Margins Between Classes”. Journal of Intelligent Systems: Theory and Applications, vol. 5, no. 2, 2022, pp. 119-26, doi:10.38016/jista.1033354.
Vancouver Aydın F. A New Instance Selection Method for Enlarging Margins Between Classes. JISTA. 2022;5(2):119-26.

Journal of Intelligent Systems: Theory and Applications