Research Article

A new undersampling method based on a data classification method (Turkish title: Veri sınıflandırma yöntemine dayalı yeni bir alt örnekleme yöntemi)

Year 2024, Volume: 26, Issue: 2, Pages: 518-526, 15.07.2024
https://doi.org/10.25092/baunfbed.1447440

Abstract

Data mining is one of the most important research areas in the literature. As data volume grows in direct proportion to technological developments, the number of studies in this field is also increasing rapidly. The aim of data mining is to make various predictions and to obtain information from raw data by means of machine learning techniques. The structural properties and class distributions of the datasets used in machine learning techniques significantly affect the performance of the algorithms. In this study, our aim is to balance imbalanced binary datasets with a new undersampling approach that includes a classification method using polyhedral conic functions.

References

  • Ayoub, S., Gulzar, Y., Rustamov, J., Jabbari, A., Reegu, F.A. and Turaev, S., Adversarial Approaches to Tackle Imbalanced Data in Machine Learning, Sustainability, 15, 7097, (2023).
  • Raghuwanshi, B.S. and Shukla, S., Class-Specific Extreme Learning Machine for Handling Binary Class Imbalance Problem, Neural Networks, 105, 206–217, (2018).
  • Mohammed R., Rawashdeh J. and Abdullah M., Machine learning with oversampling and undersampling techniques: Overview study and experimental results, 2020 11th International Conference on Information and Communication Systems (ICICS), 243-248, Irbid, Jordan, (2020).
  • Hoyos-Osorio J., Alvarez-Meza A., Daza-Santacoloma G., Orozco-Gutierrez A. and Castellanos-Dominguez G., Relevant information undersampling to support imbalanced data classification, Neurocomputing, 436, 136-146, (2021).
  • Sun Z., Song Q., Zhu X., Sun H., Xu B. and Zhou Y., A novel ensemble method for classifying imbalanced data, Pattern Recognition, 48, 5, 1623-1637, (2015).
  • Barandela R., Valdovinos R.M. and Sánchez J.S., New applications of ensembles of classifiers, Pattern Analysis & Applications, 6, 3, 245-256, (2003).
  • Seiffert C., Khoshgoftaar T.M., Van Hulse J. and Napolitano A., RUSBoost: a hybrid approach to alleviating class imbalance, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 40, 1, 185-197, (2010).
  • Mostafaei, S.H. and Tanha, J., OUBoost: boosting based over and under sampling technique for handling imbalanced data, International Journal of Machine Learning and Cybernetics, 14, 3393–3411, (2023).
  • Dai Q., Liu J. and Shi Y., Class-overlap undersampling based on Schur decomposition for Class-imbalance problems, Expert Systems with Applications: An International Journal, 221, C, 119735, (2023).
  • Gasimov R.N., and Öztürk G., Separation via polyhedral conic functions. Optimization Methods and Software, 21, 4, 527-540, (2006).
  • Uylas N., Methods based on mathematical optimization for data classification. PhD, Ege University, İzmir, Turkey, (2013).
  • Uylas Sati N., A binary classification algorithm based on polyhedral conic functions, Düzce University Journal of Science and Technology, 3, 152-161, (2015).
  • Öztürk G., and Çiftçi M., Clustering based polyhedral conic functions algorithm in classification, Journal of Industrial and Management Optimization, 11, 3, 921-932, (2015).
  • Uylas Sati N. and Ordin B., Application of the polyhedral conic functions method in the text classification and comparative analysis, Scientific Programming, vol. 2018, Article ID 5349284, 11 pages, (2018).
  • Acar M. and Kasimbeyli R., A polyhedral conic functions based classification method for noisy data, Journal of Industrial and Management Optimization, 17, 6, 3493-3508, (2021).
  • Çevikalp H. and Saglamlar H., Polyhedral conic classifiers for computer vision applications and open set recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 2, 608-622, (2021).
  • Cevikalp H., Uzun B., Köpüklü O. and Ozturk G., Deep compact polyhedral conic classifier for open and closed set recognition, Pattern Recognition, 119, 108080, ISSN 0031-3203, (2021).
  • Skoog D.A., West D.M., Holler F.J. and Crouch S.R., Fundamentals of Analytical Chemistry, Nelson Education, (2013).
  • Szeghalmy, S. and Fazekas, A., A Comparative study of the use of stratified cross-validation and distribution-balanced stratified cross-validation in imbalanced learning, Sensors, 23, 4, 2333, (2023).
  • Quinonero-Candela, J., Sugiyama, M., Schwaighofer, A. and Lawrence, N.D., Dataset shift in machine learning, MIT Press: Cambridge, MA, USA, (2022).
  • Dua D. and Graff C., UCI Machine Learning Repository 2019. https://archive.ics.uci.edu/, (12.02.2024).

A novel undersampling method based on data classification method

Abstract

Data mining is one of the most important research areas in the literature. Because data volume grows in direct proportion to technological advances, the number of studies in this field is increasing rapidly. The goal of data mining is to extract insights and obtain information from raw data by leveraging machine learning techniques. The structural characteristics and class distributions of the datasets used in machine learning significantly affect the performance of the algorithms. In this study, our aim is to balance imbalanced binary datasets used in machine learning with an undersampling approach that incorporates a classification method based on polyhedral conic functions.
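
The two ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the classifier chooses which majority-class samples to discard, so the code below is not the proposed method. It only evaluates a polyhedral conic function in the form g(x) = w·(x − c) + ξ‖x − c‖₁ − γ introduced by Gasimov and Öztürk, and balances a toy binary dataset with plain random undersampling as a stand-in. The names `pcf` and `random_undersample`, the parameter values, and the toy data are all hypothetical.

```python
import random


def pcf(x, w, xi, gamma, center):
    """Evaluate g(x) = w.(x - c) + xi * ||x - c||_1 - gamma at a single point."""
    d = [x_j - c_j for x_j, c_j in zip(x, center)]
    linear = sum(w_j * d_j for w_j, d_j in zip(w, d))
    l1_norm = sum(abs(d_j) for d_j in d)
    return linear + xi * l1_norm - gamma


def random_undersample(X, y, seed=0):
    """Stand-in balancing step (NOT the paper's selection rule):
    keep a random subset of each class as large as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_keep = min(len(idx) for idx in by_class.values())
    keep = sorted(i for idx in by_class.values() for i in rng.sample(idx, n_keep))
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced binary dataset: 90 majority (label 0) and 10 minority (label 1) points.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = random_undersample(X, y)
counts = {label: y_bal.count(label) for label in sorted(set(y_bal))}

# Hypothetical PCF parameters, chosen only to show the evaluation.
value = pcf([2.0, 0.0], w=[1.0, -1.0], xi=0.5, gamma=1.0, center=[0.0, 0.0])
```

In polyhedral conic function classifiers the sign of g(x) is typically used as the decision rule. A classifier-driven undersampler would presumably replace the random choice above with a selection based on such a function, but that step is an assumption here and is not taken from the abstract.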

There are 21 citations in total.

Details

Primary Language: English
Subjects: Numerical Computation and Mathematical Software, Large and Complex Data Theory, Mathematical Optimisation
Journal Section: Research Articles
Authors: Nur Uylaş Satı (ORCID: 0000-0003-1553-9466)
Early Pub Date: July 14, 2024
Publication Date: July 15, 2024
Submission Date: March 5, 2024
Acceptance Date: June 6, 2024
Published in Issue: Year 2024, Volume: 26, Issue: 2

Cite

APA: Uylaş Satı, N. (2024). A novel undersampling method based on data classification method. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 26(2), 518-526. https://doi.org/10.25092/baunfbed.1447440
AMA: Uylaş Satı N. A novel undersampling method based on data classification method. BAUN Fen. Bil. Enst. Dergisi. July 2024;26(2):518-526. doi:10.25092/baunfbed.1447440
Chicago: Uylaş Satı, Nur. “A Novel Undersampling Method Based on Data Classification Method”. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi 26, no. 2 (July 2024): 518-26. https://doi.org/10.25092/baunfbed.1447440.
EndNote: Uylaş Satı N (July 1, 2024) A novel undersampling method based on data classification method. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi 26 2 518–526.
IEEE: N. Uylaş Satı, “A novel undersampling method based on data classification method”, BAUN Fen. Bil. Enst. Dergisi, vol. 26, no. 2, pp. 518–526, 2024, doi: 10.25092/baunfbed.1447440.
ISNAD: Uylaş Satı, Nur. “A Novel Undersampling Method Based on Data Classification Method”. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi 26/2 (July 2024), 518-526. https://doi.org/10.25092/baunfbed.1447440.
JAMA: Uylaş Satı N. A novel undersampling method based on data classification method. BAUN Fen. Bil. Enst. Dergisi. 2024;26:518–526.
MLA: Uylaş Satı, Nur. “A Novel Undersampling Method Based on Data Classification Method”. Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 26, no. 2, 2024, pp. 518-26, doi:10.25092/baunfbed.1447440.
Vancouver: Uylaş Satı N. A novel undersampling method based on data classification method. BAUN Fen. Bil. Enst. Dergisi. 2024;26(2):518-26.