TY - JOUR
T1 - A novel undersampling method based on data classification method
TT - Veri sınıflandırma yöntemine dayalı yeni bir alt örnekleme yöntemi
AU - Uylaş Satı, Nur
PY - 2024
DA - July
Y2 - 2024
DO - 10.25092/baunfbed.1447440
JF - Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi
JO - BAUN Fen. Bil. Enst. Dergisi
PB - Balıkesir Üniversitesi
WT - DergiPark
SN - 1301-7985
SP - 518
EP - 526
VL - 26
IS - 2
LA - en
AB - Data mining is one of the most important research areas in the literature. Because the volume of data grows in step with technological advances, the number of studies in this field is increasing rapidly. The goal of data mining is to extract insights and obtain information from raw data by leveraging machine learning techniques. The structural characteristics and class distributions of the datasets used in machine learning significantly affect the performance of the algorithms. In this study, we aim to balance imbalanced binary datasets used in machine learning with an undersampling approach that incorporates a classification method based on polyhedral conic functions.
KW - Data mining
KW - machine learning
KW - undersampling
KW - polyhedral conic functions
N2 - Data mining is one of the most important research areas in the literature. Because the volume of data grows in direct proportion to technological progress, the number of studies in this field is also increasing rapidly. The aim of data mining is to make various predictions and obtain information from raw data by drawing on machine learning techniques. The structural properties and class distributions of the datasets used in machine learning techniques significantly affect the performance of the algorithms. In this study, our aim is to balance imbalanced binary datasets with a new undersampling approach that includes a classification method using polyhedral conic functions.
CR - Ayoub, S., Gulzar, Y., Rustamov, J., Jabbari, A., Reegu, F.A. and Turaev, S., Adversarial Approaches to Tackle Imbalanced Data in Machine Learning, Sustainability, 15, 7097, (2023).
CR - Raghuwanshi, B.S. and Shukla, S., Class-Specific Extreme Learning Machine for Handling Binary Class Imbalance Problem, Neural Networks, 105, 206–217, (2018).
CR - Mohammed R., Rawashdeh J. and Abdullah M., Machine learning with oversampling and undersampling techniques: Overview study and experimental results, 2020 11th International Conference on Information and Communication Systems (ICICS), 243-248, Irbid, Jordan, (2020).
CR - Hoyos-Osorio J., Alvarez-Meza A., Daza-Santacoloma G., Orozco-Gutierrez A. and Castellanos-Dominguez G., Relevant information undersampling to support imbalanced data classification, Neurocomputing, 436, 136-146, (2021).
CR - Sun Z., Song Q., Zhu X., Sun H., Xu B. and Zhou Y., A novel ensemble method for classifying imbalanced data, Pattern Recognition, 48, 5, 1623-1637, (2015).
CR - Barandela R., Valdovinos R.M. and Sánchez J.S., New applications of ensembles of classifiers, Pattern Analysis & Applications, 6, 3, 245-256, (2003).
CR - Seiffert C., Khoshgoftaar T.M., Van Hulse J. and Napolitano A., RUSBoost: a hybrid approach to alleviating class imbalance, IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 40, 1, 185-197, (2010).
CR - Mostafaei, S.H. and Tanha, J., OUBoost: boosting based over and under sampling technique for handling imbalanced data, International Journal of Machine Learning and Cybernetics, 14, 3393–3411, (2023).
CR - Dai Q., Liu J. and Shi Y., Class-overlap undersampling based on Schur decomposition for Class-imbalance problems, Expert Systems with Applications: An International Journal, 221, C, 119735, (2023).
CR - Gasimov R.N. and Öztürk G., Separation via polyhedral conic functions, Optimization Methods and Software, 21, 4, 527-540, (2006).
CR - Uylas N., Methods based on mathematical optimization for data classification, PhD, Ege University, İzmir, Turkey, (2013).
CR - Uylas Sati N., A binary classification algorithm based on polyhedral conic functions, Düzce University Journal of Science and Technology, 3, 152-161, (2015).
CR - Öztürk G. and Çiftçi M., Clustering based polyhedral conic functions algorithm in classification, Journal of Industrial and Management Optimization, 11, 3, 921-932, (2015).
CR - Uylas Sati N. and Ordin B., Application of the polyhedral conic functions method in the text classification and comparative analysis, Scientific Programming, vol. 2018, Article ID 5349284, 11 pages, (2018).
CR - Acar M. and Kasimbeyli R., A polyhedral conic functions based classification method for noisy data, Journal of Industrial and Management Optimization, 17, 6, 3493-3508, (2021).
CR - Çevikalp H. and Saglamlar H., Polyhedral conic classifiers for computer vision applications and open set recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 2, 608-622, (2021).
CR - Cevikalp H., Uzun B., Köpüklü O. and Ozturk G., Deep compact polyhedral conic classifier for open and closed set recognition, Pattern Recognition, 119, 108080, ISSN 0031-3203, (2021).
CR - Skoog D.A., West D.M., Holler F.J. and Crouch S.R., Fundamentals of Analytical Chemistry, Nelson Education, (2013).
CR - Szeghalmy, S. and Fazekas, A., A comparative study of the use of stratified cross-validation and distribution-balanced stratified cross-validation in imbalanced learning, Sensors, 23, 4, 2333, (2023).
CR - Quinonero-Candela, J., Sugiyama, M., Schwaighofer, A. and Lawrence, N.D., Dataset shift in machine learning, MIT Press: Cambridge, MA, USA, (2022).
CR - Dua D. and Graff C., UCI Machine Learning Repository 2019. https://archive.ics.uci.edu/, (12.02.2024).
UR - https://doi.org/10.25092/baunfbed.1447440
L1 - https://dergipark.org.tr/tr/download/article-file/3773945
ER -