TY - JOUR
T1 - The Efficiency of Regularization Method on Model Success in Issue Type Prediction Problem
TT - Sorun Türü Tahmini Probleminde Düzenlileştirme Yönteminin Model Başarısı Üzerindeki Etkisi
AU - Alsaç, Ali
AU - Yenisey, Mehmet Mutlu
AU - Ganiz, Murat Can
AU - Dağtekin, Mustafa
AU - Ulusinan, Taner
PY - 2023
DA - December
Y2 - 2023
DO - 10.26650/acin.1394019
JF - Acta Infologica
JO - ACIN
PB - Istanbul University
WT - DergiPark
SN - 2602-3563
SP - 360
EP - 383
VL - 7
IS - 2
LA - en
AB - Designing a prediction method with machine learning algorithms and increasing the prediction success is one of the most important research areas and aims of today. Models designed using classification algorithms are frequently used especially in problem types that require prediction. In this study, real life data is used to answer the question of which problem type should be included in the Information Technology Service Management (ITSM) system. An important step in the search for a solution is to examine the dataset with regularization methods. Experimental results have been obtained to establish the overfitting or underfitting balance of the dataset with L1 and L2 regularization methods. While the Root-Mean-Square Error (RMSE) value was approximately 0.13 in the regression model without regularization, this value was found to be approximately 0.083 after L1 regularization. With the regularized dataset, new results were obtained using Artificial Neural Network (ANN), Logistic Regression (LR), Support Vector Machine (SVM) classifier algorithms. SVM algorithm was the most successful model with a performance of approximately 0.73. It is followed by LR and ANN respectively. Accuracy, Precision, Recall and F1Score were used as evaluation metrics. It is seen that the use of regularization methods, especially in the preparation of real-life data for use in machine learning or other artificial intelligence research, will contribute to increasing the success level of the model.
KW - IT service management
KW - regularization
KW - prediction
KW - classification
N2 - Designing a mathematical prediction method and benefiting from its successful results stands out as one of today's important research areas and aims. Models designed using classification algorithms are frequently used, especially in problem types that require prediction. Using real-life data, the study seeks to answer a real-life question: to which issue type within the Information Technology Service Management (ITSM) system a customer's solution request should be assigned. An important stage of the search for a solution is examining the dataset with regularization methods. Experimental results were obtained with the L1 and L2 regularization methods to balance overfitting and underfitting on the dataset. While the Root-Mean-Square Error (RMSE) was approximately 0.13 in the regression model without regularization, it was approximately 0.083 after L1 regularization. With the regularized dataset, new results were obtained using the Artificial Neural Network (ANN), Logistic Regression (LR), and Support Vector Machine (SVM) classifier algorithms. The SVM algorithm was the most successful model, with a performance of approximately 0.73, followed by LR and ANN, respectively. Accuracy, Precision, Recall, and F1Score were used as evaluation metrics. The results indicate that using regularization methods, especially when preparing real-life data for machine learning or other artificial intelligence research, contributes to raising the model's level of success.
UR - https://doi.org/10.26650/acin.1394019
L1 - https://dergipark.org.tr/en/download/article-file/3551155
ER -