acin

Acta Infologica

2602-3563

Istanbul University

10.26650/acin.1222890

Software Engineering (Other)

Yazılım Mühendisliği (Diğer)

Örneklem Arttırma ve Örneklem Azaltma Algoritmalarının Kombinasyonuna Dayalı Bir Saldırı Tespit Yaklaşımı

An Intrusion Detection Approach based on the Combination of Oversampling and Undersampling Algorithms

https://orcid.org/0000-0002-6572-1605

Arık

Ahmet Okan

Istanbul University

https://orcid.org/0000-0002-4875-4800

Çavdaroğlu

G. Çiğdem

Işık University, Faculty of Economics, Administrative and Social Sciences, Department of Information Technologies

01 02 2024

Volume 7, Issue 1, pp. 125-138. Article history dates: 12/23/2022; 04/17/2023.

2017

Acta Infologica


The threat of network intrusion has become much more severe as network traffic has grown. Network intrusion detection is therefore one of the areas of greatest concern in network security. As demand for cybersecurity assurance increases, so does the need for intrusion detection systems that can meet current threats. However, network-based intrusion detection systems have several shortcomings that stem from the structure of the systems, the nature of network data, and uncertainty about future data. The imbalanced class problem is also crucial because it significantly degrades classification performance. Although deep learning-based methodologies have achieved high performance in recent years, machine learning techniques can also provide high performance in network intrusion detection. This study proposes a new intrusion detection system called ROGONG-IDS (Robust Gradient Boosting - Intrusion Detection System), which uses a unique two-stage resampling model to address the imbalanced class problem and achieves high accuracy on the UNSW-NB15 dataset using machine learning techniques. ROGONG-IDS is based on gradient boosting. The system uses the Synthetic Minority Over-sampling Technique (SMOTE) and the NearMiss-1 method to handle the imbalanced class problem. The proposed model's multi-class classification performance was tested on the UNSW-NB15 dataset, and its robustness was then validated on the NSL-KDD dataset. ROGONG-IDS reached the highest attack detection rate and F1 score reported in the literature for the UNSW-NB15 dataset, with a 97.30% detection rate and a 97.65% F1 score. ROGONG-IDS provides a robust, efficient intrusion detection system for the UNSW-NB15 dataset, which suffers from an imbalanced class distribution. The proposed methodology outperforms state-of-the-art intrusion detection methods.
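The two resampling ideas named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The function names, the toy two-class data, the order of the two stages, the shared target size, and the choice to measure NearMiss-1 distances against the original minority points are all assumptions made for illustration. The gradient boosting classifier and the multi-class setting are not shown.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE idea).
    Assumes the minority class has at least two samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic

def nearmiss1(majority, minority, n_keep, k=3):
    """Keep the n_keep majority points whose mean distance to their k
    nearest minority points is smallest (NearMiss-1 idea)."""
    def score(p):
        nearest = sorted(dist(p, q) for q in minority)[:k]
        return sum(nearest) / len(nearest)
    return sorted(majority, key=score)[:n_keep]

def two_stage_resample(majority, minority, target):
    """Oversample the minority up to `target`, then undersample the
    majority down to `target`. Stage order and the shared target size
    are assumptions of this sketch."""
    grown = minority + smote(minority, max(0, target - len(minority)))
    kept = nearmiss1(majority, minority, min(target, len(majority)))
    return kept, grown

# Hypothetical toy data: 200 "normal" points and 10 "attack" points.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)]

kept, grown = two_stage_resample(majority, minority, target=50)
print(len(kept), len(grown))  # 50 50
```

In this toy run both classes end up with 50 samples. The balanced set could then be passed to any classifier, for example a gradient boosting model, but that step is outside the scope of the sketch.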

Machine Learning; Cyber Security; Intrusion Detection System; Imbalanced Data; Gradient Boosting

