Research Article

An Intrusion Detection Approach based on the Combination of Oversampling and Undersampling Algorithms

Year 2023, Volume: 7 Issue: 1, 125 - 138, 02.01.2024
https://doi.org/10.26650/acin.1222890

Abstract

The threat of network intrusion has become much more severe as network traffic has increased. Network intrusion detection is therefore one of the areas of greatest concern in network security. As demand for cybersecurity assurance increases, so does the requirement for intrusion detection systems that can meet current threats. However, network-based intrusion detection systems have several shortcomings that stem from the structure of the systems, the nature of network data, and uncertainty about future data. The imbalanced class problem is also crucial because it significantly degrades classification performance. Although deep learning-based methodologies have achieved high performance in recent years, machine learning techniques can also provide high performance in network intrusion detection. This study proposes a new intrusion detection system called ROGONG-IDS (Robust Gradient Boosting - Intrusion Detection System), which uses a unique two-stage resampling model to address the imbalanced class problem and achieves high accuracy on the UNSW-NB15 dataset using machine learning techniques. ROGONG-IDS is based on gradient boosting. The system uses the Synthetic Minority Over-sampling Technique (SMOTE) and NearMiss-1 methods to handle the imbalanced class problem. The proposed model's multi-class classification performance was tested on the UNSW-NB15 dataset, and its robustness was then validated on the NSL-KDD dataset. ROGONG-IDS reached the highest attack detection rate and F1 score in the literature, with a 97.30% detection rate and a 97.65% F1 score on the UNSW-NB15 dataset. ROGONG-IDS provides a robust, efficient intrusion detection system for the UNSW-NB15 dataset, which suffers from an imbalanced class distribution. The proposed methodology outperforms state-of-the-art intrusion detection methods.
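
The two resampling techniques named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`smote`, `near_miss_1`, `resample_two_stage`), the order of the two stages, the shared `target` class size, and the neighbour counts are all assumptions made for illustration. The sketch also assumes numeric feature tuples and at least two minority samples.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k candidates closest to point (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority points by interpolating toward neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours come from the minority class only, excluding the base point.
        others = [p for p in minority if p is not base]
        neighbour = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def near_miss_1(majority, minority, n_keep, k=3):
    """Keep the n_keep majority points with the smallest mean distance
    to their k nearest minority points (the NearMiss-1 criterion)."""
    def mean_dist(p):
        nearest = k_nearest(p, minority, k)
        return sum(math.dist(p, q) for q in nearest) / len(nearest)
    return sorted(majority, key=mean_dist)[:n_keep]


def resample_two_stage(majority, minority, target):
    """Hypothetical two-stage balance (an assumption, not the paper's recipe):
    oversample the minority class up to `target`, then undersample the
    majority class down to `target`."""
    grown = minority + smote(minority, max(0, target - len(minority)))
    kept = near_miss_1(majority, grown, min(target, len(majority)))
    return kept, grown


# Toy data: 200 majority points around (0, 0), 10 minority points around (3, 3).
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]
kept, grown = resample_two_stage(majority, minority, target=50)
print(len(kept), len(grown))
```

On the toy data both classes end up with 50 samples. In practice the balanced set would then be passed to a gradient boosting classifier. The abstract does not give the resampling ratios or the classifier settings, so this sketch does not guess them.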


Örneklem Arttırma ve Örneklem Azaltma Algoritmalarının Kombinasyonuna Dayalı Bir Saldırı Tespit Yaklaşımı (An Intrusion Detection Approach Based on the Combination of Oversampling and Undersampling Algorithms)

Abstract

The threat of network intrusion has become far more severe because of increasing network traffic. Network intrusion detection is therefore one of the areas of greatest concern in network security. As the demand for cybersecurity assurance grows, so does the need for intrusion detection systems that can counter current threats. However, network-based intrusion detection systems have some shortcomings that stem from the structure of the systems, the nature of network data, and uncertainty about future data. The imbalanced data problem is also very important because it degrades classification performance. Although deep learning-based methodologies have achieved high performance in recent years, machine learning techniques can also deliver high performance in network intrusion detection. This study proposes a new intrusion detection system called ROGONG-IDS (Robust Gradient Boosting - Intrusion Detection System), which has a unique two-stage resampling model for solving the imbalanced class problem and produces high accuracy on the UNSW-NB15 dataset using machine learning techniques. ROGONG-IDS is based on gradient boosting. The system uses the Synthetic Minority Over-sampling Technique (SMOTE) and NearMiss-1 methods to solve the imbalanced class problem. The multi-class classification performance of the proposed model was tested on UNSW-NB15, and its robust structure was validated on the NSL-KDD dataset. Using the UNSW-NB15 dataset, ROGONG-IDS reached the highest attack detection rate and F1 score in the literature, with a 97.30% detection rate and a 97.65% F1 score. ROGONG-IDS provides a robust, efficient intrusion detection system for the UNSW-NB15 dataset, which suffers from imbalanced class distribution. The proposed methodology outperforms the most advanced intrusion detection methods in the literature.


Details

Primary Language English
Subjects Software Engineering (Other)
Journal Section Research Article
Authors

Ahmet Okan Arık 0000-0002-6572-1605

G. Çiğdem Çavdaroğlu 0000-0002-4875-4800

Publication Date January 2, 2024
Submission Date December 23, 2022
Published in Issue Year 2023

Cite

APA Arık, A. O., & Çavdaroğlu, G. Ç. (2024). An Intrusion Detection Approach based on the Combination of Oversampling and Undersampling Algorithms. Acta Infologica, 7(1), 125-138. https://doi.org/10.26650/acin.1222890