Research Article

Jar Malware Detection with XGBoost Algorithm Based on Binary Particle Swarm Optimization Feature Selection

Year 2023, Volume: 10, Issue: 1, pp. 140-152, 31.05.2023
https://doi.org/10.35193/bseufbd.1194460

Abstract

Attacks using malware written in Java have increased rapidly in recent years. The growing damage that such malware can cause to individuals and institutions has led researchers to develop and test various machine learning techniques to improve and strengthen automatic detection systems. This study proposes a hybrid system for detecting malicious Jar files that combines feature selection based on binary particle swarm optimization (BPSO) with classification by the XGBoost algorithm. The BPSO algorithm minimizes a fitness function that is evaluated with the random forest algorithm. Feature selection aims to increase speed and performance by reducing the computational load on the classification algorithm. The proposed model was trained and tested using 10-fold cross-validation. The performance of the XGBoost-based detection mechanism is reported in terms of accuracy, precision, recall and F1-score. For evaluation, the proposed model was compared with AdaBoost, Gradient Boosting, Support Vector Machines, Artificial Neural Networks and Naïve Bayes. Experimental results showed that the proposed hybrid model, which combines BPSO-based feature selection with XGBoost classification, detected Jar malware more successfully than the compared models, achieving an accuracy of 98.04%.
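The abstract describes a wrapper scheme: BPSO searches over binary feature masks while minimizing a fitness evaluated with a random forest, and the selected features are then passed to XGBoost. The sketch below illustrates only the BPSO search step, using the classic sigmoid-based binary PSO of Kennedy and Eberhart. It is not the author's implementation. The fitness form `alpha * error + (1 - alpha) * k / n` (an error-plus-subset-size penalty common in wrapper feature selection), all parameter values, and the helper names `bpso_select`, `toy_error` and `INFORMATIVE` are assumptions. To keep the example dependency-free, `toy_error` is a hypothetical stand-in for the random-forest error.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def bpso_select(
    n_features: int,
    error_fn: Callable[[Mask], float],
    n_particles: int = 20,
    n_iters: int = 50,
    w: float = 0.7,        # inertia weight
    c1: float = 1.5,       # cognitive coefficient (pull toward personal best)
    c2: float = 1.5,       # social coefficient (pull toward global best)
    alpha: float = 0.99,   # weight of error vs. subset-size penalty (assumed)
    v_max: float = 4.0,    # velocity clamp so the sigmoid does not saturate
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Minimal binary PSO with sigmoid transfer that MINIMIZES a wrapper fitness."""
    rng = random.Random(seed)

    def fitness(mask: Mask) -> float:
        k = sum(mask)
        if k == 0:  # an empty subset cannot train a classifier
            return float("inf")
        return alpha * error_fn(mask) + (1 - alpha) * k / n_features

    # Random initial masks and velocities
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid maps velocity to the probability that bit d is 1
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f < pbest_f[i]:  # minimization: lower fitness is better
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


# Hypothetical demo: 10 features, of which only 0, 2 and 5 are informative.
INFORMATIVE = {0, 2, 5}


def toy_error(mask: Mask) -> float:
    """Stand-in for (1 - CV accuracy of a random forest on the selected columns)."""
    missed = sum(1 for j in INFORMATIVE if mask[j] == 0)
    return missed / len(INFORMATIVE)


best_mask, best_fit = bpso_select(n_features=10, error_fn=toy_error, seed=42)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
print("fitness:", round(best_fit, 4))
```

In a real pipeline, `error_fn` could instead return `1 - cross_val_score(RandomForestClassifier(), X[:, cols], y, cv=10).mean()` with scikit-learn, where `cols` are the indices whose mask bit is 1, and an `xgboost.XGBClassifier` would then be trained on `X[:, cols]`. Note that in this basic BPSO, when a particle's bit already agrees with both its personal best and the global best, its velocity decays toward zero and the bit's probability drifts toward 0.5, so the swarm keeps exploring rather than freezing on one mask.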

There are 40 citations in total.

Details

Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Author: Mahmut Tokmak (ORCID: 0000-0003-0632-4308)

Publication Date: May 31, 2023
Submission Date: October 25, 2022
Acceptance Date: January 27, 2023
Published in Issue: Year 2023, Volume: 10, Issue: 1

Cite

APA: Tokmak, M. (2023). XGBoost Algoritması ile İkili Parçacık Sürü Optimizasyonu Öznitelik Seçme Tabanlı Jar Kötü Amaçlı Yazılımlarının Tespiti [Jar Malware Detection with XGBoost Algorithm Based on Binary Particle Swarm Optimization Feature Selection]. Bilecik Şeyh Edebali Üniversitesi Fen Bilimleri Dergisi, 10(1), 140-152. https://doi.org/10.35193/bseufbd.1194460