Research Article

Malware Classification Using Machine Learning Methods: A Performance Benchmark on CIC-MamMem-2022 Dataset

Year 2024, Volume: 17, Issue: 2, pp. 165-173
https://doi.org/10.54525/bbmd.1504476

Abstract

Malware, or malicious software, is software used to disrupt the operation of computers and mobile devices, collect critical information, gain access to private computer systems, and display unwanted advertisements. Machine learning-based intrusion detection/prevention systems are used so that security and antivirus systems can detect or block malware. This study aims to classify malware on the CIC-MalMem-2022 dataset using machine learning methods. For the challenging sixteen-class classification problem on this dataset, the best F1 score, precision, recall, and accuracy values reported in the literature are 69.46%, 70.94%, 69.48%, and 69.48%, respectively. This study focuses on the sixteen-class problem and improves on those results: in the experiments, XGBoost achieved an F1 score, precision, recall, and accuracy of 75.53%, 75.43%, 75.65%, and 75.53%, respectively.
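The four metrics reported in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which averaging scheme was used across the sixteen classes, so macro averaging is an assumption here. The function name `multiclass_metrics` and the toy labels are hypothetical and are not taken from the study or from the dataset.

```python
from collections import Counter


def multiclass_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (an assumption, not confirmed by the source) gives
    every class the same weight regardless of how many samples it has.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # how often each class actually occurs
    pred_count = Counter(y_pred)   # how often each class was predicted
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # Guard against division by zero for classes never predicted or never seen.
        precision = correct[label] / pred_count[label] if pred_count[label] else 0.0
        recall = correct[label] / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(correct.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical three-class toy data, used only to exercise the function.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
scores = multiclass_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in scores.items()})
```

With sample-weighted averaging instead of macro averaging, recall always equals accuracy, so the choice of averaging scheme matters when comparing reported numbers across papers.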


Details

Primary Language: Turkish
Subjects: Information Security Management
Section: Research Articles
Authors:

Oğuzhan Kırlar (ORCID: 0009-0006-2023-1457)

Gamze Peksöz Akın (ORCID: 0009-0003-5239-1955)

Meltem Kurt Pehlivanoğlu (ORCID: 0000-0002-7581-9390)

Early View Date: December 3, 2024
Submission Date: June 24, 2024
Acceptance Date: August 9, 2024
Published in Issue: Year 2024, Volume: 17, Issue: 2

Cite

IEEE: O. Kırlar, G. Peksöz Akın, and M. Kurt Pehlivanoğlu, “Makine Öğrenmesi Yöntemleri Kullanılarak Kötü Amaçlı Yazılım Sınıflandırması: CIC-MamMem-2022 Veri Kümesi Üzerinde Bir Performans Karşılaştırması”, bbmd, vol. 17, no. 2, pp. 165–173, 2024, doi: 10.54525/bbmd.1504476.