TY - JOUR T1 - Makine Öğrenmesi Yöntemleri Kullanılarak Kötü Amaçlı Yazılım Sınıflandırması: CIC-MamMem-2022 Veri Kümesi Üzerinde Bir Performans Karşılaştırması TT - Malware Classification Using Machine Learning Methods: A Performance Benchmark on CIC-MamMem-2022 Dataset AU - Kurt Pehlivanoğlu, Meltem AU - Kırlar, Oğuzhan AU - Peksöz Akın, Gamze PY - 2024 DA - December Y2 - 2024 DO - 10.54525/bbmd.1504476 JF - Bilgisayar Bilimleri ve Mühendisliği Dergisi JO - bbmd PB - Akademik Bilişim Vakfı WT - DergiPark SN - 3023-7459 SP - 165 EP - 173 VL - 17 IS - 2 LA - tr AB - Zararlı yazılım veya kötü amaçlı yazılım; bilgisayar ve mobil cihazların işlevlerini bozmak, kritik bilgileri toplamak, özel bilgisayar sistemlerine erişim sağlamak ve istenmeyen reklamları göstermek amacı ile kullanılan yazılımdır. Kötü amaçlı yazılımların güvenlik ve antivirüs sistemlerinde tespit edilebilmesi ya da engellenmesi için makine öğrenmesi tabanlı saldırı tespit/önleme sistemleri kullanılmaktadır. Bu çalışmada CIC-MamMem-2022 veri kümesi üzerinde, makine öğrenmesi yöntemleriyle kötü amaçlı yazılımların sınıflandırılması amaçlanmıştır. Bu veri kümesi üzerinde zorlu bir problem olan on altı sınıf sınıflandırma için literatürde bilinen en iyi F1 ölçüsü, kesinlik, hassasiyet ve doğruluk değerleri sırasıyla %69,46, %70,94, %69,48 ve %69,48 iken; bu çalışmada özellikle on altı sınıf sınıflandırma problemi üzerine odaklanılmış ve literatürde bilinen en iyi sonuçlardan daha iyi sonuçlar elde edilmiştir. Yapılan deneysel çalışmalar sonucunda XGBoost ile F1 ölçüsü, tutturma, bulma ve doğruluk değerleri sırasıyla %75,53, %75,43, %75,65 ve %75,53 olarak elde edilmiştir. KW - Zararlı Yazılım Sınıflandırma KW - Zararlı Yazılım Tespiti KW - Makine öğrenmesi KW - Saldırı Tespit Sistemi N2 - AbstractMalware or malicious software is software used to disrupt the functioning of computers and mobile devices, collect critical information, gain access to private computer systems, and display unwanted advertisements. Machine learning-based intrusion detection/prevention systems are used to detect or block malware in security and antivirus systems. This study aims to classify malware using machine learning methods on the CIC-MamMem-2022 dataset. For the challenging problem of sixteen-class classification on this dataset, the best-known F1 score, precision, recall, and accuracy values in the literature are 69.46%, 70.94%, 69.48%, and 69.48%, respectively. In this study, a particular focus was placed on the sixteen-class classification problem, and better results than the best-known results in the literature were achieved. As a result of the experimental studies, the F1 score, precision, recall, and accuracy values obtained with XGBoost were 75.53%, 75.43%, 75.65%, and 75.53%, respectively. CR - Carrier, T., Victor, P., Tekeoglu, A., & Lashkari, A. H. (2022, February). Detecting Obfuscated Malware using Memory Feature Engineering. In Icissp (pp. 177-188). CR - Abualhaj, M., Abu-Shareha, A., Shambour, Q., Alsaaidah, A., Al-Khatib, S., & Anbar, M. (2024). Customized K-nearest neighbors’ algorithm for malware detection. International Journal of Data and Network Science, 8(1), 431-438. CR - Shafin, S. S., Karmakar, G., & Mareels, I. (2023). Obfuscated memory malware detection in resource-constrained IoT devices for smart city applications. Sensors, 23(11), 5348. CR - Hasan, S. R., & Dhakal, A. (2023, December). Obfuscated Malware Detection: Investigating Real-World Scenarios Through Memory Analysis. In 2023 IEEE International Conference on Telecommunications and Photonics (ICTP) (pp. 01-05). IEEE. CR - Jiang, Q., Zhao, X., & Huang, K. (2011, June). A feature selection method for malware detection. In 2011 IEEE International Conference on Information and Automation (pp. 890-895). IEEE. CR - Smith, D., Khorsandroo, S., & Roy, K. (2023, February). Supervised and unsupervised learning techniques utilizing malware datasets. In 2023 IEEE 2nd International Conference on AI in Cybersecurity (ICAIC) (pp. 1-7). IEEE. CR - Benkerroum, S., & Chougdali, K. (2023, December). Enhancing Forensic Analysis Using a Machine Learning-based Approach. In 2023 6th International Conference on Advanced Communication Technologies and Networking (CommNet) (pp. 1-6). IEEE. CR - Balasubramanian, K. M., Vasudevan, S. V., Thangavel, S. K., Kumar, G., Srinivasan, K., Tibrewal, A., & Vajipayajula, S. (2023, July). Obfuscated Malware detection using Machine Learning models. In 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT) (pp. 1-8). IEEE. CR - Dener, M., Ok, G., & Orman, A. (2022). Malware detection using memory analysis data in big data environment. Applied Sciences, 12(17), 8604. CR - Mezina, A., & Burget, R. (2022, October). Obfuscated malware detection using dilated convolutional network. In 2022 14th international congress on ultra modern telecommunications and control systems and workshops (ICUMT) (pp. 110-115). IEEE CR - Talukder, M. A., Hasan, K. F., Islam, M. M., Uddin, M. A., Akhter, A., Yousuf, M. A., ... & Moni, M. A. (2023). A dependable hybrid machine learning model for network intrusion detection. Journal of Information Security and Applications, 72, 103405 CR - Naeem, H., Dong, S., Falana, O. J., & Ullah, F. (2023). Development of a deep stacked ensemble with process based volatile memory forensics for platform independent malware detection and classification. Expert Systems with Applications, 223, 119952. CR - Dener, M., Ok, G., & Orman, A. (2022). Malware detection using memory analysis data in big data environment. Applied Sciences, 12(17), 8604. CR - Smmarwar, S. K., Gupta, G. P., & Kumar, S. (2024). Android Malware Detection and Identification Frameworks by Leveraging the Machine and Deep Learning Techniques: A Comprehensive Review. Telematics and Informatics Reports, 100130. CR - Al-Qudah, M., Ashi, Z., Alnabhan, M., & Abu Al-Haija, Q. (2023). Effective one-class classifier model for memory dump malware detection. Journal of Sensor and Actuator Networks, 12(1), 5. CR - Alani, M. M., Mashatan, A., & Miri, A. (2023). XMal: A lightweight memory-based explainable obfuscated-malware detector. Computers & Security, 133, 103409. CR - Louk, M. H. L., & Tama, B. A. (2022). Tree-based classifier ensembles for PE malware analysis: A performance revisit. Algorithms, 15(9), 332. CR - Smith, D., Khorsandroo, S., & Roy, K. (2023, February). Supervised and unsupervised learning techniques utilizing malware datasets. In 2023 IEEE 2nd International Conference on AI in Cybersecurity (ICAIC) (pp. 1-7). IEEE. CR - Roshan, K., & Zafar, A. (2024). Ensemble adaptive online machine learning in data stream: a case study in cyber intrusion detection system. International Journal of Information Technology, 1-14. CR - Maniriho, P., Mahmood, A. N., & Chowdhury, M. J. M. (2024). MeMalDet: A memory analysis-based malware detection framework using deep autoencoders and stacked ensemble under temporal evaluations. Computers & Security, 142, 103864. CR - Roy, K. S., Ahmed, T., Udas, P. B., Karim, M. E., & Majumdar, S. (2023). Malhystack: A hybrid stacked ensemble learning framework with feature engineering schemes for obfuscated malware analysis. Intelligent Systems with Applications, 20, 200283. CR - Cevallos-Salas, D., Grijalva, F., Estrada-Jiménez, J., Benítez, D., & Andrade, R. (2024). Obfuscated Privacy Malware Classifiers based on Memory Dumping Analysis. IEEE Access. CR - Nugraha, A., & Zeniarja, J. (2022). Malware Detection Using Decision Tree Algorithm Based on Memory Features Engineering. Journal of Applied Intelligent System, 7(3), 206-210b CR - Noor, B., & Qadir, S. (2023). Machine Learning and Deep Learning Based Model for the Detection of Rootkits Using Memory Analysis. Applied Sciences, 13(19), 10730. CR - Özkam, Y. (2023). Malware Detection in Forensic Memory Dumps: The Use of Deep Meta-Learning Models. Acta Infologica, 7(1), 165-172 CR - Yogesh, K. M., Arpitha, S., Stephan, T., Praksha, M., & Raghu, V. (2023, December). Unravelling Obfuscated Malware Through Memory Feature Engineering and Ensemble Learning. In International Conference on Information and Communication Technology for Competitive Strategies (pp. 323-332). Singapore: Springer Nature Singapore. CR - MalMem-Classification, https://github.com/oguzhankirlar/MalMem-Classification, Erişim Tarihi: 24.06.2024. UR - https://doi.org/10.54525/bbmd.1504476 L1 - https://dergipark.org.tr/tr/download/article-file/4019757 ER -