Advanced Android Malware Detection: Merging Deep Learning and XGBoost Techniques

Esra Kavalcı Yılmaz; Rezan Bakır

doi:10.17671/gazibtd.1553548

Research Article

Year 2025, Volume: 18, Issue: 1, 45 - 61, 31.01.2025

Cited By: 1

Abstract

As Android devices become more important in daily life, so does the need to secure the personal information stored on them, such as contact details, documents, location data, and browser data. These devices are frequent targets of attacks and malware designed to steal this data. In response, this work proposes a novel approach to Android malware detection that integrates deep learning with traditional machine learning algorithms. An extensive experimental study was conducted using the DroidCollector network traffic analysis dataset. Eight different deep learning methods were analysed for malware classification. In the first phase, experiments were conducted on both the original dataset and resampled datasets (SMOTE, SMOTETomek, ClusterCentroids), and the most effective methods were identified. In the second phase, the best-performing deep learning methods were combined with XGBoost for classification. This hybrid approach increased classification success by 3-4%. The highest F1 and accuracy values, obtained with BiLSTM+XGBoost after 150 epochs of training, were 95.12% and 99.33%, respectively. These results indicate that combining deep learning with traditional machine learning techniques outperforms individual models and significantly improves classification accuracy. The integrated method offers a useful strategy for developing high-performance models for a variety of applications.

Keywords

Malware detection, machine learning, deep learning, XGBoost
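
The abstract reports both an F1 score and an accuracy value, and the two can diverge considerably when malicious samples are much rarer than benign ones. The following minimal sketch shows how each metric is computed from confusion-matrix counts. The counts and the names `detector` and `always_benign` are hypothetical, chosen only for illustration, and are not taken from the paper or from the DroidCollector dataset.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive (malware) class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for 1000 samples, 100 of which are malware (assumption).
detector = {"tp": 90, "fp": 5, "fn": 10, "tn": 895}
# A trivial baseline that labels every sample as benign.
always_benign = {"tp": 0, "fp": 0, "fn": 100, "tn": 900}

for name, c in (("detector", detector), ("always_benign", always_benign)):
    acc = accuracy(c["tp"], c["fp"], c["fn"], c["tn"])
    f1 = f1_score(c["tp"], c["fp"], c["fn"])
    print(f"{name}: accuracy={acc:.4f}, F1={f1:.4f}")
```

With these illustrative counts the baseline still reaches 90% accuracy while its F1 score is zero, which is one reason both metrics are usually reported together for imbalanced malware datasets.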

The article lists 42 references in total.

Details

Primary Language	English
Subjects	Artificial Intelligence (Other)
Section	Articles
Authors	Esra Kavalcı Yılmaz 0000-0003-1314-4495, Rezan Bakır 0000-0002-4373-2231
Publication Date	31 January 2025
Submission Date	20 September 2024
Acceptance Date	15 November 2024
Published in Issue	Year 2025, Volume: 18, Issue: 1

Cite

APA	Kavalcı Yılmaz, E., & Bakır, R. (2025). Advanced Android Malware Detection: Merging Deep Learning and XGBoost Techniques. Bilişim Teknolojileri Dergisi, 18(1), 45-61. https://doi.org/10.17671/gazibtd.1553548

Cited By

Towards a robust android malware detection model using explainable deep learning

Journal of Information Security and Applications

https://doi.org/10.1016/j.jisa.2025.104191
