Hyperparameter Tunning and Feature Selection Methods for Malware Detection

Esra Kavalcı Yılmaz; Halit Bakır

doi:10.2339/politeknik.1243881

Research Article

Year 2024, Volume 27, Issue 1, 343 - 353, 29.02.2024

Cited By: 4

Abstract

With advancing technology, smartphones have taken an essential place in every aspect of our lives. All kinds of transactions, from daily routine work to business meetings, payments, and personal transactions, are now carried out via smartphones. As a result, these devices store a significant amount of very important user information, which makes them a target for malware developers. For these reasons, machine learning (ML) methods have been used to detect malicious software on Android devices quickly and reliably. In this study, a machine learning-based Android malware detection system has been developed, optimized, and tested. To this end, the dataset was first balanced with three different methods, namely SMOTE, SMOTETomek, and ClusterCentroids. Afterward, different feature selection approaches, including mRMR, Mutual Information, Select From Model, and Select k Best, were applied in an attempt to improve the obtained results. Finally, the two most successful of the five tested ML algorithms (i.e., RF, SVM, LR, XGBoost, and ETC) were tuned using the GridSearch, Random Search, and Bayesian Optimization algorithms in order to investigate the effects of hyperparameter tuning on the performance of ML algorithms.
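As a rough illustration of how two of the tuning strategies named in the abstract differ, the following minimal sketch contrasts an exhaustive grid search with a random search over the same space. It uses only the Python standard library. The search space, the parameter names, and the `toy_score` function are all hypothetical stand-ins chosen for this example; they are not the grid, the classifiers, or the results of the study.

```python
import itertools
import random

# Hypothetical search space; these names and values are illustrative only,
# not the grid used in the study.
SEARCH_SPACE = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [4, 8, 16, 32],
    "min_samples_split": [2, 5, 10],
}


def toy_score(params):
    """Stand-in for a cross-validated accuracy (an assumption, not real data).

    A real run would train a classifier with `params` and return its
    validation score; here a simple function peaks at (200, 16, 2).
    """
    return (
        1.0
        - abs(params["n_estimators"] - 200) / 1000
        - abs(params["max_depth"] - 16) / 100
        - (params["min_samples_split"] - 2) / 100
    )


def grid_search(space, score_fn):
    """Evaluate every combination in the Cartesian product of the space."""
    names = list(space)
    candidates = [dict(zip(names, values))
                  for values in itertools.product(*space.values())]
    best = max(candidates, key=score_fn)
    return best, len(candidates)


def random_search(space, score_fn, n_iter, seed=0):
    """Evaluate n_iter combinations sampled uniformly from the space."""
    rng = random.Random(seed)
    candidates = [{name: rng.choice(values) for name, values in space.items()}
                  for _ in range(n_iter)]
    best = max(candidates, key=score_fn)
    return best, len(candidates)


grid_best, grid_evals = grid_search(SEARCH_SPACE, toy_score)
rand_best, rand_evals = random_search(SEARCH_SPACE, toy_score, n_iter=10)
print("grid:", grid_best, "evaluations:", grid_evals)
print("random:", rand_best, "evaluations:", rand_evals)
```

The grid search evaluates all 48 combinations and is therefore guaranteed to find the best point in the grid, while the random search spends a fixed budget of 10 evaluations and may settle for a slightly worse configuration. Bayesian optimization is not sketched here because it additionally requires a surrogate model of the score.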

Keywords

Android Malware Detection, Feature Selection, Imbalance Data Sampling, Hyperparameter Tuning

Turkish title: Kötü Amaçlı Yazılım Algılaması için Hiperparametre Ayarlama ve Özellik Seçim Yöntemleri (Hyperparameter Tuning and Feature Selection Methods for Malware Detection)

Abstract (translated from Turkish)

With advancing technology, smartphones have come to play a part in every area of our lives. All kinds of transactions, from daily routine tasks to important meetings, payments, and personal transactions, are now carried out on smartphones. Consequently, the storage of all user information on smartphones makes them a target for malware developers. For these reasons, machine learning methods have come into use for detecting malicious software on Android devices quickly and reliably. In this study, the data in the dataset were first balanced using three different methods: SMOTE, SMOTETomek, and ClusterCentroids. Next, the mRMR, Mutual Information, Select From Model, and Select k Best feature selection methods were used in an attempt to obtain the highest accuracy. Finally, the two most successful of five machine learning algorithms (RF, SVM, LR, XGBoost, ETC) were tuned using the GridSearch, Random Search, and Bayesian Optimization methods.

Keywords (translated from Turkish)

Android malware detection, feature selection, imbalanced data sampling

There are 39 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Research Article
Authors	Esra Kavalcı Yılmaz (ORCID 0000-0003-1314-4495); Halit Bakır (ORCID 0000-0003-3327-2822)
Early Pub Date	September 8, 2023
Publication Date	February 29, 2024
Submission Date	January 30, 2023
Published in Issue	Year 2024, Volume 27, Issue 1

Cite

APA	Kavalcı Yılmaz, E., & Bakır, H. (2024). Hyperparameter Tunning and Feature Selection Methods for Malware Detection. Politeknik Dergisi, 27(1), 343-353. https://doi.org/10.2339/politeknik.1243881
AMA	Kavalcı Yılmaz E, Bakır H. Hyperparameter Tunning and Feature Selection Methods for Malware Detection. Politeknik Dergisi. February 2024;27(1):343-353. doi:10.2339/politeknik.1243881
Chicago	Kavalcı Yılmaz, Esra, and Halit Bakır. “Hyperparameter Tunning and Feature Selection Methods for Malware Detection”. Politeknik Dergisi 27, no. 1 (February 2024): 343-53. https://doi.org/10.2339/politeknik.1243881.
EndNote	Kavalcı Yılmaz E, Bakır H (February 1, 2024) Hyperparameter Tunning and Feature Selection Methods for Malware Detection. Politeknik Dergisi 27 1 343–353.
IEEE	E. Kavalcı Yılmaz and H. Bakır, “Hyperparameter Tunning and Feature Selection Methods for Malware Detection”, Politeknik Dergisi, vol. 27, no. 1, pp. 343–353, 2024, doi: 10.2339/politeknik.1243881.
ISNAD	Kavalcı Yılmaz, Esra - Bakır, Halit. “Hyperparameter Tunning and Feature Selection Methods for Malware Detection”. Politeknik Dergisi 27/1 (February 2024), 343-353. https://doi.org/10.2339/politeknik.1243881.
JAMA	Kavalcı Yılmaz E, Bakır H. Hyperparameter Tunning and Feature Selection Methods for Malware Detection. Politeknik Dergisi. 2024;27:343–353.
MLA	Kavalcı Yılmaz, Esra and Halit Bakır. “Hyperparameter Tunning and Feature Selection Methods for Malware Detection”. Politeknik Dergisi, vol. 27, no. 1, 2024, pp. 343-5, doi:10.2339/politeknik.1243881.
Vancouver	Kavalcı Yılmaz E, Bakır H. Hyperparameter Tunning and Feature Selection Methods for Malware Detection. Politeknik Dergisi. 2024;27(1):343-5.