Research Article

ANDROID MALWARE CLASSIFICATION USING BASIC MACHINE LEARNING METHODS

Year: 2024, Volume: 11, Issue: 23, Pages: 190-202, Published: 31.08.2024
https://doi.org/10.54365/adyumbd.1462488

Abstract

Android malware attacks are growing in both sophistication and volume, leaving Android users increasingly vulnerable to cyber-attacks. Researchers have developed many machine learning techniques to detect, block, or mitigate these attacks. However, technological advances, together with the growing number of Android devices and of the applications running on them, also increase the risks that malware poses to user privacy. This study presents a comprehensive evaluation of the detection and classification of malicious applications using an up-to-date dataset containing 241 attributes. First, incorrect and missing data are detected and the affected rows are removed; then normalization-based scaling is applied. After this preprocessing step, the dataset is randomly split into 70% training and 30% test data using hold-out validation. Finally, classification is carried out using six machine learning methods, including Multilayer Perceptron (MLP), Logistic Regression (LOGR), K-Nearest Neighbor (KNN), Decision Tree Classifier (DTC), and Random Forest (RF). The comparison of the modeling results shows that RF achieves the best performance, both in accuracy (97%) and in the other evaluation metrics, for Android malware detection on real-world Android application sets.
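The workflow described in the abstract (row cleaning, normalization-based scaling, a random 70/30 hold-out split, classification, and accuracy comparison) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it uses only the Python standard library, a tiny synthetic two-feature dataset in place of the 241-attribute dataset, min-max scaling as one possible reading of "normalization-based scaling", and a hand-written K-Nearest Neighbor classifier as a stand-in for the methods the study compares. All function names, the label strings, and the toy data are hypothetical.

```python
import random


def make_toy_rows():
    """Synthetic (features, label) rows; NOT the dataset used in the study."""
    rows = []
    for i in range(10):
        rows.append(([i % 3, 10 + i], "goodware"))      # low feature values
        rows.append(([5 + i % 3, 100 + i], "malware"))  # high feature values
    rows.append(([None, 50], "malware"))                # row with a missing value
    rows.append(([float("nan"), 1], "goodware"))        # row with an invalid value
    return rows


def drop_incomplete_rows(rows):
    """Remove rows that contain missing (None) or invalid (NaN) values."""
    clean = []
    for features, label in rows:
        # v == v is False only for NaN
        if label is not None and all(
            isinstance(v, (int, float)) and v == v for v in features
        ):
            clean.append((list(features), label))
    return clean


def min_max_scale(rows):
    """Rescale every feature column to the range [0, 1]."""
    n_features = len(rows[0][0])
    lows = [min(f[j] for f, _ in rows) for j in range(n_features)]
    highs = [max(f[j] for f, _ in rows) for j in range(n_features)]
    scaled = []
    for features, label in rows:
        new_features = [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(features, lows, highs)
        ]
        scaled.append((new_features, label))
    return scaled


def holdout_split(rows, train_fraction=0.7, seed=0):
    """Shuffle once, then cut into a training part and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k=3):
    """Majority vote among the k training rows closest in Euclidean distance."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def accuracy(train, test, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, f, k) == y for f, y in test)
    return hits / len(test)


if __name__ == "__main__":
    rows = min_max_scale(drop_incomplete_rows(make_toy_rows()))
    train, test = holdout_split(rows, train_fraction=0.7)
    print(f"train={len(train)} test={len(test)} accuracy={accuracy(train, test):.2f}")
```

The sketch scales before splitting because that is the order given in the abstract. Comparing several classifiers would amount to calling an accuracy function of this kind once per model on the same train/test partition.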

Turkish title: TEMEL MAKİNE ÖĞRENİMİ YÖNTEMLERİNİ KULLANARAK ANDROİD KÖTÜ AMAÇLI YAZILIM SINIFLANDIRMASI

There are 38 citations in total.

Details

Primary Language: English
Subjects: Machine Learning Algorithms
Journal Section: Articles
Authors: Tuğba Palabaş (ORCID 0000-0002-6985-6494)

Publication Date: August 31, 2024
Submission Date: April 1, 2024
Acceptance Date: July 17, 2024
Published in Issue: Year 2024, Volume 11, Issue 23

Cite

APA: Palabaş, T. (2024). ANDROID MALWARE CLASSIFICATION USING BASIC MACHINE LEARNING METHODS. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi, 11(23), 190-202. https://doi.org/10.54365/adyumbd.1462488
AMA: Palabaş T. ANDROID MALWARE CLASSIFICATION USING BASIC MACHINE LEARNING METHODS. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi. August 2024;11(23):190-202. doi:10.54365/adyumbd.1462488
Chicago: Palabaş, Tuğba. “ANDROID MALWARE CLASSIFICATION USING BASIC MACHINE LEARNING METHODS”. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 11, no. 23 (August 2024): 190-202. https://doi.org/10.54365/adyumbd.1462488.
EndNote: Palabaş T (August 31, 2024) ANDROID MALWARE CLASSIFICATION USING BASIC MACHINE LEARNING METHODS. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 11 23 190–202.
IEEE: T. Palabaş, “ANDROID MALWARE CLASSIFICATION USING BASIC MACHINE LEARNING METHODS”, Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi, vol. 11, no. 23, pp. 190–202, 2024, doi: 10.54365/adyumbd.1462488.
ISNAD: Palabaş, Tuğba. “ANDROID MALWARE CLASSIFICATION USING BASIC MACHINE LEARNING METHODS”. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 11/23 (August 2024), 190-202. https://doi.org/10.54365/adyumbd.1462488.
JAMA: Palabaş T. ANDROID MALWARE CLASSIFICATION USING BASIC MACHINE LEARNING METHODS. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi. 2024;11(23):190–202.
MLA: Palabaş, Tuğba. “ANDROID MALWARE CLASSIFICATION USING BASIC MACHINE LEARNING METHODS”. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi, vol. 11, no. 23, 2024, pp. 190-202, doi:10.54365/adyumbd.1462488.
Vancouver: Palabaş T. ANDROID MALWARE CLASSIFICATION USING BASIC MACHINE LEARNING METHODS. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi. 2024;11(23):190-202.