Dynamic Malware Detection Approach Based on API Calls: Machine Learning and Ensemble Learning Models

Aykut Karakaya; Ahmet Ulu

doi:10.55859/ijiss.1510423

Research Article

Year 2024, Volume: 13 Issue: 4, 1 - 20, 29.12.2024

Abstract

The rapid evolution of malware presents significant challenges in cybersecurity. This study investigates the efficacy of various machine learning and ensemble learning models for malware detection using dynamic analysis. The dynamic datasets contain API calls and permissions, enabling real-time monitoring of malware behavior. For both the VirusSample and VirusShare datasets, the random forest (RF) model achieved the best results among the machine learning models, with accuracies of 94.69% and 85.72%, respectively. Among the ensemble models, the best result on the VirusSample dataset came from a stacking model that uses RF and decision trees (DT) as base classifiers and K-nearest neighbors (KNN) as the meta classifier, with an accuracy of 94.52%. On the VirusShare dataset, the best ensemble result came from a stacking model that uses RF, KNN, and gradient boosting (GB) as base classifiers and a support vector machine (SVM) as the meta classifier, with an accuracy of 85.7%. These results underscore the superiority of dynamic analysis and the effectiveness of ensemble methods in enhancing malware detection accuracy. This study contributes to the optimization of machine learning models and the advancement of cybersecurity solutions.
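
The two stacking configurations named in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic data from `make_classification` stands in for the API-call features, and the helper names, hyperparameters, class count, and fold count are all hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stack_rf_dt_knn(seed=0):
    """Stacking with RF and DT as base classifiers and KNN as meta classifier."""
    return StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
            ("dt", DecisionTreeClassifier(random_state=seed)),
        ],
        final_estimator=KNeighborsClassifier(n_neighbors=5),
        cv=3,  # assumed fold count for the out-of-fold base predictions
    )


def build_stack_rf_knn_gb_svm(seed=0):
    """Stacking with RF, KNN, and GB as base classifiers and SVM as meta classifier."""
    return StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
            ("gb", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
        ],
        final_estimator=SVC(),
        cv=3,  # assumed fold count
    )


def evaluate(model, seed=0):
    """Fit on synthetic placeholder data and return held-out accuracy."""
    # Placeholder features and labels; the real datasets are not used here.
    X, y = make_classification(
        n_samples=400,
        n_features=20,
        n_informative=8,
        n_classes=3,
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


if __name__ == "__main__":
    for name, builder in [
        ("RF+DT -> KNN", build_stack_rf_dt_knn),
        ("RF+KNN+GB -> SVM", build_stack_rf_knn_gb_svm),
    ]:
        print(name, "accuracy on synthetic data:", evaluate(builder()))
```

In this sketch the meta classifier is trained on cross-validated predictions of the base classifiers, which is how `StackingClassifier` avoids fitting the meta classifier on predictions the base models made for their own training samples. Accuracies printed here describe the synthetic data only and say nothing about the reported results.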

Keywords

information security, malware detection, dynamic analysis, machine learning, ensemble learning

Details

Primary Language	English
Subjects	System and Network Security, Data Security and Protection
Journal Section	Research Article
Authors	Aykut Karakaya (ORCID 0000-0001-6970-3239), Ahmet Ulu (ORCID 0000-0002-4618-5712)
Publication Date	December 29, 2024
Submission Date	July 4, 2024
Acceptance Date	October 3, 2024
Published in Issue	Year 2024 Volume: 13 Issue: 4

Cite

IEEE	A. Karakaya and A. Ulu, “Dynamic Malware Detection Approach Based on API Calls: Machine Learning and Ensemble Learning Models”, IJISS, vol. 13, no. 4, pp. 1–20, 2024, doi: 10.55859/ijiss.1510423.
