An Examination of the Performance of Machine Learning Algorithms for Malware Detection in Big Data Environments

Sercan Gülburun; Murat Dener

doi:10.31202/ecjse.967919

Research Article

Year 2021, Volume: 8 Issue: 3, 1536 - 1549, 30.09.2021

https://izlik.org/JA56PS47HJ

Abstract

The role of information technology assets in both the daily lives of individuals and the operation of institutions and organizations has grown rapidly over the last quarter century. In parallel with this growth, threats to information assets have also increased. One of the main threats to these assets is malware. This study examines the effectiveness of machine learning algorithms for detecting malware in big data environments. The experiments were run on Google Colaboratory, Azure HDInsight, Amazon EMR and Google Dataproc. Model effectiveness was tested on the Kaggle Malware Detection dataset using three machine learning methods that are available in Apache Spark 3.0 and support binary classification: random forest (RF), decision trees (DT) and gradient boosting trees (GBT). The study follows a static analysis approach. For each algorithm, accuracy, precision, sensitivity (recall), training time and prediction time were computed; results for the same algorithms were also obtained with the scikit-learn library and evaluated.
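As a rough illustration of the three classification metrics named in the abstract, the following minimal sketch computes accuracy, precision and sensitivity (recall) from true and predicted binary labels. The function name `binary_metrics` and the sample labels are hypothetical and are not taken from the study; the formulas are the standard confusion-matrix definitions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts, treating `positive` as the malware class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    return {
        # Share of all samples that were labeled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted malware that really is malware.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of actual malware that was detected (sensitivity).
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels for illustration only: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In malware detection, low recall means missed malware, while low precision means benign files flagged as malicious, so the two are usually reported alongside accuracy.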

Keywords

Big Data, Machine Learning, Malware Detection, Google Dataproc, Azure HDInsight

Analyzing of Machine Learning Algorithms Performance in Big Data Environment in terms of Malware Detection

Abstract

The role of information technology assets in both the daily lives of individuals and the functioning of institutions and organizations has grown rapidly in the last quarter century. Parallel to this growth, threats to information assets have also increased. One of the main threats to these assets is malware. In this study, the effectiveness of machine learning algorithms in detecting malicious software in a big data environment was examined. In the study, conducted on Google Colaboratory, Azure HDInsight, Amazon EMR and Google Dataproc, the effectiveness of the random forest, decision tree and gradient boosting tree algorithms, which are included in Apache Spark 3.0 and capable of binary classification, was tested using the Kaggle Malware Detection dataset. The study was carried out with a static analysis approach. Accuracy, precision, sensitivity, training time and prediction time metrics were calculated for each machine learning algorithm, and the results of the same algorithms using the scikit-learn library were collected and evaluated together.
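The two timing metrics mentioned above, training time and prediction time, can be measured by wrapping the fit and predict calls with a wall-clock timer. The sketch below is only an illustration under stated assumptions: `MajorityClassifier` is a hypothetical stand-in model (not one of the algorithms evaluated in the study), `timed` is a hypothetical helper, and the tiny dataset is invented.

```python
import time


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Remember the label that occurs most often in the training data.
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Invented data for illustration only: 1 = malware, 0 = benign.
X_train = [[0.1], [0.9], [0.8], [0.7]]
y_train = [0, 1, 1, 1]
X_test = [[0.2], [0.6]]

model, train_seconds = timed(MajorityClassifier().fit, X_train, y_train)
predictions, predict_seconds = timed(model.predict, X_test)

print(f"training time:   {train_seconds:.6f} s")
print(f"prediction time: {predict_seconds:.6f} s")
```

On a distributed engine with lazy evaluation, a timer like this only captures real work if the timed call forces the computation to run, so measurements across platforms are comparable only when that is handled consistently.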

Keywords

Big Data, Machine Learning, Malware Detection, Google Dataproc, Azure HDInsight

Details

Primary Language	Turkish
Subjects	Engineering
Section	Research Article
Authors	Sercan Gülburun (ORCID 0000-0001-5272-3911), Murat Dener (ORCID 0000-0001-5746-6141)
Submission Date	8 July 2021
Acceptance Date	31 August 2021
Publication Date	30 September 2021
DOI	https://doi.org/10.31202/ecjse.967919
IZ	https://izlik.org/JA56PS47HJ
Published in Issue	Year 2021 Volume: 8 Issue: 3

Cite

IEEE	[1] S. Gülburun and M. Dener, “Büyük Veri Ortamlarında Zararlı Yazılım Tespiti Kapsamında Makine Öğrenmesi Algoritmalarının Performansının İncelenmesi”, ECJSE, vol. 8, no. 3, pp. 1536–1549, Sep. 2021, doi: 10.31202/ecjse.967919.

Open Journal Access (BOAI)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.