Research Article

Big Data Visualization for Cyber Security: BETH Dataset

Year 2022, Volume: 9 Issue: 4, 1572 - 1582, 31.12.2022
https://doi.org/10.31202/ecjse.1209586

Abstract

This study reviews the literature on big data visualization for cyber security and carries out a targeted visualization study on a sample data set. Comparison with a counterpart visualization in the literature indicates that, when the criteria proposed in this study are applied, a human user can read the graphics much more easily, which in turn facilitates attack detection. The criteria are based on the use of current data sets such as BETH and of methods such as Principal Component Analysis (PCA).
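To make the role of PCA concrete, the following is a minimal sketch of projecting numeric log features onto two principal components so that they can be drawn as a scatter plot. It is not the authors' notebook: the function name `pca_project`, the synthetic "benign" and "suspicious" groups, and the choice of five features are all assumptions made for illustration, and no real BETH records are used.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project rows of `features` onto their top principal components.

    Hypothetical helper for illustration; returns the projected coordinates
    and the fraction of variance explained by each kept component.
    """
    X = np.asarray(features, dtype=float)
    # Standardize each column; constant columns keep scale 1 to avoid dividing by zero.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # SVD of the standardized data: the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    coords = Z @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return coords, explained[:n_components]


# Synthetic stand-in for numeric log features (assumption: NOT real BETH data).
rng = np.random.default_rng(0)
benign = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
suspicious = rng.normal(loc=4.0, scale=1.0, size=(20, 5))
X = np.vstack([benign, suspicious])
labels = np.array([0] * 200 + [1] * 20)  # 0 = benign, 1 = suspicious

coords, explained = pca_project(X)
print(coords.shape)  # one 2-D point per input row
print(explained)     # variance share of the two kept components
```

The two columns of `coords` can be passed to any scatter-plot routine and colored by `labels`. In this synthetic setup the two groups separate along the first component, which is the kind of visual cue for a human reader that the abstract describes.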


Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Hamza Aytaç Doğanay 0000-0003-4816-4373

Abdullah Orman 0000-0002-3495-1897

Murat Dener 0000-0001-5746-6141

Publication Date December 31, 2022
Submission Date November 24, 2022
Acceptance Date December 4, 2022
Published in Issue Year 2022 Volume: 9 Issue: 4

Cite

IEEE H. A. Doğanay, A. Orman, and M. Dener, “Big Data Visualization for Cyber Security: BETH Dataset”, ECJSE, vol. 9, no. 4, pp. 1572–1582, 2022, doi: 10.31202/ecjse.1209586.