Research Article

Anormal Trafik Tespiti için Veri Madenciliği Algoritmalarının Performans Analizi

Year 2019, Volume: 7, Issue: 2, pp. 205-216, 25.05.2019
https://doi.org/10.21541/apjes.418519

Abstract

This study evaluates the performance of data mining algorithms used to detect malicious traffic that may pose a threat within computer
network traffic. First, features are selected from the NSL-KDD dataset using different feature selection algorithms. This process
yields new datasets, each composed of a different set of attributes. Tests for anomalous traffic detection are then carried out on
these datasets using different data mining algorithms. Based on the test results, a performance evaluation of the different data
mining and feature selection algorithms is presented.
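The abstract does not name the feature selection algorithms, so the following is only an illustrative sketch of one common filter approach, not the authors' method: each discrete attribute is ranked by its information gain with respect to the class label, and the top k attributes are kept to form a reduced dataset. The column names mirror NSL-KDD attributes (`protocol_type`, `service`, `flag`), but the six records are hypothetical toy data, and the helper names (`rank_features`, `select_top_k`) are our own.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction obtained by partitioning the labels on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_features(rows, features, label_key="label"):
    """Return (feature, gain) pairs sorted from most to least informative."""
    labels = [row[label_key] for row in rows]
    scores = {f: information_gain([row[f] for row in rows], labels) for f in features}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_k(rows, ranking, k, label_key="label"):
    """Build a reduced dataset that keeps only the k highest-ranked features."""
    keep = [f for f, _ in ranking[:k]]
    return [{**{f: row[f] for f in keep}, label_key: row[label_key]} for row in rows]


# Hypothetical toy records with NSL-KDD-style column names (not real data).
TOY_ROWS = [
    {"protocol_type": "tcp", "flag": "SF", "service": "http", "label": "normal"},
    {"protocol_type": "tcp", "flag": "S0", "service": "private", "label": "anomaly"},
    {"protocol_type": "udp", "flag": "SF", "service": "domain_u", "label": "normal"},
    {"protocol_type": "tcp", "flag": "S0", "service": "private", "label": "anomaly"},
    {"protocol_type": "icmp", "flag": "SF", "service": "ecr_i", "label": "anomaly"},
    {"protocol_type": "tcp", "flag": "SF", "service": "http", "label": "normal"},
]
FEATURES = ["protocol_type", "flag", "service"]

if __name__ == "__main__":
    ranking = rank_features(TOY_ROWS, FEATURES)
    for name, gain in ranking:
        print(f"{name}: {gain:.3f}")
    # Reduced dataset built from the two most informative attributes.
    print(select_top_k(TOY_ROWS, ranking, 2)[0])
```

On this toy data `service` ranks first, which also illustrates a known weakness of plain information gain: it favours attributes with many distinct values, which is why variants such as gain ratio exist. Continuous NSL-KDD attributes (for example `src_bytes`) would have to be discretized before this sketch applies.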

The Performance Analysis of Data Mining Algorithms for Anomaly Detection

Abstract

This study evaluates the performance of data mining algorithms used to detect harmful traffic in computer networks. First, feature
selection is applied to the NSL-KDD dataset using different feature selection algorithms, producing new datasets that each combine
a different subset of attributes. Performance tests for anomalous traffic detection are then conducted on these datasets using
different data mining algorithms. Based on the test results, a performance evaluation of the different data mining and feature
selection algorithms is presented.
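The abstract reports a performance evaluation without listing the metrics, so the sketch below is a hedged illustration rather than the study's evaluation code. It computes standard binary confusion-matrix measures for anomaly detection (accuracy, precision, recall or detection rate, false positive rate, F1 and the Matthews correlation coefficient), treating `"anomaly"` as the positive class. The label lists and function names are hypothetical, and returning 0.0 when a denominator is zero is our own convention.

```python
import math


def confusion_counts(y_true, y_pred, positive="anomaly"):
    """Count TP, FP, TN, FN for a binary task with the given positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    """Division that returns 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, positive="anomaly"):
    """Common confusion-matrix metrics for anomaly detection."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # also called detection rate or true positive rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "false_positive_rate": _ratio(fp, fp + tn),
        "f1": _ratio(2 * precision * recall, precision + recall),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical ground truth and predictions (not results from the study).
Y_TRUE = ["anomaly", "anomaly", "anomaly", "normal", "normal", "normal", "normal", "anomaly"]
Y_PRED = ["anomaly", "anomaly", "normal", "normal", "normal", "anomaly", "normal", "anomaly"]

if __name__ == "__main__":
    for name, value in binary_metrics(Y_TRUE, Y_PRED).items():
        print(f"{name}: {value:.3f}")
```

In this example, 3 true positives, 1 false positive, 3 true negatives and 1 false negative give an accuracy of 0.75 and an MCC of 0.5. Reporting MCC or the false positive rate alongside accuracy is useful because intrusion detection datasets are frequently class-imbalanced, and accuracy alone can hide poor detection of the minority class.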

There are 52 citations in total.

Details

Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Authors: Ünal Çavuşoğlu, Sezgin Kaçar
Publication Date: May 25, 2019
Submission Date: April 25, 2018
Published in Issue: Year 2019, Volume: 7, Issue: 2

Cite

IEEE: Ü. Çavuşoğlu and S. Kaçar, “Anormal Trafik Tespiti için Veri Madenciliği Algoritmalarının Performans Analizi”, APJES, vol. 7, no. 2, pp. 205–216, 2019, doi: 10.21541/apjes.418519.