Determining the Optimum Number of Clusters with K-Means Algorithm and Silhouette Metric in Intrusion Detection Systems

Fatih Topaloğlu

doi:10.17671/gazibtd.1412641

Research Article

Year 2024, Volume: 17 Issue: 2, 71 - 79, 30.04.2024

https://doi.org/10.17671/gazibtd.1412641

Abstract

Today's internet consists of almost half a million different networks. Within a network connection, it is difficult to identify attacks by type, because different attacks can involve varying numbers of connections, ranging from a few network connections to hundreds. This makes it difficult to correctly classify the data sets used for intrusion detection. In the past, many researchers have developed intrusion detection systems that use different methods to detect intruders. However, existing methods have some disadvantages in terms of detection accuracy and time cost. The main motivation of the study is to overcome the challenges that high dimensionality poses in intrusion detection systems and to improve classification performance, ultimately providing more accurate and efficient detection of intrusions. The study aims to analyze the KDD Cup'99 intrusion detection data set with the k-means clustering algorithm for different values of k and to determine the optimum number of clusters with the silhouette metric. The silhouette score was calculated for each configuration up to k = 10. According to this metric, the best number of clusters was 4, with a silhouette score of 0.83. Additionally, cluster sizes were visualized through the thicknesses of the silhouette plot.
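
The selection procedure described in the abstract (run k-means for each k up to 10, score each clustering with the silhouette metric, keep the k with the highest score) can be illustrated with a minimal sketch. It is not the author's implementation and does not use the KDD Cup'99 data: the four-blob synthetic data set, the farthest-first seeding, and all function names are assumptions made for illustration, and only the Python standard library is used. For each point, the silhouette value is s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster.

```python
import math
import random


def farthest_first_centers(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels


def silhouette_score(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes s = 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: four well-separated 2-D Gaussian blobs.
rng = random.Random(42)
blob_centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

# Score every k from 2 to 10 and keep the k with the highest mean silhouette.
scores = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 11)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k:2d}  silhouette={s:.3f}")
print("best k:", best_k)
```

Because the toy data is built from four blobs, the sketch should favor k = 4 by construction; this says nothing about the KDD Cup'99 result reported above. On a real data set of that size, a library implementation with sampling would likely be preferable, since the pairwise distance computation here grows quadratically with the number of points.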

Keywords

Intrusion detection system, k-means, silhouette metric

There are 44 citations in total.

Details

Primary Language	Turkish
Subjects	Semi- and Unsupervised Learning
Journal Section	Articles
Authors	Fatih Topaloğlu 0000-0002-2089-5214
Publication Date	April 30, 2024
Submission Date	December 31, 2023
Acceptance Date	February 12, 2024
Published in Issue	Year 2024 Volume: 17 Issue: 2

Cite

APA	Topaloğlu, F. (2024). Saldırı Tespit Sistemlerinde K-Means Algoritması ve Silhouette Metriği ile Optimum Küme Sayısının Belirlenmesi. Bilişim Teknolojileri Dergisi, 17(2), 71-79. https://doi.org/10.17671/gazibtd.1412641