Research Article

Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms

Year 2021, Volume: 24 Issue: 4, 1691 - 1698, 01.12.2021
https://doi.org/10.2339/politeknik.693221

Abstract

In recent years, machine learning and data mining have drawn researchers’ attention as new ways to improve the performance of Intrusion Detection Systems (IDS). These techniques have proven effective in distinguishing malicious network packets. One of the most challenging problems researchers face is transforming data into a form that Machine Learning Algorithms (MLA) can handle effectively. In this paper, we present an IDS model based on the C4.5 decision tree algorithm, with a transformation of the simulated UNSW-NB15 dataset as a pre-processing operation. Our model uses Term Frequency.Inverse Document Frequency (TF.IDF) to convert data types into an acceptable and efficient form for machine learning, in order to achieve high detection performance. The model has been tested with 250,000 randomly selected records of the UNSW-NB15 dataset. The selected records have been grouped into segments of various sizes, such as 50, 500, 1000, and 5000 items. Each segment has been further grouped into two subsets, a multi-class dataset and a binary-class dataset. The performance of the C4.5 decision tree algorithm has been compared with that of Multilayer Perceptron (MLP) and Naive Bayes (NB) in Weka. The proposed method has significantly improved the accuracy of the classifiers and decreased the number of incorrectly detected instances. The increase in accuracy reflects the efficiency of transforming the dataset with TF.IDF at various segment sizes.
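The abstract does not spell out how TF.IDF is applied to network records, so the following is only a minimal sketch under stated assumptions: each consecutive segment of records is treated as one "document", each categorical feature value as a "term", and the weight is the standard tf × idf product with a natural logarithm. The function name `tfidf_by_segment` and the sample protocol values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_by_segment(values, segment_size):
    """Replace each categorical value with a TF.IDF weight.

    Assumption (not from the paper): every consecutive block of
    `segment_size` values plays the role of one document.
    """
    segments = [values[i:i + segment_size]
                for i in range(0, len(values), segment_size)]
    n_segments = len(segments)

    # Document frequency: in how many segments does each value occur?
    df = Counter()
    for seg in segments:
        df.update(set(seg))

    weights = []
    for seg in segments:
        counts = Counter(seg)
        for v in seg:
            tf = counts[v] / len(seg)             # relative frequency in the segment
            idf = math.log(n_segments / df[v])    # rarity across segments
            weights.append(tf * idf)
    return weights


# Hypothetical protocol column, split into two segments of four records.
protocols = ["tcp", "tcp", "udp", "tcp", "tcp", "tcp", "icmp", "icmp"]
weights = tfidf_by_segment(protocols, segment_size=4)
```

In this sketch a value that appears in every segment (here `tcp`) receives weight 0, while values confined to one segment receive a positive weight, which is one way a segment size could influence the numeric features passed to a classifier.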

References

  • Yu Z. Intrusion Detection: A Machine Learning Approach (Volume 3). London, UK: Imperial College Press, 2011.
  • Armin J, Thompson B, Ariu D, Giacinto G, Roli F, Kijewski P. 2020 cybercrime economic costs: No measure no solution. In: 10th International Conference on Availability, Reliability and Security (ARES); 24-27 Aug. 2015; Toulouse, France. New York, NY, USA: IEEE. pp. 701-710.
  • Bhattacharyya DK, Kalita JK. Network anomaly detection: A machine learning perspective. New York, NY, USA: CRC Press, 2013.
  • Katkar VD, Bhatia DS. Lightweight approach for detection of denial of service attacks using numeric to binary preprocessing. In: IEEE International Conference on Circuits, Systems, Communication and Information Technology Applications (CSCITA); 4-5 April 2014; Mumbai, India. New York, NY, USA: IEEE. pp. 207-212.
  • Mehmood T, Rais H. Machine learning algorithms in context of intrusion detection. In: IEEE 2016 Computer and Information Sciences (ICCOINS), International Conference; 15-17 Aug. 2016; Kuala Lumpur, Malaysia. New York, NY, USA: IEEE. pp. 369-373.
  • Mane D, Pawar S. Anomaly based IDS using Backpropagation Neural Network. International Journal of Computer Applications 2016; 136: 29-34.
  • Deshmukh DH, Ghorpade T, Padiya P. Intrusion detection system by improved preprocessing methods and Naive Bayes classifier using NSL-KDD 99 Dataset. In: IEEE 2014 International Conference on Electronics and Communication Systems (ICECS); 13-14 Feb. 2014; Coimbatore, India. New York, NY, USA: IEEE. pp. 1-7.
  • Mogal DG, Ghungrad SR, Bhusare BB. NIDS using Machine Learning Classifiers on UNSW-NB15 and KDDCUP99 Datasets. International Journal of Advanced Research in Computer and Communication Engineering 2017; 6: 533-537.
  • Dadgar SMH, Araghi MS, Farahani MM. A novel text mining approach based on TF-IDF and Support Vector Machine for news classification. In: IEEE 2016 International Conference on Engineering and Technology (ICETECH); 17-18 March 2016; Coimbatore, India. New York, NY, USA: IEEE. pp. 112-116.
  • Manning CD, Raghavan P, Schütze H. An Introduction to Information Retrieval. England: Cambridge University Press, 2009.
  • Leskovec J, Rajaraman A, Ullman JD. Mining of massive datasets. 5th ed. England: Cambridge University Press, 2014.
  • Aggarwal CC. Data mining: the textbook. New York, USA: Springer, 2015.
  • Zaki MJ, Meira W. Data mining and analysis: fundamental concepts and algorithms. 1st ed. New York, USA: Cambridge University Press, 2014.
  • Moustafa N, Slay J. The significant features of the UNSW-NB15 and the KDD99 data sets for Network Intrusion Detection Systems. In: IEEE 2015 4th Intl Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS); 5-5 Nov. 2015; Kyoto, Japan. USA: IEEE. pp. 25-31.
  • Moustafa N, Slay J. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: IEEE 2015 Military Communications and Information Systems Conference (MilCIS); 10-12 Nov 2015; Canberra, Australia. USA: IEEE. pp. 1-6.
  • Hssina B, Merbouha A, Ezzikouri H, Erritali M. A comparative study of decision tree ID3 and C4.5. International Journal of Advanced Computer Science and Applications 2014; 4: 13-19.
  • Witten IH, Frank E, Hall MA, Pal CJ. Data Mining: Practical machine learning tools and techniques. 4th ed. Morgan Kaufmann, 2016.
  • Kumar V, Wu X. The Top Ten Algorithms in Data Mining (Chapman & Hall/CRC Data Mining and Knowledge Discovery). 1st ed. Chapman and Hall/CRC, 2009.
  • Revathy R, Lawrance L. Comparative Analysis of C4.5 and C5.0 Algorithms on Crop Pest Data. International Journal of Innovative Research in Computer and Communication Engineering 2017; 5: 50-58.
  • Choudhury S, Bhowal A. Comparative analysis of machine learning algorithms along with classifiers for network intrusion detection. In: IEEE 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM); 6-8 May 2015; Chennai, India. USA: IEEE. pp. 89-95.
  • Garg T, Khurana SS. Comparison of classification techniques for intrusion detection dataset using WEKA. In: IEEE 2014 International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2014); 09-11 May 2014; Jaipur, India. USA: IEEE. pp. 1-5.
  • Gadal SM, Mokhtar R. Anomaly detection approach using hybrid algorithm of data mining technique. In: IEEE 2017 International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE); 16-18 Jan. 2017; Khartoum, Sudan. USA: IEEE. pp. 1-7.
  • Elhamahmy ME, Elmahdy HN, Saroit IA. A new approach for evaluating intrusion detection system. CiiT International Journal of Artificial Intelligent Systems and Machine Learning, 2010; 2: 290-298.

There are 23 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Research Article
Authors

Khaldoon Awadh 0000-0001-6697-931X

Ayhan Akbaş 0000-0002-6425-104X

Publication Date December 1, 2021
Submission Date February 24, 2020
Published in Issue Year 2021 Volume: 24 Issue: 4

Cite

APA Awadh, K., & Akbaş, A. (2021). Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms. Politeknik Dergisi, 24(4), 1691-1698. https://doi.org/10.2339/politeknik.693221
AMA Awadh K, Akbaş A. Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms. Politeknik Dergisi. December 2021;24(4):1691-1698. doi:10.2339/politeknik.693221
Chicago Awadh, Khaldoon, and Ayhan Akbaş. “Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms”. Politeknik Dergisi 24, no. 4 (December 2021): 1691-98. https://doi.org/10.2339/politeknik.693221.
EndNote Awadh K, Akbaş A (December 1, 2021) Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms. Politeknik Dergisi 24 4 1691–1698.
IEEE K. Awadh and A. Akbaş, “Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms”, Politeknik Dergisi, vol. 24, no. 4, pp. 1691–1698, 2021, doi: 10.2339/politeknik.693221.
ISNAD Awadh, Khaldoon - Akbaş, Ayhan. “Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms”. Politeknik Dergisi 24/4 (December 2021), 1691-1698. https://doi.org/10.2339/politeknik.693221.
JAMA Awadh K, Akbaş A. Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms. Politeknik Dergisi. 2021;24:1691–1698.
MLA Awadh, Khaldoon and Ayhan Akbaş. “Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms”. Politeknik Dergisi, vol. 24, no. 4, 2021, pp. 1691-8, doi:10.2339/politeknik.693221.
Vancouver Awadh K, Akbaş A. Intrusion Detection Model Based on TF.IDF and C4.5 Algorithms. Politeknik Dergisi. 2021;24(4):1691-8.