With the increasing number of network users, intrusion detection systems (IDS) have become a critical area of focus. Machine learning (ML)-based systems are crucial because of their ability to learn from data. However, network data often contains both numerical and categorical features. This presents a significant challenge because some ML algorithms, such as Support Vector Machine (SVM) and k-Nearest Neighbour (kNN), require categorical features to be encoded before use. Here, we investigate the impact of One-Hot Encoding (OHE) on the classification performance and time complexity of ML algorithms, including Decision Trees (DTs), which accept categorical features directly, SVM, kNN, and others. The intrusion datasets NSL-KDD and UNSW-NB15, both of which contain categorical features, were used in this study. The performance of DTs and other classifiers was compared on encoded and unencoded datasets. Our findings are: (1) OHE can improve the classification performance of DT classifiers and does not negatively affect them, although it increases the time complexity because of the increased dimensionality; (2) DTs achieve performance comparable to that of the other classifiers with less time complexity; and (3) OHE can help to transform complex categorical features so that irrelevant categories can be eliminated. The results of the experiments are presented to visualise the importance of these properties of DTs. This study shows that DTs are promising for developing time-efficient and accurate IDS.
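
The dimensionality increase attributed to OHE above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the records are made-up toy values, the feature names `duration`, `protocol_type`, and `service` are only loosely modelled on NSL-KDD-style columns, and the helpers `fit_categories` and `one_hot` are hypothetical names introduced for illustration.

```python
# Toy records with made-up values; only the feature names resemble
# NSL-KDD-style columns. This is an illustrative assumption, not study data.
records = [
    {"duration": 0, "protocol_type": "tcp", "service": "http"},
    {"duration": 2, "protocol_type": "udp", "service": "dns"},
    {"duration": 5, "protocol_type": "icmp", "service": "echo"},
    {"duration": 1, "protocol_type": "tcp", "service": "ftp"},
]
CATEGORICAL = ["protocol_type", "service"]


def fit_categories(rows, columns):
    """Collect the sorted set of categories seen in each categorical column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def one_hot(row, categories):
    """Build a numeric vector: numeric features first, then one 0/1 column per category."""
    vector = [float(value) for key, value in row.items() if key not in categories]
    for col, values in categories.items():
        vector.extend(1.0 if row[col] == value else 0.0 for value in values)
    return vector


categories = fit_categories(records, CATEGORICAL)
encoded = [one_hot(row, categories) for row in records]

n_before = len(records[0])  # 1 numeric + 2 categorical features
n_after = len(encoded[0])   # 1 numeric + 3 protocol columns + 4 service columns
print(f"features before OHE: {n_before}, after OHE: {n_after}")
```

Even in this small example the feature count grows from 3 to 8, because each distinct category becomes its own column. A categorical feature with many distinct values widens the encoded dataset accordingly, which is consistent with the increased time cost reported for encoded data.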
| Primary Language | English |
|---|---|
| Subjects | Artificial Intelligence (Other) |
| Journal Section | Research Article |
| Authors | |
| Submission Date | August 16, 2024 |
| Acceptance Date | November 16, 2025 |
| Publication Date | December 31, 2025 |
| Published in Issue | Year 2025 Volume: 12 Issue: 4 |
Hittite Journal of Science and Engineering is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY NC).