RULE GENERATION BASED ON MODIFIED CUTTLEFISH ALGORITHM FOR INTRUSION DETECTION SYSTEM

Nowadays, with the rapid prevalence of networked machines and Internet technologies, intrusion detection systems are increasingly in demand, as numerous illicit activities by external and internal attackers must be detected. Early detection of such activities is therefore necessary for protecting data and information. In this paper, we investigate the use of the Cuttlefish optimization algorithm as a new rule generation method for the classification task in intrusion detection. The effectiveness of the proposed method was tested on the KDD Cup 99 dataset using several evaluation methods, and the obtained results were compared with those of well-known classical algorithms, namely Decision Tree (DT), Naïve Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbor (K-NN). Our experimental results show that the proposed method achieves good classification performance and provides significantly better results than the traditional algorithms, reaching 93.9%, 92.2%, and 94.7% in terms of precision, recall, and area under the curve, respectively.


INTRODUCTION
Extensive use of the Internet and data sharing on the web have made network security a challenging issue (Duric, 2014; Zhang, 2020). An intrusion detection system (IDS) is one of the most important methods used to strengthen the security of the web and to help computer systems deal with attacks (Khraisat, 2019). An intrusion is defined as an illicit attempt to access a computer system, and an IDS is a critical software and hardware technology that automates the monitoring and analysis of illegal events. It is known as one of the most suitable methods to prevent and reveal such attacks (Jose, 2018; Vancea, 2014).
Knowledge Discovery in Databases (KDD) is defined as the operation of extracting patterns and models from large databases. Data mining is often used as a synonym for the KDD process, and it refers to the process of applying the discovery algorithm to the data (Schuh, 2019). One of

CUTTLEFISH OPTIMIZATION ALGORITHM (CFA)
The Cuttlefish algorithm (CFA) is a bio-inspired optimization method proposed by Eesa et al. (2013). It has been successfully used as an alternative tool for global optimization problems (Eesa, 2014), dimensionality reduction (Arshak & Eesa, 2018), and clustering problems (Eesa & Orman, 2019). The algorithm simulates the process of light reflection through the three skin layers of a cuttlefish: iridophores, chromatophores, and leucophores. The interaction between these three layers, through the six cases shown in Figure 1, allows the cuttlefish to produce complex patterns and colors.
There are two main processes considered in this algorithm. The first, called reflection, mimics the light reflection, and the second, called visibility, simulates the matching of patterns. These two processes are combined in (1) to calculate the new solution:

newP[i].Points[j] = Reflection[j] + Visibility[j]   (1)
The formulas for the interaction between the three layers of cells in the six cases are described as follows. For cases 1 and 2:

Reflection[j] = R × S1[i].Points[j]   (2)
Visibility[j] = V × (Best_points[j] − S1[i].Points[j])   (3)

Figure 1: Six cases of cells interaction used by the cuttlefish
where R and V are random variables whose values vary between (-1, 1), S1 is a subset of the solutions, i is the i-th element in S1, j is the j-th point in element i, and Best_points denotes the points of the best solution. For these two cases, the value of R is generated randomly while V is set to 1.
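The case 1 and 2 update can be sketched in Python as follows. This is only an illustrative reading of Equations (1)-(3) under the R and V settings described above; the function name and list-based representation are assumptions, not the authors' code.

```python
import random

def case12_new_solution(sol, best, r_interval=(-1.0, 1.0)):
    # Cases 1 and 2: newP[j] = Reflection[j] + Visibility[j], with
    # Reflection[j] = R * sol[j] and Visibility[j] = V * (best[j] - sol[j]);
    # V is fixed to 1 and R is drawn randomly from r_interval.
    new = []
    for j in range(len(sol)):
        R = random.uniform(*r_interval)  # random stretch factor of the saccule
        V = 1.0                          # visibility fixed for these cases
        new.append(R * sol[j] + V * (best[j] - sol[j]))
    return new
```

Note that when R equals 1, the new point collapses onto the best solution, which is why R is kept random to preserve exploration.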

For cases 3 and 4:

Reflection[j] = R × Best_points[j]   (4)
Visibility[j] = V × (Best_points[j] − S2[i].Points[j])   (5)
where R is equal to 1, and V is generated randomly.

For case 5:

Reflection[j] = R × Best_points[j]   (6)
Visibility[j] = V × (Best_points[j] − AV_best)   (7)
where Best_points is the best solution, and AV_best is the average of the best solution points. In this case, the value of R is generated randomly, and V is set to 1.

For case 6:

P[i].Points[j] = random × (U − L) + L   (8)
where random is a random value between 0 and 1, and U and L are the upper and lower limits of the problem domain. The algorithm divides the population into four subsets S1, S2, S3, and S4. Equations (2) and (3) in cases 1 and 2 are applied to the first subset of cells S1, while (4) and (5) in cases 3 and 4, (6) and (7) in case 5, and (8) in case 6 are applied to S2, S3, and S4, respectively. The main steps of the CFA are given in Figure 2.

Figure 2: The main steps of the CFA

Initialize the population P with random solutions using Equation (8).
Compute and keep both the best solution and the average of the best solutions.
While (error > ɛ and the number of iterations is not met) do
    For each s in S1: compute a new solution by applying cases 1 and 2 using Equations (2) and (3).
    For each s in S2: compute a new solution by applying cases 3 and 4 using Equations (4) and (5).
    For each s in S3: compute a new solution by applying case 5 using Equations (6) and (7).
    For each s in S4: compute a new solution by applying case 6 using Equation (8).
    Evaluate the new solutions and update the best solution and the average of the best solutions.
End while
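The loop above can be sketched as a compact Python driver. This is a minimal illustrative reading, not the authors' implementation: the round-robin assignment of cells to the four subsets, the greedy acceptance rule, and the interpretation of AV_best as the mean of the best solution's points are all assumptions.

```python
import random

def cfa_minimize(f, dim, lo, hi, pop=12, iters=200):
    # Initialize the population inside the domain (case 6 / Equation (8)).
    P = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    best = min(P, key=f)[:]
    av_best = sum(best) / dim  # assumed: average of the best solution's points
    for _ in range(iters):
        for i, s in enumerate(P):
            group = i % 4  # assign cells to S1..S4 round-robin (assumption)
            if group == 0:      # cases 1 and 2: reflect the cell itself
                new = [random.uniform(-1, 1) * x + (best[j] - x)
                       for j, x in enumerate(s)]
            elif group == 1:    # cases 3 and 4: small move around the best
                new = [best[j] + random.uniform(-1, 1) * (best[j] - x)
                       for j, x in enumerate(s)]
            elif group == 2:    # case 5: perturb the best toward AV_best
                new = [best[j] + random.uniform(-1, 1) * (best[j] - av_best)
                       for j in range(dim)]
            else:               # case 6: random restart inside the domain
                new = [random.random() * (hi - lo) + lo for _ in range(dim)]
            new = [min(max(v, lo), hi) for v in new]  # clamp to the domain
            if f(new) < f(s):   # greedy acceptance (assumption)
                P[i] = new
        cand = min(P, key=f)
        if f(cand) < f(best):
            best = cand[:]
            av_best = sum(best) / dim
    return best
```

On a smooth test function such as the sphere, the greedy loop contracts the population onto the best solution and refines it multiplicatively through the case 1 and 2 updates.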

THE PROPOSED CFA FOR RULE GENERATOR
This paper aims to use the CFA as a rule generator for the IDS problem. The generated rules are then used to classify each instance into one of the five class labels in the KDD-Cup-99 dataset: Normal, Dos, Probing, R2L, and U2R (Tavallaee, 2009). First, the training dataset is divided into five groups according to the class labels. Then, for each feature in each group, the maximum and minimum values are calculated to produce the two vectors Maxc[N] and Minc[N], where c = 1, 2, …, C, C is the number of classes, and N is the number of features in each sample. In the training stage, each newly generated rule (a pair of Upper and Lower vectors) produced for a group of training data at any step of the CFA, including the initialization process, is tested against the training data as follows: if any sample belonging to group i is satisfied by the newly generated rule, that sample is removed from group i and the Max and Min vectors are recalculated. This process is repeated for all groups until all samples are removed. The initialization process and the working of the CFA are described in the following sections.
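The per-group bookkeeping described above (Max/Min vectors, rule coverage, and removal of satisfied samples) can be sketched as follows; the function names are illustrative, and samples are assumed to be numeric feature lists.

```python
def class_bounds(samples):
    # Per-feature Max/Min vectors for one class group (the Maxc/Minc step).
    n = len(samples[0])
    mx = [max(s[j] for s in samples) for j in range(n)]
    mn = [min(s[j] for s in samples) for j in range(n)]
    return mx, mn

def covers(upper, lower, sample):
    # A rule (Upper, Lower) satisfies a sample when every feature value
    # lies inside the corresponding interval.
    return all(lower[j] <= sample[j] <= upper[j] for j in range(len(sample)))

def remove_covered(group, upper, lower):
    # Remove samples satisfied by the new rule, then recompute the bounds.
    remaining = [s for s in group if not covers(upper, lower, s)]
    return remaining, (class_bounds(remaining) if remaining else None)
```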

The population P[M] is initialized using Equations (9)-(11), where Max[n] is the maximum and Min[n] is the minimum value of feature n, and random is a random number generated between (0, 1). In the original CFA, the population is divided into four subsets; in this study, however, it is divided into three subsets S1, S2, and S3, because case 5 is used for rule pruning, so only three subsets are needed in our modified CFA. After the initialization step, the six cases of the CFA are applied as follows.
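Since the exact Equations (9)-(11) are not reproduced in this excerpt, the sketch below shows only one plausible initialization consistent with the description: each rule's Upper and Lower bounds are drawn uniformly inside the per-feature [Min, Max] interval, with Upper kept at or above Lower. Treat every detail here as an assumption.

```python
import random

def init_rule(mx, mn):
    # Hypothetical initialization of one rule (Upper, Lower) within the
    # per-feature bounds; the paper's Equations (9)-(11) may differ.
    upper, lower = [], []
    for j in range(len(mx)):
        a = mn[j] + random.random() * (mx[j] - mn[j])
        b = mn[j] + random.random() * (mx[j] - mn[j])
        upper.append(max(a, b))   # keep Upper >= Lower per feature
        lower.append(min(a, b))
    return upper, lower
```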

Application of cases 1 and 2 on S1
In cases 1 and 2, the light reflection process occurs through the association between the chromatophores layer and the iridophores layer, and these cases are used for the global search. In the original CFA, R simulates the stretching of the saccule, while V simulates the final view of the matched pattern. To use the CFA as a rule generator for the IDS classification problem, Equations (2) and (3), which compute the reflection and the visibility for each element in S1, are modified as given below, where Max[n] and Min[n] are defined for feature n, and record[n] is the value of feature n in a record selected randomly from the training dataset. R is set to 1 and V is generated randomly in the interval (0, 1). With probability 0.2, the newUpper and newLower values of new solutions are calculated using (1), where i = 1, 2, …, S1.Size.
With the remaining probability of 0.8, the other cases are used to produce the new solution. Sometimes the value of the newly generated rule (newUpper[n] or newLower[n]) falls out of range; in this case, a random value selected from feature n is used instead.
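A sketch of this modified case 1 and 2 update for rules follows. Because the modified equations themselves are not reproduced in this excerpt, the exact update form (regenerating each feature's bounds from a random training record with R = 1 and V in (0, 1)) is an assumption; only the 0.2 probability and the out-of-range fallback follow the text directly.

```python
import random

def case12_rule_update(upper, lower, mx, mn, records, p=0.2):
    # With probability p, regenerate a feature's bounds from a random record;
    # out-of-range values fall back to a random value of that feature.
    new_u, new_l = upper[:], lower[:]
    for j in range(len(upper)):
        if random.random() < p:
            rec = random.choice(records)
            V = random.random()                  # R = 1, V in (0, 1)
            nu = rec[j] + V * (mx[j] - rec[j])   # assumed upward stretch
            nl = rec[j] - V * (rec[j] - mn[j])   # assumed downward stretch
            if not (mn[j] <= nu <= mx[j]):       # out-of-range fallback
                nu = random.choice(records)[j]
            if not (mn[j] <= nl <= mx[j]):
                nl = random.choice(records)[j]
            new_u[j], new_l[j] = max(nu, nl), min(nu, nl)
    return new_u, new_l
```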

Application of cases 3 and 4 on S2
The iridophores are reflective cells. They reflect light to conceal the organs, which means that the outgoing light must be close to the environment. Therefore, the incoming light is treated as a feature value and is revised with a small difference. The simulation of this process is reformulated in Equations (18)-(21), where Max[n] and Min[n] are, respectively, the maximum and minimum values of feature n, and record is an instance selected randomly from the training dataset. The R value is equal to 1, while V is generated randomly from the interval (0, 1). The new newUpper and newLower are then calculated using (1), where i = 1, 2, …, S2.Size.
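The "small revision around a random record" idea can be sketched as below. Equations (18)-(21) are not reproduced in this excerpt, so both the perturbation form and its 0.1 scale are assumptions made for illustration.

```python
import random

def case34_rule_update(mx, mn, records):
    # Build a rule by revising a randomly selected record with a small,
    # bounded difference (R = 1, V in (0, 1); scale 0.1 is an assumption).
    rec = random.choice(records)
    V = random.random()
    upper, lower = [], []
    for j in range(len(mx)):
        delta = V * (mx[j] - mn[j]) * 0.1   # small revision around rec[j]
        upper.append(min(rec[j] + delta, mx[j]))
        lower.append(max(rec[j] - delta, mn[j]))
    return upper, lower
```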

Application of case 6 on S3
In the original CFA, case 5 is applied before case 6. In this study, case 5 is used for the rule pruning process, which is described in Section 3.5. The CFA uses case 6 for the global search; hence any random solution is acceptable. The same Equations (9), (10), and (11) used in the initialization process are therefore reused here for S3.

Application of case 5 for rule pruning
Rule pruning is an important task for increasing the accuracy of the model and enhancing the quality of the produced rules, as it removes irrelevant information from a rule. The objective of rule pruning is to eliminate redundant or unnecessary features, which may negatively affect the results and the performance of the model. The study of (Eesa, 2015) used the CFA for feature selection: the authors successfully applied case 5 of the CFA to remove one feature at a time and evaluate the remaining features. If the remaining features produce better results, they are kept, and another feature is removed. This procedure is repeated until all features have been examined, so that the most relevant features are selected.

In this study, we use the same method. Each time, a sub-rule is removed from the current rule and the remaining sub-rules are tested using the training dataset. If the remaining sub-rules produce a better result, another sub-rule is removed, and so on until all sub-rules have been examined. More precisely, consider a vector called Flag of size N, where N is the number of features in the training dataset, and let x be a rule that belongs to class c. Each time, the sub-rule of feature n, represented by x.Upper[n] and x.Lower[n], is removed and the remaining sub-rules are evaluated. To assess the quality of the produced rules, the fitness function in (24) is used, where TP and TN indicate the number of true positive and true negative instances that are classified correctly, whereas FP and FN indicate the number of instances that are incorrectly classified as false positives and false negatives, respectively (Khraisat, 2019; Sumaiya Thaseen & Aswani Kumar, 2017). The classification process is described in Section 3.6. The general steps of the rule pruning process are shown in Figure 3.
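The greedy pruning loop can be sketched as below. Equation (24) is not reproduced in this excerpt, so the accuracy-style fitness over the four confusion counts is an assumption; the Flag-vector mechanics follow the text.

```python
def fitness(tp, tn, fp, fn):
    # Assumed accuracy-style fitness over the confusion counts; the paper's
    # exact Equation (24) is not shown in this excerpt.
    return (tp + tn) / float(tp + tn + fp + fn)

def prune_rule(flag, evaluate):
    # Greedy pruning: drop one sub-rule at a time (Flag[n] = 0) and keep the
    # removal whenever the remaining sub-rules score at least as well.
    best = evaluate(flag)
    for n in range(len(flag)):
        if not flag[n]:
            continue
        flag[n] = 0
        score = evaluate(flag)
        if score >= best:
            best = score          # feature n was redundant; keep it removed
        else:
            flag[n] = 1           # removal hurt; restore the sub-rule
    return flag, best
```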

Classification using the generated rules
After applying the rule pruning process and removing the unnecessary sub-rules, the pruned rules are used to classify each instance in the testing data into one of the five class labels of the KDD-Cup-99 dataset: Normal, Dos, Probing, U2R, and R2L. The classification process works as follows. If all feature values of a record r are covered by a rule x of class c, so that every value lies between x.Lower and x.Upper, then r is classified as class c. However, this is not always the case: sometimes one instance in the testing data is covered by more than one rule belonging to different classes. In such a case, the bias-value is calculated for all the covering rules, and these values are accumulated per class. The class with the greatest accumulated bias-value is chosen as the predicted class. The calculation of the bias-value is formulated in (25):

bias-value = a × fitness + b × accuracy   (25)
where a and b are two weighting values determined by the user; a lies between (0, 1) and b is equal to (1 − a). In this study, both a and b are set to 0.5. The fitness value is calculated using (24), and for each covering rule belonging to class c, the accuracy is calculated using Equation (26).
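The tie-breaking step can be sketched as below. Each rule is assumed to carry its precomputed fitness and accuracy; Equation (26) for the accuracy term is not reproduced in this excerpt, so these are treated as given inputs.

```python
def predict(record, rules, a=0.5):
    # rules: list of (upper, lower, cls, fit, acc) tuples (assumed layout).
    # Accumulate bias-value = a*fitness + (1-a)*accuracy per covering class
    # and return the class with the greatest total, or None if uncovered.
    b = 1.0 - a
    totals = {}
    for upper, lower, cls, fit, acc in rules:
        if all(lower[j] <= record[j] <= upper[j] for j in range(len(record))):
            totals[cls] = totals.get(cls, 0.0) + a * fit + b * acc
    return max(totals, key=totals.get) if totals else None
```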

EXPERIMENTAL SETUP
To assess the efficiency of the proposed method, we compared its performance with four traditional classifiers: DT, SVM, K-NN, and NB. The proposed method is experimentally evaluated on the KDD-Cup-1999 dataset, obtained from the UCI machine learning repository (UCI Machine Learning Repository, 2015). The experiments are designed to show that the proposed method is generally feasible for rule generation and can therefore be used as a new classifier for IDS.

Data preparation
The "10%KDD-Cup-99" is a very popular dataset commonly used for benchmarking intrusion detection problems (Tavallaee et al., 2009). The dataset contains 494,020 training and 311,028 testing connection records (Eesa, 2015). Each record contains 41 independent features and is labelled as one of the five classes considered in the "KDD-Cup-99" dataset: Normal, Dos, Probing, R2L, and U2R. Dos (denial of service) is a type of attack that makes some computing or memory resource too busy or too full to handle legitimate requests. Probing is a class of attacks in which an attacker scans a network to gather information or find known vulnerabilities. R2L (Remote to Local) is a class of attacks in which an attacker sends packets to a machine over a network and then exploits a vulnerability of that machine to illegally gain local access as a user. In the U2R (User to Root) attack, an attacker uses a normal account to log in to a victim's system and tries to gain administrator privileges by exploiting a vulnerability in the victim.

However, this dataset is too big to be used in such experiments. Therefore, the training and the testing data are chosen randomly from the 10%KDD-Cup-99 dataset. Table 1 describes the number of records of each attack class in the chosen training and testing subsets. In order to keep the same proportion of data, the count of each attack is divided by 100 (Eesa, 2015) in the training and testing datasets. In Table 1, Psweep (12, 3) means that this attack has 12 records in the training set and 3 records in the testing set.

In our study, all categorical values in the datasets are converted to numerical values. For example, the protocol_type attribute consists of three categorical values (tcp, udp, icmp), which are converted to (10, 20, 30), respectively. Likewise, if an attribute had 100 categorical values, these would be converted to (10, 20, 30, …, 1000).
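The categorical-to-numerical conversion described above can be sketched as a small helper; the order-of-first-appearance mapping matches the tcp/udp/icmp example, though the paper does not state how value order is fixed, so that detail is an assumption.

```python
def encode_categorical(values):
    # Map the k-th distinct value (in order of first appearance) to 10*k,
    # following the paper's tcp -> 10, udp -> 20, icmp -> 30 example.
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = 10 * (len(mapping) + 1)
    return [mapping[v] for v in values]
```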

Evaluation
In order to assess the effectiveness of the proposed CFA classification model, five well-known metrics are used in our evaluation process: True Positive Rate (TPR), False Positive Rate (FPR), Precision, Recall, and Area Under the Curve (AUC) (Jiao & Du, 2016). The efficiency of the proposed CFA method is then compared with four well-known techniques in Weka (Hall et al., 2009), namely DT, K-NN, SVM, and NB. The five evaluation metrics are computed per class and averaged, where i = 1, 2, …, C and C is the number of classes.
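The per-class averaging can be sketched as below, using the standard one-vs-rest definitions of TPR, FPR, and Precision (Recall equals TPR). The paper's exact formulas are not reproduced in this excerpt, so the macro-averaging shown here is an assumption, and AUC is omitted because it is computed from the ROC curve rather than from single counts.

```python
def per_class_rates(y_true, y_pred, classes):
    # One-vs-rest confusion counts per class i, macro-averaged over C classes.
    tpr = fpr = prec = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t != c and p != c)
        tpr += tp / (tp + fn) if tp + fn else 0.0
        fpr += fp / (fp + tn) if fp + tn else 0.0
        prec += tp / (tp + fp) if tp + fp else 0.0
    n = len(classes)
    return tpr / n, fpr / n, prec / n, tpr / n  # Recall == TPR
```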

EXPERIMENTAL RESULTS
The proposed method is implemented in C# within the Microsoft Visual Studio environment, with the population size set to 10. First, the validity of the proposed model is tested over 10 independent runs. Table 2 reports the obtained TPR for each run. It can be noticed that the proposed CFA classification method successfully classifies the KDD-Cup-99 data and obtains good results: the TPR varies between 91.24 and 92.71, with an average over the 10 independent runs of 92.203. Table 3 presents the comparison of the proposed method with the other four techniques DT, K-NN, SVM, and NB. The comparison results based on the four metrics (FPR, Precision, Recall, AUC) are detailed in Table 3 and shown graphically in Figures 4 and 5. All reported performance results of the proposed method in Table 3 are averaged over 10 independent runs. From Table 3 and Figures 4 and 5, it can be seen clearly that the new CFA method provides better results than all other classification methods in terms of Precision, Recall, and AUC, with a lower FPR. Since the AUC metric is commonly used to compare the performance of multiple models (Tharwat, 2018), we can conclude that our proposed method is successful in terms of this evaluation metric, as clearly illustrated in Figure 5. In addition, to further investigate the efficiency and performance of the proposed CFA method, we compared the obtained results with our previous work (Eesa, 2015). Table 4 shows the comparative results in terms of the TPR metric. The results in Table 4 indicate that the newly proposed method performs better than our previous study (Eesa, 2015) in terms of TPR, even when the latter uses different numbers of features. For instance, although the previous method provided a highest TPR of 92.051, the new method achieves a higher TPR without using feature selection.
These results suggest that even without using any feature selection technique, the newly proposed method performs better.
From the obtained results, we believe that the CFA can be used as an alternative tool for data mining in the IDS domain. The proposed CFA method has been tested many times in many different experiments, and it has produced the same results within ±0.2, which demonstrates the robustness and stability of the proposed model.

CONCLUSION AND FUTURE WORK
In this paper, we investigated the use of the modified CFA as a rule generation tool for IDS. The CFA was modified to generate a set of rules for each class considered in the dataset. One of the fundamental features of the CFA method is its simplicity: each generated rule is represented by only two vectors, Upper and Lower, which can easily be used for the classification task. To check the efficiency of the proposed method, we used the "KDD-Cup-99" dataset. The obtained results were promising and showed the robustness and effectiveness of the proposed method. The results were assessed using the five performance metrics TPR, FPR, Precision, Recall, and AUC. Experimental results also demonstrated that the new CFA is very competitive in comparison with many traditional classification methods. During the experiments, we observed that the proposed method was time-consuming in finding rules: the execution time of the training and testing processes for each run was about 20 seconds, while the running times of DT, SVM, K-NN, and NB were 0.3, 1, 6, and 0.4 seconds, respectively. Addressing this limitation is left as future work. In addition, the proposed CFA data mining method can be applied to classification problems in other domains. Based on the above analysis, and setting aside the execution time, we conclude that the CFA is a promising rule-generation tool.