Automatic Detection of Epilepsy Using EEG Energy and Frequency Bands

This paper demonstrates the effectiveness of information fusion at the feature-vector level for the automatic detection of epilepsy. The experiments used features ranging from separate EEG frequency band waves to combinations of band waves, in addition to signal energy. Three classifiers were applied to the feature vectors: TreeBoost, Random Forests, and support vector machines. The experiments were carried out on a real-life EEG data set made available by the University of Bonn Hospital in Germany. The paper shows the effect of combining signal energy with the different EEG frequency band waves in order to classify epilepsy: using the TreeBoost algorithm and 10-fold cross-validation, this combination achieved 97.5% accuracy, compared with 84-95.5% for feature vectors with fewer band waves. The combination also achieved 99% specificity and 95.5% sensitivity. Furthermore, the paper demonstrates and analyses the effectiveness of ensemble-based tree learning.


Introduction
The electroencephalogram (EEG) has been used successfully in the literature to study disorders that affect the brain's electrical activity. Epilepsy is characterized by excessive electrical discharges from brain cells. It is one of the conditions that predispose patients to recurrent seizures. A seizure is a sudden, uncontrolled, transient burst of electrical brain activity that causes abnormal body movements.
EEG is the recording of the electrical activity of the brain. Its frequency composition includes the following frequency bands [1]:
• Delta (less than 4 Hz)
• Theta (4 to 8 Hz)
• Alpha (8 to 12 Hz)
• Beta (12 to 40 Hz)
Waveform activity varies according to the brain function associated with different tasks. For example, an EEG signal recorded during sleep has a higher percentage of long waves (delta and theta), while the shorter waves (alpha and beta) dominate during wakefulness [1].
In this paper, we study how to detect epilepsy automatically using computational intelligence (CI) techniques, viz. TreeBoost, Random Forest, and support vector machines (SVM). We start by analyzing a given EEG data set and composing useful features. Next, CI techniques are used to detect epilepsy.
The rest of this paper is organized as follows. A literature review is presented in Section II. The computational intelligence techniques used in this paper are presented in Section III. Feature composition is presented in Section IV. The dataset is described in Section V. Section VI discusses the computed results and Section VII concludes the paper.

Literature Survey
Researchers have addressed the problem of automatically detecting epilepsy using a variety of approaches [2]-[9]. A recent survey on the topic can be found in [10]. The proposed approaches used different types of features, such as the wavelet transform [11], [12], the Fourier transform [13], or other feature extraction methods [2]-[9].
Elmahdy et al proposed the use of two types of features, viz. singular values (SVD) and classical features [3]. They applied a band-pass filter to the EEG signals as a preprocessing step. They used classical features such as total average power, delta band average power, variance, and mean. The authors established a link between energy and singular value decomposition: they "sensed" abrupt changes in the EEG data (due to epileptic seizures) and observed that these changes affect the distribution of energy. They used SVM as a classifier and reported 94.82% accuracy.
Tawfik et al proposed the use of Weighted Permutation Entropy (WPE) values of EEG signals as features [8]. WPE measures the complexity and irregularity of the time series of a given signal. It takes into account both the ordinal pattern and the amplitude of the sample points. Their work was motivated by the fact that the entropy of EEG segments with seizures is lower than that of non-seizure EEG. The authors reported that the SVM classifier obtained better results than Artificial Neural Networks, with 93.75% accuracy.
A recent study was presented by Samiee et al. They used a technique that applies rational decomposition of EEG signals and 1D LGBP for feature extraction [9]. They extracted features by decomposing the EEG signal into 8 sparse rational components.
Then, they applied the 1D LGBP operator. The authors used different classifiers, viz. logistic regression, Random Forests, and SVM.
Sareen et al proposed a seizure alert system [6]. They collected data from body sensors via patients' mobile phones using Bluetooth technology. They extracted features from EEG signals using the fast Walsh-Hadamard transform (FWHT). Higher order spectral analysis was used to reduce the extracted features. They used a Gaussian process as the classifier. The authors evaluated their technique using 50,000 replicated EEG data points of five patients with the bootstrapping technique [14], and reported 85% accuracy overall.
Xun et al proposed a context-learning model to detect seizures [5]. The authors used hidden and temporal features of the EEG signal. They segmented EEG signals into several overlapping fragments of fixed length. A sparse auto-encoder [15] was used to extract the hidden inherent features from the fragments. Each EEG fragment was translated into an EEG word. The temporal features were extracted by analyzing the context information of the EEG words. A support vector machine classifier was used. The authors evaluated their technique using EEG data from the Children's Hospital Boston (CHB-MIT) dataset [16] and reported an error rate of 22.93%.
Murali et al used a low-power adaptive filter with recurrence quantification analysis (RQA) to classify EEG data [4]. Notch, wavelet, and adaptive filters were used to preprocess the EEG data. The EEG data was quantified as RQA features, viz. determinism, average diagonal line length, entropy, laminarity, and trapping time. The CHB-MIT dataset [16] was used to evaluate the technique. The authors reported 97.4% sensitivity, 93.5% specificity, and 95% accuracy.
Mihandoost et al used Markov random fields to select features from the discrete wavelet transform (DWT) [12]. They used a multilayer perceptron (MLP) neural network with three hidden layers. They reported 98.88% accuracy. Similarly, Guo et al used DWT with relative wavelet energy as input features to a feed-forward neural network [17]. They used one hidden layer with 10 neurons and reported 95.2% accuracy.
Fu et al utilized the Hilbert-Huang transform (HHT) and SVM with a radial basis function (RBF) kernel to predict epilepsy [2]. They used statistical features from the histogram of HHT grayscale sub-images. The extracted features included the mean, variance, skewness, and kurtosis of pixel intensity. They reported 99.13% accuracy.
Sharma and Pachori proposed features based on the phase space representation (PSR) to solve the problem [7]. A least squares support vector machine (LS-SVM) was used for classification. They reported 98.67% accuracy.

Computational Intelligence Techniques
Different CI techniques have been trained to classify EEG signals as normal or abnormal [2], [6], [7], [13], [18], [19]. In this work, we used TreeBoost, Random Forests, and SVM classifiers.
Friedman proposed the TreeBoost algorithm to improve the accuracy of models built on decision trees [20]. Equation (1) describes the model mathematically:
V = F_0 + B_1 T_1(X) + B_2 T_2(X) + \dots + B_M T_M(X)    (1)
where V is the target value, F_0 is the starting value for the series, X is a vector of "pseudo-residual" values, and T_1(X), T_2(X), ..., T_M(X) represent the trees fitted to the pseudo-residuals. The TreeBoost algorithm computes B_1, B_2, ..., B_M as coefficients of the tree node predicted values. TreeBoost consists of ensembles of many trees, does not require preselecting or transforming predictor variables, and is robust against outliers [21]. A full TreeBoost series can have hundreds of trees. In this work, the series has 460 trees, and the maximum depth used for any tree in the series is 5.
The Random Forest classifier was developed by Breiman as an ensemble classifier consisting of a collection of decision trees [22]. It combines bagging and random feature selection. During bagging, each tree is constructed using a bootstrap sample of the training set. Random feature selection is applied while a tree is grown on the new training set. The Random Forest builds trees in parallel and uses voting to predict the target class. TreeBoost, on the other hand, creates a series of trees in which each tree gradually contributes to the classification result [23].
A support vector machine builds a hyperplane or a set of hyperplanes in a high-dimensional space [24]. The best separation is achieved by the hyperplane with the largest distance to the nearest training data point of any category. SVM performs well in higher-dimensional spaces and copes with the problem of missing data. In this work, the SVM model is built using an RBF kernel function.
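The additive series in (1) can be illustrated with a minimal sketch. It assumes squared-error loss, a single input feature, depth-1 regression trees (stumps), and one fixed shrinkage coefficient shared by all trees. None of these choices comes from the paper or from the DTReg implementation, which uses 460 trees of depth up to 5. The function names fit_stump, fit_boosted_series, and predict are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree to the pseudo-residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted_series(xs, ys, n_trees=50, shrinkage=0.3):
    """Build F0 plus a series of trees, each fitted to the current residuals."""
    f0 = mean(ys)  # F0: starting value of the series
    predictions = [f0] * len(xs)
    trees = []
    for _ in range(n_trees):
        # For squared-error loss, the pseudo-residual is target minus prediction.
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + shrinkage * tree(x) for p, x in zip(predictions, xs)]
    return f0, shrinkage, trees


def predict(model, x):
    """Evaluate F0 + B*T1(x) + B*T2(x) + ... for the fitted series."""
    f0, shrinkage, trees = model
    return f0 + sum(shrinkage * tree(x) for tree in trees)


# Toy data (hypothetical): class 0 for small x, class 1 for large x.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
model = fit_boosted_series(xs, ys)
print(round(predict(model, 0.25), 3), round(predict(model, 0.75), 3))
```

Each tree corrects only part of the remaining error, which is the sense in which every tree in the series gradually contributes to the final result.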

Feature Composition
Given an EEG signal, we used a frequency band-pass filter to compute filtered EEG waveforms. The frequency composition of the EEG signal includes the following frequency bands [1]:
• Delta (less than 4 Hz)
• Theta (4 to 8 Hz)
• Alpha (8 to 12 Hz)
• Beta (12 to 40 Hz)
Fig. 1 shows a raw EEG signal with the corresponding delta, theta, alpha, and beta filtered band waves. Data taken from these band waves are selected to compose the feature vectors, as described in Section VI. In addition to these features, one experiment uses the energy of the raw EEG signal in combination with the frequency bands, as discussed later. Given a signal x consisting of N points with duration T, signal energy is calculated using (2):
E = \sum_{n=1}^{N} |x(n)|^2    (2)
A signal is called an energy signal if E is finite.
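The band decomposition and the energy feature can be sketched as follows. The paper does not state the filter design, so this sketch substitutes a brick-wall mask on a naive discrete Fourier transform as a stand-in for the band-pass filter, and it assumes the standard discrete definition of energy as the sum of squared sample magnitudes. The names band_filter and signal_energy, the test signal, and its 128 Hz sampling rate are hypothetical, chosen so that the two test tones fall on exact frequency bins.

```python
import cmath
import math

# Band edges in Hz, taken from the band list in the text.
BANDS_HZ = {
    "delta": (0.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (12.0, 40.0),
}


def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def band_filter(signal, fs, low_hz, high_hz):
    """Zero every frequency bin outside [low_hz, high_hz) and transform back."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # fold negative-frequency bins
        if not (low_hz <= freq < high_hz):
            spectrum[k] = 0.0
    return inverse_dft(spectrum)


def signal_energy(signal):
    """Sum of squared sample magnitudes."""
    return sum(abs(x) ** 2 for x in signal)


# Hypothetical test signal: a 2 Hz (delta) tone plus a 10 Hz (alpha) tone.
fs = 128.0
n = 128
raw = [
    math.sin(2 * math.pi * 2 * t / fs) + math.sin(2 * math.pi * 10 * t / fs)
    for t in range(n)
]
bands = {name: band_filter(raw, fs, lo, hi) for name, (lo, hi) in BANDS_HZ.items()}
energies = {name: signal_energy(wave) for name, wave in bands.items()}
print({name: round(value, 3) for name, value in energies.items()})
```

In this toy case the energy of the raw signal splits between the delta and alpha bands, while the theta and beta bands stay empty. A practical implementation would more likely use a dedicated filter design from a signal processing library.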

Dataset
In this work, we used EEG signals from the Department of Epileptology at the University Hospital of Bonn [25]. The segments were recorded using a 128-channel amplifier system and were digitized using a sampling rate of 173.61 Hz and 12-bit A/D resolution. A band-pass filter of 0.53-40 Hz (12 dB/octave) was used to filter the digitized segments. EEG from healthy subjects was obtained using the standard surface electrode placement system (International 10-20 system). Fig. 2 shows the location of the electrodes in the international system as standardized by the American Electroencephalographic Society [26]. There are five subsets (A-E), each containing 100 single-channel EEG segments of 23.6 s duration. Segments in set D were recorded from the epileptogenic zone. On the other hand, segments in set C were recorded from the hippocampal complex of the opposite hemisphere of the brain. Segments in sets C and D contain seizure-free intervals. However, set E contains seizure activity only.
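As a quick check of the figures above, the number of samples per segment follows from the sampling rate and the segment duration. The sketch below assumes rounding to the nearest whole sample, and it assumes the four sets used in the experiments (A, B, C, and E); the variable names are hypothetical.

```python
SAMPLING_RATE_HZ = 173.61  # from the dataset description
SEGMENT_SECONDS = 23.6     # from the dataset description
SEGMENTS_PER_SET = 100
SETS_USED = ("A", "B", "C", "E")  # sets used in the experiments

# Assumption: the sample count is the product rounded to the nearest integer.
samples_per_segment = round(SAMPLING_RATE_HZ * SEGMENT_SECONDS)
first_quarter = samples_per_segment // 4
total_segments = SEGMENTS_PER_SET * len(SETS_USED)

print(samples_per_segment, first_quarter, total_segments)  # 4097 1024 400
```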

Experimental Results
Different experiments were conducted using TreeBoost, Random Forest, and SVM to detect epilepsy in a given EEG signal. In these experiments, we used the feature vectors described in Section IV. The experiments used a variety of features, such as separate filtered EEG frequency band waves or combinations of the filtered band waves. The energy feature was used in all experiments, unless otherwise specified. We used the implementations of TreeBoost and SVM in the DTReg software for classification [23]. The implementation of Random Forest in Weka was used [27].
The EEG waveforms (delta, theta, alpha, and beta) are used separately in different experiments to predict epilepsy, with the added energy feature value per signal. We used the data sets A, B, C, and E described above. The results of the experiments using separate wave bands are compared to two experiments that combine all of the wave bands, either with or without the energy feature value per signal. A combination of filtered band waves is used to form one vector. The combination is obtained by appending the first quarter of the samples of each waveform, i.e., 1024 data points. In addition, the energy value is appended to the combined signal parts as a scalar feature value, giving a total of 4097 data points (feature vector values) per EEG signal. Later on, we refer to the complete set of feature vectors as "Combined."
Classifier performance is measured by computing accuracy, sensitivity, and specificity [29]. All experimental results were evaluated using 10-fold cross-validation. Table 1 compares the performance results computed using the three classifiers. The maximum accuracy achieved by TreeBoost using any single frequency band feature was 93.25%. This rate improved to 95.5% when the four bands were combined as features. A further improvement is obtained by adding signal energy as a feature, with 97.5% accuracy using TreeBoost.
Similarly, the maximum accuracy achieved by Random Forests using any single frequency band feature was 88.75%. This rate improved to 95.25% when the four bands were combined as features. As with TreeBoost, adding signal energy as a feature improved the result, to 96% accuracy.
With the SVM classifier, adding energy to the combined feature vectors improved accuracy from 83.75% to 84.75%. The experiments and analysis have therefore shown the significance of signal energy as a feature and that it has a meaningful effect that enhances the results. These findings add to those of Elmahdy et al [3] in highlighting the significance of EEG signal energy. Fig. 7 further demonstrates the significance of the energy feature.
As can be seen from the figure, energy is a significant discriminatory feature for the normal (i.e., not ill) class. Table 2 compares our work to other works, as described in Section II above. We find that our results are comparable to the literature, even though we used untransformed data. Also, our technique covers a large portion of the EEG signal domain by including EEG sets A, B, C, and E plus signal energy. It is worth mentioning that the cited research works have used a variety of datasets, and therefore it would be difficult to make an exact comparison of results.
Work                             Dataset    Accuracy (%)
Murali et al [4]                 CHB-MIT    95.7
Samiee (2017) [9]                CHB-MIT    Sensitivity (91.13)
Guo et al (2011) [17]            A, E       95.2
Mihandoost et al (2012) [12]     A, D, E    98.87
Fu et al (2014) [2]              A, E       99.13
Sharma and Pachori (2015) [7]    C, D, E    98.67
We did not use set D because it proved to be unstable when used in learning from untransformed (i.e., raw) data. However, using it after a transformation has shown success in some works, as can be seen from Table 2.
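The "Combined" feature vector and the evaluation measures used in this section can be sketched as follows. This is an illustration under assumptions, not the DTReg or Weka implementation: the helper names are hypothetical, energy is taken as the sum of squared samples of the raw signal, and the fold split is contiguous without shuffling or stratification, which the paper does not specify.

```python
BAND_ORDER = ("delta", "theta", "alpha", "beta")


def combined_feature_vector(bands, raw_signal, quarter=1024):
    """Append the first `quarter` samples of each band wave, then the energy."""
    features = []
    for name in BAND_ORDER:
        features.extend(bands[name][:quarter])
    features.append(sum(x * x for x in raw_signal))  # scalar energy feature
    return features


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical placeholder data: constant band waves and a constant raw signal.
segment_length = 4097
bands = {name: [0.0] * segment_length for name in BAND_ORDER}
raw_signal = [0.5] * segment_length
vector = combined_feature_vector(bands, raw_signal)
print(len(vector))  # 4 * 1024 + 1 = 4097

# Hypothetical labels, only to exercise the metric definitions.
metrics = classification_metrics(
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
)
folds = list(k_fold_indices(400, k=10))
print(metrics["accuracy"], metrics["sensitivity"], len(folds))
```

With 400 segments (sets A, B, C, and E), each of the 10 folds holds out 40 segments for testing and trains on the remaining 360.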

Conclusion and Future Directions
In this paper, we have presented a novel approach to the automatic detection of epilepsy. Signal energy was used as an added feature that was combined with the delta, theta, alpha, and beta frequency band waves extracted from EEG signals.
It is notable that TreeBoost achieved the best accuracy (97.5%), compared with Random Forests (96%) and SVM (84.75%). Intuitively, TreeBoost appears to outperform SVM because it takes into account the samples that were incorrectly classified by its previously trained trees in order to enhance learning in the next tree. Random Forests, in turn, fits many independent trees in parallel against different samples of the data, so that the error is averaged and reduced. Both TreeBoost and Random Forests use ensemble-based tree learning to reduce error, and this property appears to be the reason why they outperformed SVM, which does not have such error-reducing properties, especially since we used untransformed (i.e., raw) frequency band data.
It is also interesting that TreeBoost produced strong results on the EEG classification problem in this paper, as it did in our previous work on heart disease classification using electrocardiograms (ECG) [28]-[30]. TreeBoost therefore appears to offer promise for biomedical signal research. Based on the results and this analysis, we find it promising to continue experimenting with the TreeBoost and Random Forest learning algorithms for the automatic detection of epilepsy. Moreover, the results of this paper have shown that untransformed signal data can be used to detect epilepsy, especially with ensemble-based tree learning. This suggests that computationally inexpensive methods can be used in biomedical signal analysis.
For future work, we plan to advance the application and analysis of ensemble-based tree learning in biomedical signal classification. Moreover, we intend to improve the results of our experiments through further study of signal energy and by looking into a mutual information-based approach to feature extraction.

Fig. 1. Raw signal (top) with the four extracted wave bands

Fig. 2. The placement of electrodes in the international 10-20 system
Fig. 3 shows a sample taken from each of the sets. Sets A and B consist of segments taken from five healthy subjects. Volunteers were relaxed in an awake state with eyes open (A) and eyes closed (B), respectively.

Fig. 3. EEG samples from the corresponding sets
Signals in sets C, D, and E were recorded from five patients.

Fig. 4. Classifier performance comparison using accuracy
Figs. 4, 5 and 6 offer bar plots of the accuracy, specificity and sensitivity results, respectively.

Fig. 7. Histogram of the value of the energy feature for normal and abnormal cases

Table 1. Classifier results computed using six types of feature vectors

Table 2. Performance comparison to literature