Non-linear Analysis of the Electroencephalogram in Alzheimer’s Disease by Means of Symbolic Sequence Decomposition Method

In this pilot study, a symbolic sequence decomposition method was used in conjunction with Shannon's entropy to investigate changes in the electroencephalogram signals of 11 patients with Alzheimer's disease and 11 age-matched control subjects. Results were analysed statistically with Student's t-test and then classified with receiver operating characteristic (ROC) curves. Statistically significant differences between the two groups were found at electrodes Fp1, O2, P3, T4 and T5. Sensitivity (the percentage of correctly classified patients) and specificity (the percentage of correctly classified controls) were evaluated using the ROC method. Accuracy was calculated from the sensitivity and specificity of the electrodes showing statistically significant differences between the control group and Alzheimer's disease patients, and ranged from 72.73% to 77.27%. These accuracy values agree with previously published entropy studies on this data set. Although combining these methods did not provide greater accuracy than previous findings, using a symbolic sequence decomposition method made the data processing faster.


Introduction
Alzheimer's disease (AD) is the most frequent cause of dementia in the western world. It is caused by excessive amyloid deposition and accumulation of abnormal tau protein in the brain, which affect the cognitive ability of the sufferer [1]. The clinical diagnosis of AD is made primarily on the basis of medical history, psychiatric evaluation and various memory and mental health tests [2]. However, an indisputable diagnosis is only possible post-mortem [3]. Symptoms vary from patient to patient and with the severity of the disease. Early diagnosis is crucial for lessening the effects of the disease with available drug treatments and for making the necessary lifestyle adjustments for the patient and caregivers to ensure optimum quality of life [4]. The electroencephalogram (EEG) has been used as a tool in the investigation of dementia for several decades. EEG signals reflect the electrical activity of the brain and can be recorded non-invasively with surface electrodes. AD is a cortical dementia, and as such the electrical abnormalities it causes in brain signals can be captured with cranial surface electrodes [5]. Generally, spectral analysis of the EEG of AD patients shows a shift to lower frequencies, which suggests a decreased cohesion of cognitive networks [6]. Moreover, the EEGs of AD patients display less complexity and contain more regular patterns than those of control subjects [5,7,8].
Due to the intrinsically irregular and aperiodic nature of the EEG signal, spectral analysis techniques might not be sufficient to characterise the dynamics of the events underlying it. Thus, additional techniques, such as non-linear time series analysis, have been applied to provide a better understanding of EEG signals [9]. The correlation dimension (D2) is used to identify the complexity of a system [10]. D2 values are lower in AD patients, indicating less complex dynamics of neural networks in the brain, possibly due to the loss of neurons and synapses [11]. The Lyapunov exponents (L1) have been used to characterise the non-linear behaviour of a system and can be seen as a measure of unpredictability. The EEGs of AD patients have lower L1 values, describing a more regular signal, which could suggest that information processing is less flexible in the diseased brain [12]. Nevertheless, both methods require a very large amount of data to give meaningful results. In addition, detailed signal conditioning is necessary before the algorithms of either method can be applied [9]. Therefore, in this pilot study a new method was applied to AD EEGs that combines two non-linear analysis methods, i.e., symbolic sequence decomposition and Shannon's entropy. Symbolic dynamical analysis is a family of non-linear signal processing techniques that investigate a signal through small, discrete-time dynamics which relate to portions of the original signal. Symbolic sequence decomposition takes a finite number of samples and forms a symbol series from the original sample series according to the value of each sample relative to a threshold computed over the whole series [13]. The overall process provides an approximate analysis of a complex biological system [14,15]. Entropy studies, on the other hand, analyse the randomness or predictability of systems.
Entropy was first used as a thermodynamic term. In biomedical engineering, Shannon's entropy quantifies the amount of information within a biological signal, where greater entropy indicates more information than lower entropy. Low entropy can be interpreted as regularity and higher entropy as complexity [16]. The current paper is organised as follows. Section 2 describes the selection of AD patients and controls, as well as the process for data collection. The symbolic sequence decomposition and statistical analyses are also defined. The results are presented in Section 3. Finally, Section 4 contains a discussion of the results and conclusions.

EEG Signal Database
The database consisted of 22 subjects: 11 AD patients (5 men and 6 women; 72.5 ± 8.3 years, mean ± standard deviation (SD)) and 11 age-matched controls (7 men and 4 women; 72.8 ± 6.1 years, mean ± SD). All patients were diagnosed with AD after a detailed medical test and a Mini-Mental State Examination (MMSE), which evaluates cognitive impairment [4,17]. The MMSE scores were 13.1 ± 5.9 for the AD patients and 30.0 ± 0 for the controls (mean ± SD). The data collection and evaluation of the signals received ethical approval, along with the permission of the patients' caregivers. Signals were recorded at a sampling frequency of 256 Hz in a resting but awake state with eyes closed, using the international 10-20 electrode placement system (electrodes Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T3, T4, T5, T6, Fz, Cz and Pz). Recordings were carried out over five minutes in order to reduce artefacts in the EEG. Five-minute recordings were sufficient to collect EEGs without the occurrence of sleep. Signals were then checked by a specialist to eliminate those contaminated with muscle movement and cardiac artefacts. For each subject, artefact-free 5-sec epochs (1280 data points) were selected for further analysis. An average of 30.0 ± 12.5 (mean ± SD) epochs was selected for each electrode of each subject.
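As a rough illustration of the epoch length quoted above (256 Hz × 5 s = 1280 samples), the sketch below cuts one channel into consecutive non-overlapping 5-sec epochs. It is only an assumption about how segmentation could be done: the function name `split_epochs` is hypothetical, and the artefact screening performed by the specialist is not modelled.

```python
def split_epochs(signal, fs=256, seconds=5):
    """Cut a 1-D sequence of samples into non-overlapping epochs.

    Assumption: epochs are consecutive and any incomplete tail is dropped.
    The manual artefact rejection described in the text is NOT modelled.
    """
    n = fs * seconds  # samples per epoch: 256 * 5 = 1280
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


# Hypothetical 12-second channel (3072 samples) -> two full 5-sec epochs
channel = [0.0] * (256 * 12)
epochs = split_epochs(channel)
print(len(epochs), len(epochs[0]))
```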

Symbolic Sequence Decomposition
Symbolic sequence decomposition is a method for converting a raw time series into a symbolic series. This approach has the advantage of simplifying numerical computation, since the resulting discrete symbolic time series is represented in binary code [15]. The range of the original data is partitioned into regions and each region is associated with a unique symbol. A new series, called the symbol series, is then built from the original series by assigning to each value the symbol of the region it falls into [14]. First, the threshold value for partitioning is calculated. In the median technique, the threshold is the median of the data series. Using this median, the data series was partitioned into two regions. Each value of the time series was then coded in binary: "0" for region 1 (values less than the threshold) and "1" for region 2 (values higher than the threshold). After symbolisation, a two-step template window was slid over the symbols to create a code series, which allows additional information to be extracted. Template size and window sliding step are the two main parameters to consider when extracting information from the data series. To take into account the general trend of the EEG signal, the sliding step can be set equal to the window size to avoid overlapping [14]. Local changes, on the other hand, are captured with small step sizes. However, small sliding steps affect the reliability and smoothness of the probability density function (PDF) of the time series [16]. In the current study, a two-step template with single-step windowing was applied to the time series to obtain a code series, in order to identify local events without affecting the PDF of the data series.
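The symbolisation and windowing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and samples exactly equal to the median are assigned to region 1 ("0"), a case the text does not specify.

```python
from statistics import median


def symbolise(samples):
    """Binary symbolisation around the median of the whole series."""
    threshold = median(samples)
    # Assumption: values equal to the threshold fall into region 1 ("0").
    return ["1" if x > threshold else "0" for x in samples]


def code_series(symbols, window=2, step=1):
    """Slide a template of `window` symbols over the series in steps of `step`."""
    return ["".join(symbols[i:i + window])
            for i in range(0, len(symbols) - window + 1, step)]


# Hypothetical six-sample series; its median is 0.2
epoch = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1]
symbols = symbolise(epoch)      # ['1', '0', '1', '1', '0', '0']
codes = code_series(symbols)    # ['10', '01', '11', '10', '00']
print(symbols, codes)
```

With a two-symbol template and a step of one, a series of n symbols yields n − 1 overlapping codes; setting `step=2` instead would give the non-overlapping variant mentioned in the text.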

Shannon's Entropy
Shannon's entropy (SE) was first introduced by Claude Shannon [18] in the context of communication technology. It quantifies the amount of information carried within a signal, i.e., the significance of the signal is extracted from a group of data with relevance to its statistical mechanics [19]. Let $X$ be a random variable taking values in a set $(a_1, a_2, \dots, a_m)$ with probabilities

$$p_i = P(X = a_i), \quad i = 1, \dots, m \qquad (1)$$

Then the SE of $X$ can be defined as

$$H(X) = -\sum_{i=1}^{m} p_i \log p_i \qquad (2)$$

where $H(X)$ is the entropy of the random variable $X$, or equivalently the entropy of the probability distribution $(p_1, \dots, p_m)$. These probabilities sum to 1.
The normalised Shannon's entropy $H_s$ is

$$H_s = -\frac{\sum_{i=1}^{N_{obs}} p_i \log p_i}{\log N_{obs}} \qquad (3)$$

$H_s$ ranges between 0 and 1, where values close to 1 represent irregularity and values close to 0 represent regularity within the data set [20]. When the probability distribution is relatively flat, higher entropy values are obtained; lower values arise when a few sequences dominate. $N_{obs}$ (equation 3) is the number of possible sequences in the code series. Because a two-step window was used when creating the code series, this number is equal to 4 in our calculations. The $p_i$ values are thus $p_1$, $p_2$, $p_3$, $p_4$: for a template of window size two there are four possible sequences, where $p_1$ is the probability of the "00" sequence within the code series, $p_2$ is the probability of "01", $p_3$ is the probability of "10" and $p_4$ is the probability of "11". The total number of samples is determined by the length of the epoch and the sampling rate of the data set (i.e., 1280 symbols for each epoch).
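As a hedged sketch of the calculation described above, the snippet below estimates the four code probabilities from a code series and normalises the entropy by the logarithm of the number of possible sequences. The function name is hypothetical, and base-2 logarithms are an assumption; the base cancels in the normalised value.

```python
from collections import Counter
from math import log2


def normalised_shannon_entropy(codes, window=2):
    """Normalised Shannon's entropy of a binary code series, in [0, 1].

    Assumption: probabilities are relative frequencies of each code, and
    N_obs = 2 ** window is the number of possible binary sequences.
    """
    if not codes:
        raise ValueError("code series is empty")
    n_obs = 2 ** window
    total = len(codes)
    probs = [count / total for count in Counter(codes).values()]
    # sum of p * log(1/p) equals -sum of p * log(p); absent codes add 0
    entropy = sum(p * log2(1 / p) for p in probs)
    return entropy / log2(n_obs)


print(normalised_shannon_entropy(["00", "01", "10", "11"]))  # flat: 1.0
print(normalised_shannon_entropy(["00", "00", "00", "00"]))  # regular: 0.0
```

A flat distribution over the four codes gives the maximum value of 1, while a series made of a single repeated code gives 0, matching the interpretation of irregularity and regularity in the text.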

Receiver Operating Characteristic Curves
A receiver operating characteristic (ROC) curve gives a graphical representation of the accuracy of a statistical test. In this study, ROC curves were used to validate the Student's t-test analysis performed on our two sets of data (AD patients vs. age-matched controls). The sensitivity and specificity of a given data set are plotted to show the numbers of true positives and true negatives. These numbers are usually expressed as percentages and depend strongly on the Type I and Type II errors [21]. Thus, the percentage of correctly identified patients, called sensitivity, and the percentage of correctly identified controls, called specificity, are the two important components of this method. Incorrect identification of patients and controls leads to the Type I and Type II errors mentioned earlier.
True positives are abbreviated as TP, false positives as FP, true negatives as TN and false negatives as FN. The false positive rate and the true positive rate are

$$\text{fp rate} = \frac{FP}{N} \qquad (4)$$

$$\text{tp rate} = \frac{TP}{P} \qquad (5)$$

Here FP is the number of incorrectly classified negatives and N is the total number of negatives, which comprises FP and the correctly classified controls (TN). TP is the number of correctly classified positives and P is the total number of positives, which comprises TP and the incorrectly classified patients (FN). Both N and P are equal to 11 in this study, with 11 AD patients and 11 age-matched controls. Sensitivity is calculated as the percentage of true positives among true positives and false negatives and equals the tp rate (equation 5). Specificity is calculated as the percentage of true negatives among true negatives and false positives and equals $1 - \text{fp rate}$ (equation 4). Once these percentages are calculated, accuracy can be established as the ratio of the number of correctly identified subjects (both AD patients and controls) to the total number of subjects, which is 22 (11 AD patients and 11 age-matched controls).
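The relations between these quantities can be checked with a short sketch. The counts used below are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the study's tables. With 22 subjects, accuracy can only change in steps of 1/22, so 16 and 17 correct classifications correspond to 72.73% and 77.27% respectively.

```python
def roc_point(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions.

    P = tp + fn is the number of patients, N = tn + fp the number of controls.
    """
    p = tp + fn
    n = tn + fp
    sensitivity = tp / p            # true positive rate
    specificity = tn / n            # equals 1 - false positive rate
    accuracy = (tp + tn) / (p + n)  # correctly identified / all subjects
    return sensitivity, specificity, accuracy


# Hypothetical counts for one electrode with P = N = 11
sens, spec, acc = roc_point(tp=9, fn=2, tn=8, fp=3)
print(f"{100 * sens:.2f}% {100 * spec:.2f}% {100 * acc:.2f}%")
```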

Results
SE values for each electrode are listed in Table 1. Statistically significant electrodes (p<0.01) are marked with an asterisk. Box plots of the mean SE values for the statistically significant electrodes can be seen in Figure 1. Sensitivity values (true positives, correctly identified AD patients) against specificity values (true negatives, correctly identified healthy controls) were calculated using a software programme called MedCalc. Accuracies of the statistically significant electrodes were calculated from these sensitivity and specificity values. Table 2 summarises the accuracy values calculated for the statistically significant electrodes. The threshold value is the mean SE value selected for electrodes Fp1, O2, P3, T4 and T5 respectively. The sensitivity and specificity percentages at this particular threshold value give the accuracy of the electrode.

Discussions and Conclusions
In this study, a symbolic sequence decomposition method was used together with Shannon's entropy to investigate EEG changes caused by Alzheimer's disease compared to age-matched controls. O2, P3, T4 and T5 have frequently been reported as electrodes showing significant differences between AD patients and control subjects [8,20]. In particular, P3 was the only statistically significant electrode common to all entropy studies performed using this data set. Except for Fp1, all electrodes that showed statistically significant differences were located on the posterior side of the skull. The sensitivity of Fp1 was 100%, which is higher than the only previously reported value for this electrode, 90.91%, in earlier work by Abásolo [7,8]. O2 accuracy was at 100%, which is higher than any other entropy measure previously reported by Abásolo and collaborators [8]. P3, together with the other two electrodes, had a greater sensitivity than in previous studies on this database. On the other hand, the specificity values of these five electrodes were lower than any value reported before. Our results showed similar features to previous studies. Statistically significant electrodes were situated at the posterior part of the skull for Sample entropy, Spectral entropy, Approximate entropy and Lempel-Ziv complexity measures [6,7,8]. Statistically significant electrodes for SE were observed in the same brain regions. To conclude, symbolic sequence decomposition supports current knowledge about the effects of Alzheimer's disease on the EEG. While this method did not improve on other entropy techniques in terms of its possible use as a diagnostic tool, it proved to be relatively faster, and hence can be used as part of an analysis method to characterise the EEG in other cerebral disorders.
Future work could also examine the locations of the statistically significant electrodes, since these may be a useful indicator of a cerebral disorder.