Patient Specific Congestive Heart Failure Detection From Raw ECG signal

In this study, features derived from the non-linear second-order difference plot (SODP) of raw ECG records, sampled at 256 Hz and windowed with different durations, are used to diagnose congestive heart failure (CHF). Every data row is labelled with the patient it belongs to so that classification can be evaluated more realistically. Each SODP is divided into quadrant regions of different radius, and the number of points falling in each region is counted to form the feature vector. Fisher's linear discriminant, Naive Bayes, radial basis function and artificial neural network classifiers are used. The results are assessed with two validation methods: general k-fold cross-validation and patient based cross-validation. It is shown that, using a neural network classifier with features obtained from the SODP, the constructed system can distinguish normal subjects from CHF patients with a 100% accuracy rate.


Introduction
Heart failure is a condition in which the heart is unable to pump the amount of blood that the tissues need, or can do so only at high filling pressures (Braunwald E, Zipes DP, Libby P 2004). It is a common disease worldwide, affecting approximately 15 million people, and its incidence increases with age: from 1-2% in the 50 to 60 year age group, it reaches 10% over the age of 75. On average, 80% of all congestive heart failure (CHF) cases are seen in people over 65 years. CHF affects 0.3% of men and 0.2% of women aged between 50 and 59, and 2.7% of men and 2.2% of women aged between 80 and 89. The male/female ratio is calculated as 1/3. Each year in the United States, heart failure is declared as the cause of death of approximately 45000 patients, and this number increases every year due to population aging and rising rates of cardiovascular disease-free survival. In addition, both medical expenses and loss of manpower have a negative impact on the economy (Topol & Califf 2007; Yayla 2010).
Volume 1, No. 1, 32-42, 2016
Heart failure cannot be assessed easily in the clinic. Early diagnosis and effective treatment are important for patients and reduce mortality and morbidity (Yayla 2010; Işler & Kuntalp 2007). Because appropriate treatment is possible when CHF is diagnosed early, automatic detection of CHF from ECG recordings is clinically very important. This study attempts to distinguish normal subjects from CHF patients with a reliable classification system. Heart disease is first analysed by physical examination; however, a definitive diagnosis of CHF cannot be made by physical examination alone. Therefore, in patients with suspected CHF, one or more diagnostic tests such as echocardiography, angiography, electrocardiography, chest X-ray, brain (B-type) natriuretic peptide and the N-terminal fragment of its precursor hormone, and MR imaging are applied before a decision is made (Yayla 2010; Işler & Kuntalp 2007; Kamath 2012a).
Recent studies have applied different signal processing and analysis methods to develop effective computer-aided diagnosis methods (Işler & Kuntalp 2007; Kamath 2012b; Kannathal et al. 2006; Thuraisingham 2010; Karmakar et al. 2009; D. 2009; Engin 2007; Maurice E. Cohen 1996; Dabanloo et al. 2010; Işler Y n.d.). Some of them considered the heart rate variability (HRV) of ECG signals (Işler & Kuntalp 2007; Kamath 2012b; Işler Y n.d.; Kannathal et al. 2006; Thuraisingham 2010), while others considered beat based methods (Kannathal et al. 2006; Karmakar et al. 2009; D. 2009; Engin 2007). In both cases, important hidden information in the ECG record (obtained with wavelet and eigenvector methods, Poincaré plots, RR intervals, etc.) is revealed and used as feature vectors for detecting CHF patients. Işler and Kuntalp (Işler & Kuntalp 2007) used HRV to analyse CHF and extracted features with many techniques, such as time-frequency domain features, frequency domain features, Poincaré plot features and patient information. Kamath (Kamath 2012b) reported the central tendency measure of the R-R interval and showed the efficacy of the radial distance of the Teager energy scatter plot in distinguishing CHF from normal subjects. Karmakar et al. (Karmakar et al. 2009) introduced a complex correlation measure to quantify temporal variability in the Poincaré plot. Thuraisingham (Thuraisingham 2010) analysed the SODP of RR intervals and introduced a classification system that employs a statistical procedure. Cohen et al. (Maurice E. Cohen 1996) analysed SODPs and the central tendency measure using CHF and HRV data. Furthermore, different methods (such as expert systems, KNN, fuzzy and neuro-fuzzy systems) have been used as classifiers.
The Poincaré plot and the SODP are commonly used for non-linear analysis of the components of biomedical signals (Işler Y n.d.; Thuraisingham 2010; Karmakar et al. 2009; Maurice E. Cohen 1996; Dabanloo et al. 2010). It is also known that the Poincaré plot is an example of a chaotic system (Maurice E. Cohen 1996). While the Poincaré map reveals the relation between consecutive points, the SODP reveals the relation between consecutive difference values (Kamath 2012b; Thuraisingham 2010; Maurice E. Cohen 1996). Researchers who used RR interval series to obtain features built their systems on R-R intervals obtained from Holter records, which are long-term records. Therefore many researchers are still trying to improve success and reliability by developing new approaches that have lower computational complexity and need less data.
In this study, the goal is to construct a fast, high-performance system that can work in real time. New features are therefore extracted from the SODP of the raw ECG record in order to distinguish CHF patients from normal subjects. The study focuses on two objectives: showing the reliability of the sampling frequency and of the window size of the pattern used in the analysis. The CHF and normal ECG records are obtained from the Physiobank database. Neural Network, Naive Bayes and Linear Discriminant algorithms are used to construct the classification system. The results are then assessed with two validation methods: general k-fold cross-validation and patient based cross-validation. The following section presents the data acquisition, pre-processing steps, feature extraction method, classifiers and performance measures. The results and discussion are given in Section 3, and the conclusion of the study is presented in Section 4.

Material and methods
This section introduces the data acquisition, the data preparation process and the feature extraction methods used in the proposed recognition system. Figure 1 shows the general block diagram of the constructed system. The ECG records contain power-line interference and baseline wander effects caused by respiration. Baseline wander and power-line interference consist of low frequency and high frequency components, respectively. The records are filtered with two median filters to eliminate the baseline wander and with a notch filter to eliminate the power-line frequency (Suri et al. 2007). The filtered signals are used in all subsequent processing. In order to examine the amount of information in a time window, the records are divided using four different window lengths: 10 sec, 7 sec, 5 sec and 3 sec. For the 10 sec window length, 150 windowed samples are extracted from each record, giving a total of 4950 windowed samples for analysis. Windowed samples are extracted in the same way for the other window lengths. Examples of windowed records are shown in Figure 2. Before features are extracted from the raw ECG records, each sample is normalized by dividing each beat by the absolute maximum value (Duda & Hart n.d.; Bishop & others 1995).
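The pre-processing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the median filter widths (0.2 s and 0.6 s), the use of consecutive non-overlapping windows, normalization per window and all function names are hypothetical choices, because the text does not specify them. The notch filter is omitted.

```python
from statistics import median


def median_filter(signal, width):
    """Sliding median; near the edges only the available samples are used."""
    half = width // 2
    n = len(signal)
    return [median(signal[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def remove_baseline(signal, fs, first_sec=0.2, second_sec=0.6):
    """Estimate baseline wander with two cascaded median filters and subtract it.

    The two widths are assumptions; the paper only says "two median filters".
    """
    w1 = int(fs * first_sec) | 1   # force an odd width
    w2 = int(fs * second_sec) | 1
    baseline = median_filter(median_filter(signal, w1), w2)
    return [x - b for x, b in zip(signal, baseline)]


def split_into_windows(signal, fs, window_sec, max_windows=None):
    """Cut a record into consecutive, non-overlapping windows (assumption)."""
    size = int(round(fs * window_sec))
    count = len(signal) // size
    if max_windows is not None:
        count = min(count, max_windows)
    return [signal[i * size:(i + 1) * size] for i in range(count)]


def normalize_by_abs_max(window):
    """Divide every value by the largest absolute value in the window."""
    peak = max(abs(v) for v in window)
    if peak == 0:
        return list(window)  # an all-zero window is returned unchanged
    return [v / peak for v in window]
```

With fs = 256 and a 10 sec window, each window holds 2560 values; passing max_windows=150 would mirror the 150 windows per record used in the study.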

Second-Order Difference Plot (SODP)
The SODP is a feature extraction method derived from time-domain information, and it is very easy to implement. It is used both as an independent feature extraction tool and as a complementary method to verify frequency domain results (Thuraisingham 2010; Maurice E. Cohen 1996). The SODP carries meaningful information about the ECG (Kamath 2012b; Thuraisingham 2010). If X(t) is the ECG signal, the SODP is formed by plotting the points [X(t+1) - X(t)] against [X(t+2) - X(t+1)]. In other words, the SODP is a scatter plot of consecutive difference values of the ECG signal, so the statistical behaviour of consecutive differences can be observed. Figure 3 shows the second-order difference plots of CHF and normal ECG signals.
Figure 3. A sample of the second-order difference plot for (a) CHF and (b) normal ECG signals
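As a small illustration of the SODP definition, the following Python sketch builds the list of plotted points from a signal. The function name and the choice of X(t+1) - X(t) as the horizontal coordinate are assumptions made for this example only.

```python
def sodp_points(x):
    """Return SODP points (a, b) with a = x[t+1] - x[t] and b = x[t+2] - x[t+1]."""
    diffs = [x[i + 1] - x[i] for i in range(len(x) - 1)]
    # Pair every first difference with the one that follows it.
    return list(zip(diffs[:-1], diffs[1:]))


# Example: a five-sample signal yields three SODP points.
print(sodp_points([0, 1, 3, 2, 2]))  # [(1, 2), (2, -1), (-1, 0)]
```

A signal of N samples gives N - 2 points, and signals shorter than three samples give an empty plot.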

Feature Extraction
A feature is a piece of information about the pattern. Here, this information is extracted from the second-order difference plots of the CHF and normal ECG records. The SODP is drawn in a two-dimensional Cartesian system, whose axes divide the plane into four infinite regions, called quadrants, numbered from 1st to 4th and each bounded by two half-axes. The quadrant numbers and coordinate signs are I ( + , + ), II ( − , + ), III ( − , − ) and IV ( + , − ). In previous studies, SODPs were divided into circular regions of different radius in order to extract features (Thuraisingham 2010; Kamath 2012b). The four regions of a quadrant are shown in Figure 4, which shows the plane divided by circles centered at the origin. Each of the four quadrants is divided into four regions by the circles. Each region shows the increasing number of points in the SODP. There are therefore sixteen different regions (four regions in each of four quadrants). The number of points in each region is counted and used as a feature vector. All quadrant regions contain valuable information for classifying SODP points: the second quadrant ( − , + ) and the fourth quadrant ( + , − ) represent balanced increasing and decreasing, while the first quadrant ( + , + ) and the third quadrant ( − , − ) represent continuous increasing and decreasing (Kamath 2012b).
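The counting of points per quadrant region can be sketched as follows. This is only an illustration: the radii are placeholders (the text does not give their values), points on an axis are assigned by the sign tests shown, points beyond the outermost circle are ignored, and the cumulative flag covers the alternative reading in which each region is a nested disk rather than a ring. All of these are assumptions.

```python
import math


def quadrant_index(a, b):
    """Return 0..3 for quadrants I..IV; axis points follow the tests below."""
    if a >= 0 and b >= 0:
        return 0  # I   (+, +)
    if a < 0 and b >= 0:
        return 1  # II  (-, +)
    if a < 0 and b < 0:
        return 2  # III (-, -)
    return 3      # IV  (+, -)


def sodp_features(points, radii, cumulative=False):
    """Count SODP points per (quadrant, region); radii must be increasing."""
    counts = [[0] * len(radii) for _ in range(4)]
    for a, b in points:
        r = math.hypot(a, b)
        for k, limit in enumerate(radii):
            if r <= limit:
                counts[quadrant_index(a, b)][k] += 1
                break  # points outside the largest circle are never counted
    if cumulative:
        for row in counts:
            for k in range(1, len(row)):
                row[k] += row[k - 1]
    # Flatten to one vector: quadrant I regions first, then II, III, IV.
    return [c for row in counts for c in row]


example = [(0.1, 0.1), (-0.3, 0.2), (-0.6, -0.6), (0.9, -0.1), (2.0, 2.0)]
print(sodp_features(example, radii=[0.25, 0.5, 0.75, 1.0]))
```

With four radii the function returns a 16-dimensional vector, matching the sixteen regions described in the text.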

Performance Measurements
In this study, a total of 33 ECG records (15 CHF and 18 normal) are used. The records are windowed with lengths of 3 sec, 5 sec, 7 sec and 10 sec, and 150 windows are taken from each record. Sixteen features are extracted from each window, giving a total of 4950 feature vectors for each window length. Performance is evaluated using LDA, NB, RBF network and ANN classifiers and is validated by two methods: general k-fold cross-validation and patient based cross-validation.

General k-fold Cross-validation
Cross-validation, also known as rotation estimation, is a method for estimating how the results of a statistical analysis will generalize to a new data set. In this method, the whole data set is randomly separated into k equal subsets. One subset is used for validation and all other subsets are used for training. This step is iterated k times, leaving out a different fold for evaluation each time. This validation method gives a better approximation of the error (Duda & Hart n.d.).
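The fold construction can be illustrated with a short Python sketch. The function name, the fixed random seed and the way indices are dealt into folds are assumptions for this example and are not taken from the study.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k random folds of near-equal size."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # Deal the shuffled indices into k folds, like dealing cards.
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


# 4950 feature vectors and k = 10 give 495 validation vectors per fold.
for train_idx, val_idx in k_fold_indices(4950, 10):
    assert len(val_idx) == 495 and len(train_idx) == 4455
```

Because the split is random over windows, windows from the same patient can appear in both the training and the validation folds, which is what the patient based scheme avoids.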

Patient Based Cross-validation
Patient based cross-validation is similar to k-fold cross-validation. The prepared data sets are obtained from M different people, and each data vector is labelled with both the disease information and the patient number. In patient based cross-validation, the feature vectors that belong to one person are used for validation and all other data are used for training. The step is iterated M times, leaving out a different person's data for evaluation each time, so that M test sets from different people are classified. The classification rate of an individual's data is used to determine the performance of the system: according to the rate of having CHF, the system decides whether or not the patient has the disease. When the rate is over 0.50 the decision is positive. When the rate is less than 0.50 the decision is negative.
The results are determined according to whether each record is CHF or normal. The performance of the system is assessed as the rate of correctly classified patients.
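The patient based scheme and its 0.50 decision rule can be sketched as below. This is a hedged illustration: `fit` and `predict` are hypothetical callables standing in for any of the classifiers, labels are assumed to be 1 for CHF and 0 for normal, all windows of a patient are assumed to share one label, and a rate of exactly 0.50 is treated as negative, which the text leaves open.

```python
def patient_decision(window_predictions, threshold=0.5):
    """Return (rate of windows predicted CHF, decision: 1 = CHF, 0 = normal)."""
    rate = sum(window_predictions) / len(window_predictions)
    return rate, int(rate > threshold)


def leave_one_patient_out(features, labels, patient_ids, fit, predict):
    """Hold out each patient in turn; return per-patient results and accuracy."""
    results = {}
    for pid in sorted(set(patient_ids)):
        train = [i for i, p in enumerate(patient_ids) if p != pid]
        test = [i for i, p in enumerate(patient_ids) if p == pid]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        preds = [predict(model, features[i]) for i in test]
        rate, decision = patient_decision(preds)
        # Every window of one patient is assumed to carry the same label.
        results[pid] = (rate, decision, decision == labels[test[0]])
    correct = sum(1 for r in results.values() if r[2])
    return results, correct / len(results)
```

The returned accuracy is the fraction of correctly classified patients, so one minus this value corresponds to the patient-level misclassification rate.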

Result and Discussion
This study is carried out on a PC with a Core 2 Duo 2.4 GHz processor and 2 GB of memory, using the MATLAB package. A total of 33 ECG records (15 CHF and 18 normal) are used. In the literature, both frequencies are used for analysis. Accordingly, the ECG records of both normal subjects and CHF patients at a 256 Hz sampling frequency are considered in this study. CHF is a pattern type heart disease, so the analysed data should contain at least three peaks. Therefore, in order to examine the amount of information in a window of the raw ECG pattern, the records are divided using four different window lengths (10 sec, 7 sec, 5 sec and 3 sec). For each window length, 4950 windowed samples of raw ECG records are arranged, and these data sets are examined separately. The SODP is used to extract information from the ECG signal. It is divided by four circles of different radius for CHF and normal data, as shown in Figure 7, so that each quadrant has four regions. The number of points falling in each region of each quadrant is counted and used as the feature vector. The resulting 16-dimensional feature vectors are used to analyse classifier performance. Sensitivity, selectivity, specificity, overall accuracy and correct classification rate are used as performance measures of the classifiers. The general 10-fold cross-validation method is used in this step: 10 equal subsets are prepared from the data set, one subset is used for validation and all other subsets are used as the training set, and the process is iterated 10 times. The performance measurements (sensitivity, selectivity, specificity and accuracy) for the different window lengths and for the two sampling frequencies are shown in Tables 1 and 2 for the LDA, NB and MLP classifiers, respectively. The best performance is obtained with a window length of 10 sec. The best overall accuracies are 98.15%, 93.31% and 99.96% for the LDA, NB and MLP classifiers, respectively.
The system was planned to be built with the best-performing classifier. For this purpose, the second validation step was carried out with the classifiers that gave the best results in the above tables. The second step was patient based cross-validation, whose goal was to separate CHF subjects from healthy subjects with high performance. In this step, the feature vectors are obtained from 33 different candidates (18 normal and 15 CHF patients). Each data vector is labelled with both the disease information and the patient number.
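For reference, sensitivity, specificity and overall accuracy can be computed from window-level predictions as in the sketch below, using their standard definitions with CHF as the positive class. The function name and the label encoding (1 = CHF, 0 = normal) are assumptions; selectivity is left out because the text does not define it.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity and accuracy for two-class labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    def ratio(num, den):
        # Undefined ratios (empty class) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # CHF windows correctly detected
        "specificity": ratio(tn, tn + fp),   # normal windows correctly rejected
        "accuracy": ratio(tp + tn, len(pairs)),
    }


print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

In a 10-fold setting these values would typically be computed on each validation fold and then averaged, although the text does not state how the reported figures were aggregated.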

Figure 1. Block diagram of the proposed system.
Figure 2. The windowed records (10 sec, 7 sec, 5 sec and 3 sec, respectively): (a) normal ECG record, (b) CHF record.

The validation step consists of two parts: general 10-fold cross-validation and patient based cross-validation. At first, the feature set is classified without candidate information, so all windowed data are labelled only as CHF or normal and used to construct a classification system with high accuracy. LDA, NB and MLP are used as classifiers. Records with two different sampling frequencies, four different window lengths for each record and four different classifier models are used in the analysis. Different cases (combinations of the two sampling rates, the four window lengths and classifiers with different parameters, such as the number of neurons in the MLP) are considered.

Table 1. The performance measures of the LDA and NB classifiers.
SODP divided into four circular regions of different radius: (a) CHF patient's ECG signal, (b) normal ECG signal.
In patient based cross-validation, the 150 feature vectors that belong to one person are used as the test set and all other data are used as the training set. This process is iterated 33 times so that every patient's data are tested, and the 33 test sets from different people are classified as CHF or normal. The system decides whether or not the patient has CHF according to the rate of having CHF, which is the classification rate of the individual's data. When the rate of having CHF is over 0.50, the decision is positive and the person is a patient with CHF. Otherwise, the decision is negative and the person is normal. Table 3 shows the result of a classifier and indicates whether each decision is correct or not. The error of the system is assessed as the rate of misclassified CHF and normal patients. The misclassification rates are 2.94, 0.00 and 2.94 for the LDA, NB and MLP classifiers. It can be seen that the MLP classifier has a 0% misclassification rate, which is the best performance. Comparing the proposed system with similar systems is difficult because of differences in classification techniques, feature extraction techniques, the data used and performance measurements.

Nonetheless, some conclusions can be drawn. Işler and Kuntalp (Işler & Kuntalp 2007) used R-R interval series and extracted features with many techniques, such as time-frequency domain features, frequency domain features, Poincaré plot features and patient information. A genetic algorithm was used to find the best subsets of these feature vectors. They reported maximum performance with a sensitivity of 100% and a specificity of 94.74%, but the methods have high computational complexity. Kamath (Kamath 2012b) reported an almost 100% classification rate for CHF subjects using a k-nearest neighbor classifier, analysing R-R interval series with the central tendency measure of the plots. Thuraisingham (Thuraisingham 2010) used the central tendency measure to extract features from the SODP of RR intervals, and reported almost 100% accuracy using a recognition system that utilizes a statistical method to distinguish CHF from normal patients. When researchers use RR interval series to extract features, they need all R-R intervals of the Holter record for the analysis; this is a long-term record and requires high computational time. The proposed system can separate CHF from normal candidates with excellent performance. Furthermore, it uses only a small part of the record to detect a CHF patient instead of the whole ECG record.

Table 2.

Table 3. The best performance measure of patient based classification (for window time of 10 sec, for 250 Hz sampled frequency and for LDA classifier)