A heuristic approach with artificial neural network for Parkinson’s disease

ABSTRACT


Introduction
Parkinson's disease is caused by the breakdown of the brain cells that produce dopamine, a substance that allows brain cells to communicate with each other. The dopamine-producing cells in the brain are responsible for the control, coordination and fluency of movements.
Parkinson's disease commonly manifests itself with slowness of movement, tremor at rest, and psychiatric disorders. The disease affects more than 10 million people worldwide. The incidence of Parkinson's disease has been observed to increase with age; it is estimated that four percent of people with PD are diagnosed before the age of 50 [1].
Besides the movement-related issues they experience, PD patients may have many other complaints, including fatigue, decreased cognitive function, changes in speech, depression, anxiety, behavioural disorders, visual impairments, weight loss, sleep abnormalities and pain [2].
Early treatment slows the progression of the disease and improves the patient's quality of life. Many different treatment methods can be applied by the specialist physician according to the stage of the disease. The first aim is to enable the patient to continue daily life without becoming dependent on others. Drug therapy is used to compensate for the decrease in dopaminergic nerve signals that develops with the loss of dopamine-producing cells. There are many studies based on voice recordings for the diagnosis of Parkinson's disease. These studies apply various machine learning methods such as Decision Tree, Naive Bayes, Support Vector Machines, Random Forest and Neural Network [3], [4].
Many researchers have studied machine learning methods in different domains and approaches, since such methods can be automatically trained and improved with training datasets. Sakar et al. [5] collect the Parkinson's Disease Classification dataset and perform several experiments on it. They apply naïve Bayes, k-nearest-neighbour, random forest, multilayer perceptron, and support vector machine classifiers using different feature subsets of the dataset. They obtain their best result, an accuracy of 86%, by using the minimum redundancy maximum relevance (mRMR) feature selection method with a support vector machine (SVM) classifier.
Solana-Lavalle et al. [6] examine wrapper feature selection methods to choose the most important features. They then classify the selected features using four machine learning classifiers, namely k-nearest neighbour, multi-layer perceptron, support vector machine and random forest. Their study achieves an accuracy of 94.7% using the support vector machine classifier.
Almeida et al. [7] analyse eighteen feature extraction methods using four machine learning classifiers on a Parkinson's disease dataset obtained from sustained phonation and speech tasks. They show that phonation tasks are more effective than speech tasks in the detection of Parkinson's disease.
Shahbakhi et al. [4] propose an approach to diagnose Parkinson's disease based on speech analysis. They use a genetic algorithm to select informative features and then classify them using the support vector machine algorithm. Their method achieves accuracies of 94.5% with 4 optimized features, 93.66% with 7 optimized features, and 94.22% with 9 optimized features.
Peker et al. [8] analyse biomedical sound measurements obtained from continuous phonation samples to diagnose PD. They apply the minimum redundancy maximum relevance (mRMR) feature selection method to identify informative features and classify them using an artificial neural network (ANN) and a complex-valued artificial neural network (CVANN). They obtain F-scores of 95% with the ANN and 99% with the CVANN.
In this study, we aim to identify the best combination of feature selection and classification methods for diagnosing Parkinson's disease. For this purpose, we investigate different combinations of two filter feature selection methods and a bio-inspired search algorithm, evaluated with machine learning classifiers.
The next section introduces the research methodology used in the study. Section III presents the experimental results. Finally, Section IV concludes the paper.

Wolf Search Algorithm
The wolf search algorithm (WSA) is a bio-inspired algorithm modelled on the pack hierarchy and hunting mechanisms of wolves in nature. Unlike other bio-inspired algorithms such as particle swarm optimization [9], it is a method in which multiple wolves, without a single leader, search for the best solution in multiple directions. The task of the pack is to scout for prey, and each wolf in the WSA moves independently of the others to achieve a better position according to its own characteristics. Unlike the ant colony optimization algorithm [10], which uses pheromones for communication, the WSA does not engage in such communication, which results in a shorter search run time.

While hunting, wolves try to hide themselves as they approach their prey. This feature encourages the hunters in the WSA to constantly change their position in an effort to find and move to better, less vulnerable positions. During the search, wolves simultaneously look for prey and stay alert for threats: each wolf in the pack picks its own position, constantly moving to better positions while watching for potential threats. With their excellent sense of smell, wolves often find their prey by tracking its scent. Analogously, each wolf in the WSA has a visual range that creates a sensing radius. The wolves are limited by this visual range in their search for food, in their awareness of pack companions when looking for a better position, and in their awareness of nearby enemies (from which they escape beyond the visual range). When the wolves sense that prey is close, they act quickly, quietly and carefully in an effort to hide their presence from the prey.
The pseudo code of the WSA is presented in Figure 1, and the basic rules of the algorithm are summarized as follows (a minimal sketch is given after the list):

1. Each wolf in the WSA has a fixed visual range with radius r. In the hyperplane dominated by multiple features, the distance is estimated by the Minkowski distance. Each wolf can only perceive companions within its visual range, and the distance a wolf covers in a single step is usually smaller than its visual range.
2. The result of the objective function measures the quality of the wolf's current location. A wolf constantly tries to move to better positions and prefers a better position that is already occupied by a companion. If more than one such position is available, the wolf chooses the best among the given options; if there is no option better than its current location, the wolf continues to change position randomly.
3. If a wolf detects an enemy, it tries to avoid it by escaping to a random location beyond the enemy's visual range.
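To make these rules concrete, the following is a minimal Python sketch of a WSA-style optimizer for continuous minimization. The hyper-parameters (visual radius r, step size, escape probability pa), the bounds, and the sphere objective are illustrative assumptions, not values taken from this study.

```python
import numpy as np

def wolf_search(objective, dim, n_wolves=10, r=1.0, step=0.5, pa=0.25,
                bounds=(-5.0, 5.0), iterations=100, seed=0):
    # Minimal WSA-style sketch; parameters are illustrative assumptions.
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))
    fitness = np.array([objective(w) for w in wolves])

    for _ in range(iterations):
        for i in range(n_wolves):
            # Rule 1/2: look for better-positioned companions in visual range.
            dists = np.linalg.norm(wolves - wolves[i], axis=1)
            visible = (dists <= r) & (fitness < fitness[i])
            if visible.any():
                # Move toward the best visible companion.
                best = np.argmin(np.where(visible, fitness, np.inf))
                candidate = wolves[i] + step * (wolves[best] - wolves[i])
            else:
                # No better companion in range: random prey-seeking step.
                candidate = wolves[i] + step * rng.uniform(-1, 1, dim)
            candidate = np.clip(candidate, lo, hi)
            f = objective(candidate)
            if f < fitness[i]:
                wolves[i], fitness[i] = candidate, f
            # Rule 3: with probability pa an enemy appears; escape beyond
            # the visual range regardless of fitness.
            if rng.random() < pa:
                escape = wolves[i] + rng.uniform(-1, 1, dim) * (r + step)
                wolves[i] = np.clip(escape, lo, hi)
                fitness[i] = objective(wolves[i])

    best = np.argmin(fitness)
    return wolves[best], fitness[best]

# Example: minimize the sphere function in five dimensions.
pos, val = wolf_search(lambda x: float(np.sum(x ** 2)), dim=5)
print(pos, val)
```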

Information Gain Feature Selection Method
Information gain is an entropy-based feature selection method: a feature is scored by its contribution to the reduction of the overall entropy. The expected information needed to classify a tuple in D is calculated as follows [11]:

$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

where m is the number of classes and $p_i$ is the probability that a tuple in D belongs to class $C_i$. If the samples in D are partitioned on some feature A having v distinct values $\{a_1, a_2, \ldots, a_v\}$, D splits into v subsets $\{D_1, D_2, \ldots, D_v\}$. The expected information required to classify a tuple from D based on the partitioning by A is calculated as follows:

$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$$

where the term $|D_j|/|D|$ is the weight of the j-th subset. The information gain is defined as the difference between the original information requirement and the requirement obtained after partitioning on A:

$$Gain(A) = Info(D) - Info_A(D)$$

The information gain method ranks features according to their information gain score, and the top-ranked features are selected to reduce the feature size for better classification results.
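As a worked illustration of these formulas, the sketch below computes the information gain of a single discrete feature in Python; continuous features would first need to be discretized, and the toy data is made up.

```python
import numpy as np

def entropy(labels):
    # Info(D) = -sum_i p_i * log2(p_i) over the class distribution.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels):
    # Gain(A) = Info(D) - Info_A(D); Info_A(D) is the entropy of the
    # subsets induced by the v distinct values of A, weighted by |Dj|/|D|.
    total = entropy(labels)
    values, counts = np.unique(feature, return_counts=True)
    weighted = sum(
        (c / len(labels)) * entropy(labels[feature == v])
        for v, c in zip(values, counts)
    )
    return total - weighted

# Toy example: a binary feature and binary class labels.
f = np.array([0, 0, 1, 1, 1, 0])
y = np.array([0, 0, 1, 1, 0, 0])
print(information_gain(f, y))
```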

ReliefF Feature Selection Method
The ReliefF algorithm was proposed by Kononenko [12], [13] as an extension of the Relief algorithm to multi-class problems. As shown in Figure 2, the algorithm randomly selects an instance Ri and then searches for its k nearest neighbours from the same class (nearest hits Hj). The algorithm also searches for the k nearest neighbours from each of the other classes (nearest misses Mj(C)). It then updates the vector W[A] of estimated attribute qualities depending on Ri, Hj, and Mj(C). The whole process is repeated m times.
The main purpose of the ReliefF algorithm is to separate each pair of classes regardless of which two classes are closest to each other.
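The following is a minimal Python sketch of the ReliefF weight update described above; it assumes numeric features on comparable scales and a made-up toy dataset, and it is not the implementation used in this study.

```python
import numpy as np

def relieff(X, y, k=5, m=100, seed=0):
    # Minimal ReliefF sketch for numeric features.
    rng = np.random.default_rng(seed)
    n, a = X.shape
    W = np.zeros(a)
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes, counts / n))
    span = X.max(axis=0) - X.min(axis=0) + 1e-12  # normalizes diff() to [0, 1]

    for _ in range(m):
        i = rng.integers(n)           # randomly select target instance Ri
        xi, ci = X[i], y[i]
        for c in classes:
            mask = (y == c)
            mask[i] = False           # exclude the target instance itself
            idx = np.where(mask)[0]
            # k nearest neighbours of Ri within class c (Manhattan distance).
            d = np.abs(X[idx] - xi).sum(axis=1)
            nearest = idx[np.argsort(d)[:k]]
            diff = np.abs(X[nearest] - xi) / span
            contrib = diff.sum(axis=0) / (m * k)
            if c == ci:
                W -= contrib          # nearest hits decrease the weight
            else:
                # nearest misses increase it, weighted by the class prior
                W += (priors[c] / (1 - priors[ci])) * contrib
    return W

# Toy usage: the weights should favour the informative first feature.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200)
X = np.column_stack([y + 0.1 * rng.normal(size=200), rng.normal(size=200)])
print(relieff(X, y))
```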

Machine Learning Algorithms for Classification
In this study, we apply four machine learning classifiers, namely logistic regression, support vector machines, random forest, and artificial neural networks, and evaluate the system using five-fold cross validation. We compare the classification results obtained with the different feature selection methods.
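As a rough stand-in for this evaluation protocol (the actual experiments were run in Weka, as noted in the experimental section), the following scikit-learn sketch runs the four classifiers under five-fold cross validation on synthetic placeholder data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data; the study uses the UCI Parkinson's
# Disease Classification dataset instead.
X, y = make_classification(n_samples=500, n_features=50, random_state=0)

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "ANN": MLPClassifier(max_iter=1000, random_state=0),
}
for name, clf in classifiers.items():
    # Scale features, then score with the micro-average of the F-score
    # under five-fold cross validation, mirroring the paper's protocol.
    model = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(model, X, y, cv=5, scoring="f1_micro")
    print(f"{name}: micro-F1 = {scores.mean():.3f}")
```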
Logistic regression is one of the most popular algorithms in statistics and belongs to the family of exponential classifiers. It extracts a set of features from the input dataset, each multiplied by a weight. If features f1 and f2 are correlated, regression assigns half of the weight to w1 and half to w2. The logistic regression classifier estimates p(c|x) from the extracted features as:

$$p(c \mid x) = \frac{\exp\left(\sum_{i=1}^{N} w_i f_i(c, x)\right)}{\sum_{c'} \exp\left(\sum_{i=1}^{N} w_i f_i(c', x)\right)}$$

where N is the number of features, and each feature $f_i$ is a property of the observation x and the output class c [14].
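A toy numeric evaluation of this formula, with made-up weights $w_i$ and binary feature values $f_i(c, x)$ for two hypothetical classes:

```python
import numpy as np

# Evaluate p(c|x) = exp(sum_i w_i * f_i(c, x)) / Z for two classes.
# Weights and feature values are made up for illustration.
w = np.array([1.2, -0.5, 0.3])
f = {"pos": np.array([1, 0, 1]), "neg": np.array([0, 1, 1])}
scores = {c: np.exp(w @ fv) for c, fv in f.items()}
Z = sum(scores.values())                      # normalization constant
print({c: s / Z for c, s in scores.items()})  # probabilities sum to 1
```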
The main goal of the support vector machines algorithm is to find the maximum-margin hyperplane, the linear model that separates the two classes with the largest possible margin. As shown in Figure 3, open and filled circles represent the two different classes. The samples closest to the maximum-margin hyperplane are called support vectors, and there is always at least one support vector for each class [15].
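A small scikit-learn sketch on made-up separable data, showing how a fitted linear SVM exposes the support vectors that pin down the maximum-margin hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds (made-up data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)  # at least one support vector per class
```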

Figure 2. Pseudo code of the ReliefF algorithm.

Algorithm ReliefF
Input: for each training instance, a vector of feature values and the class value
1.  initialize vector W
2.  for i = 1 to m do
3.      randomly select a target instance Ri
4.      find k nearest hits Hj
5.      for each class C ≠ class(Ri) do
6.          from class C find k nearest misses Mj(C)
7.      end for
8.  end for
9.  for A = 1 to a do
10.     update W[A] using the differences of feature A over Ri, Hj, and Mj(C)
11. end for

Random forest is an ensemble algorithm that combines the predictions of multiple decision trees for classifying objects. The random forest classifier randomly selects subsets of the training set to create the decision trees and aggregates the predictions of the individual trees into an ensemble prediction. Low correlation between the trees is important for the ensemble prediction, because decorrelated models protect each other from their individual errors [14].
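A brief scikit-learn sketch of this bagging-and-aggregating behaviour on synthetic placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Each tree is grown on a random bootstrap sample of the training set;
# the forest aggregates the trees' votes, so decorrelated trees protect
# the ensemble from any single tree's errors.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```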
An artificial neural network contains multiple layers of neurons, and each neuron feeds all the neurons in the next layer. A sigmoid function is used to activate each neuron. The network is trained with the back-propagation algorithm, which recursively applies gradient descent to adjust the weights; the tuning of the parameters proceeds backwards from the output layer to the input layer. An artificial neural network consists of an input layer, one or more hidden layers and an output layer. Figure 4 shows an example of an artificial neural network. The main disadvantage of this method is that training is 20-700 times slower than with the other methods.
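To make the layer-to-layer feeding concrete, the sketch below runs a single forward pass through a tiny 3-2-1 network with sigmoid activations; the weights are random placeholders rather than trained values (training would adjust them via back-propagation).

```python
import numpy as np

# Forward pass of a tiny 3-2-1 network with sigmoid activations,
# illustrating how each neuron feeds all neurons in the next layer.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(2)   # input -> hidden
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)   # hidden -> output

x = np.array([0.5, -1.0, 2.0])
h = sigmoid(W1 @ x + b1)       # hidden-layer activations
y_hat = sigmoid(W2 @ h + b2)   # network output
print(y_hat)
```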

Performance Evaluation
The performance of the classification experiments was evaluated with the F-score. The F-score is an evaluation criterion that takes into account both the precision and the recall of the classifier [11]. Precision (P) evaluates how precise the classifier is in the predictions it makes for a class, and recall (R) evaluates how completely the samples of a class are correctly classified. The F-score is the harmonic mean of the precision and recall values, as follows:

$$F\text{-}score = \frac{2 \times P \times R}{P + R} \qquad (5)$$
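A quick numeric check of Equation (5) on a made-up prediction vector, using scikit-learn's metric functions:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels: TP = 3, FP = 1, FN = 1, so P = R = 0.75 and the
# harmonic mean (Equation 5) is also 0.75.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
print(p, r, 2 * p * r / (p + r), f1_score(y_true, y_pred))
```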

Experimental Results and Discussions
In this study, we investigate which feature selection method, or combination of methods, produces the most accurate classification result. First, we use two common filter feature selection methods, namely the information gain and ReliefF methods. Second, we investigate combining these commonly used feature selection methods with a heuristic method: we use the Wolf Search Algorithm to select the most important features and then apply the information gain and ReliefF feature selection methods on the selected features.
Furthermore, we apply logistic regression (LR), support vector machines (SVM), random forest (RF) and an artificial neural network (ANN) to classify the PD dataset. We use the classifiers from the Weka data mining tool, version 3.8.1. We evaluate the classification performance by the micro-average of the F-score with five-fold cross validation.
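The experiments themselves were run in Weka; purely as an illustration of the filter-then-classify protocol, the following scikit-learn sketch ranks features with a mutual-information score (a stand-in for an information-gain-style filter, not Weka's exact evaluator), keeps the top k, and evaluates with micro-averaged F-score under five-fold cross validation on synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic placeholder data standing in for the PD dataset.
X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=10, random_state=0)

# Rank features by a mutual-information filter score, keep the top 20,
# then classify with an SVM inside five-fold cross validation.
model = make_pipeline(SelectKBest(mutual_info_classif, k=20), SVC())
print(cross_val_score(model, X, y, cv=5, scoring="f1_micro").mean())
```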

Figure 1. Pseudo code of the Wolf Search Algorithm.

Figure 4. An artificial neural network.

Datasets
We use a publicly available dataset for this study, the Parkinson's Disease Classification dataset from the University of California at Irvine Machine Learning Repository, collected by Sakar et al. [5]. In Table 1, the feature groups of the dataset are summarized with their descriptions [6].

Table 1. Feature set of the Parkinson's disease dataset.