Comparison of Support Vector Machine Models in the Classification of Susceptibility to Schistosomiasis
D. OLANLOYE, R. OLASUNKANMI and E. ODUNTAN

Schistosomiasis has become epidemic, sending millions of people to untimely graves. Considerable research effort has gone into eradicating this dangerous infection or reducing its rate. In this research work, Machine Learning, a subdivision of Artificial Intelligence, is used to determine the level of susceptibility to schistosomiasis. The research compares six support vector machine models (Linear, Quadratic, Cubic, Fine Gaussian, Medium Gaussian and Coarse Gaussian) in determining the level of susceptibility to schistosomiasis. The results obtained, which include the Confusion Matrix (CM), the Receiver Operating Characteristic (ROC) curve and Parallel Coordinate Plots (PCP), were interpreted in terms of accuracy, processing speed and execution time. It was concluded that Medium Gaussian is the best of the six models.


I. INTRODUCTION
ONE OF the major problems in the Africa sub-region is the high mortality and morbidity associated with the disease known as schistosomiasis. Schistosomiasis is caused by a blood-dwelling parasitic worm and is responsible for a series of health problems in tropical regions. More than two hundred million people have been infected. It is one of the most widespread and prevalent parasitic diseases in the world today (Makolo and Akinyemi, 2016). Since the beginning of the 20th century, it has been found to be endemic in several countries, with a discontinuous geographical distribution. A lot of research has been carried out in this area using various approaches and methods. Computer scientists are contributing their best, yet not enough has been done to manage the scourge. This research sets out to measure and compare the performance of Support Vector Machine models across the various levels of susceptibility to the scourge. The models are implemented using MATLAB Machine Learning (ML) tools.

A. Schistosomiasis
Schistosomiasis, also called bilharzia, is caused by a parasitic worm released by snails into a river. A victim is infected on contact with contaminated water, when the worm penetrates the skin. It enters the body, where it continues to grow for several weeks and becomes an adult worm. The adult worm lives in the blood vessels, where the female continues to produce eggs. The eggs, when hatched, release free-swimming larvae called miracidia into freshwater. The larvae find their way into a freshwater snail, and the victim is infected on contact with such water. Freshwater can also be contaminated when an already infected animal comes into contact with it through its urine or faeces. Once absorbed into the bloodstream, the worm finds its way into the liver, intestine or other vital organs. Possible symptoms include muscular aches, itching skin, persistent cough, headache, stomach pain and joint pain. If not treated in time, the patient starts passing blood in the urine or stool. This leads to retarded growth, especially in children. It can also cause bladder cancer as well as kidney or liver problems. Other major complications include high blood pressure (hypertension), urinary problems and destruction of vital organs. It is a chronic communicable disease that can lead to death.

B. The Concept of Artificial Intelligence and Machine Learning
Between 1940 and 1950, Artificial Intelligence (AI) emerged as a separate and independent field of study. Before this time, the literature shows that different areas of research already studied the concepts that form the basis of AI. Those areas were integrated to shape AI into one of the major independent areas of study. Artificial Intelligence is the attempt to create intelligence in a machine so that the machine, too, can think and solve real-life problems in a humanlike manner. It is a field in which systems are made to cope with a certain degree of uncertainty, such as unexpected events and unpredictable changes in the world in which the system operates. AI can be defined as the simulation of human intelligence on a machine, to make the machine efficient at identifying and using the right piece of knowledge at a given step of solving a problem (Konar, 1999). One major subfield of AI is ML. Learning can be defined simply as the process of acquiring knowledge or skill that can be applied in some application area in the future. The process of making a machine learn, through code or algorithms, enough skill or knowledge to solve future problems is described as Machine Learning. So, the machine learns whenever it changes its structure, programme or data (based on its input or in response to external information) in such a manner that its expected future performance improves (Kelvin, 2008).

C. Support Vector Machine
SVM is a typical example of an ML algorithm. The Support Vector Machine (SVM) was introduced by Boser, Guyon and Vapnik at COLT '92. It is a supervised learning algorithm used for classification and regression. It is referred to as a generalized linear classifier. In other words, it is a classification and regression tool that uses machine learning theory to maximize predictive accuracy while automatically avoiding overfitting of the data (Vikramadiya, 2006). An SVM is a Machine Learning algorithm that learns by example to assign labels to objects. For instance, an SVM can distinguish a fraudulent card from a non-fraudulent one (William, 2006). It maximizes a particular mathematical function for a given collection of data. Eventually, the algorithm presents an appropriate classification.
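
The function that an SVM maximizes is not written out in this paper. For reference, the standard soft-margin dual objective from the general SVM literature is sketched below. It is offered as an assumption about what is meant, and the symbols (the multipliers \alpha_i, the box constraint C and the kernel K) are not taken from the original text.

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

Here x_i are the training examples, y_i \in \{-1, +1\} their labels, and K(x_i, x_j) the kernel function. Training examples with \alpha_i > 0 are the support vectors.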

D. Related Works
Machine Learning principles and tools have been used in a series of research works on disease classification and detection. Stefanie et al. (2013) presented a comprehensive overview of schistosomiasis and explained the latest trends in the diagnosis and treatment of the disease, especially in children. The article also reported the number of people, young and old, recently treated with praziquantel. It was a review and hence did not propose a specific methodology for detecting or diagnosing the disease. Deepti and Sheetal (2013) used SVM and ANN to classify heart disease in an attempt to help physicians reach a speedy and accurate diagnosis. The diagnosis time was reduced and the accuracy of the result improved. It was concluded that SVM performs better than ANN. Prashasti and Disha (2016) predicted the spread of cardiovascular diseases using SVM and Bayesian classification. The research reported accuracy and sensitivity for both classifiers and was able to predict whether or not a person has heart disease. The accuracy graph shows that SVM is better than Naive Bayes. Shanshikant, Cheta and Ashak (2011) developed a heart disease diagnosis system using SVM. It is an expert system that can decide what type of heart disease a patient suffers from. The research established that SVM with sequential minimal optimization is as good as the Radial Basis Function for the diagnosis of heart disease.

E. Significance of the Research
This work analyses the predictive factors in the dataset to establish the spread of schistosomiasis amongst different age groups and across continents, and evaluates the six (6) SVM models in terms of speed, accuracy and processing time. The result obtained will serve as an eye-opener to researchers working in the area of machine learning, especially those interested in using SVM models for the identification, classification and prediction of schistosomiasis and other related diseases. It will also serve as a good platform for further research work.

A. Data
The research data was compiled across 4 age groups (5-24, 25-49, 50-74, above 74), sex (male and female) and 3 levels of exposure (low, average and high). Each category was assigned a risk value. Following this trend, 3, 2 and 1 were assigned to the age groups 25 to 49, 50 to 74 and 1 to 4 respectively. 1 was also assigned to those above 74 years of age. This means that age group 1 to 4 and those above 74 are in the same group, simply because they are too young or too old to attempt to swim in the rivers where they could easily be infected with schistosomiasis. In terms of gender, males are perceived to swim in rivers more often than females and therefore stand a higher risk of contracting the disease; males were assigned a risk value of 2 and females a value of 1. Considering each continent's level of development, Africa is at the highest risk of contracting the disease from rivers and was therefore assigned the highest risk factor of 4, followed by Asia, South America and others with risk factors of 3, 2 and 1 respectively. For the level of exposure, the values used are 2, 5 and 10 for low, average and high exposure.
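
The coding scheme described above can be sketched as a set of lookup tables. This is a minimal illustration, not the authors' implementation: the names `AGE_RISK`, `SEX_RISK`, `LOCATION_RISK`, `EXPOSURE_RISK` and `encode_record` are hypothetical, and the value 4 for the 5-24 age group is an assumption, since the text does not state it. The calculated score is left out because its formula is not given.

```python
# Risk values quoted in the Data subsection.
# ASSUMPTION: the value for the 5-24 group is not stated in the text;
# 4 is used here only because it continues the 3, 2, 1 trend.
AGE_RISK = {"5-24": 4, "25-49": 3, "50-74": 2, "1-4": 1, "above 74": 1}
SEX_RISK = {"male": 2, "female": 1}
LOCATION_RISK = {"Africa": 4, "Asia": 3, "South America": 2, "Others": 1}
EXPOSURE_RISK = {"low": 2, "average": 5, "high": 10}


def encode_record(age_group, sex, location, exposure):
    """Return the coded predictors as (age, sex, location, exposure)."""
    return (
        AGE_RISK[age_group],
        SEX_RISK[sex],
        LOCATION_RISK[location],
        EXPOSURE_RISK[exposure],
    )


# Hypothetical record: a man aged 25-49 in Africa with high exposure.
print(encode_record("25-49", "male", "Africa", "high"))  # (3, 2, 4, 10)
```

Keeping the codes in dictionaries makes the assumed values easy to audit and change if the actual coding differs.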

B. Method
We tested various SVM models using the MATLAB machine learning toolbox to classify human susceptibility to schistosomiasis infection. Five predictors were used: age, location, sex, exposure and calculated score. The predictors were used as input to the Classification Learner. The scatter plots of the predictors, i.e. the plots of the calculated score against the other four predictors, are shown in Fig. 1. The six SVM models used are Linear, Quadratic, Cubic, Fine Gaussian, Medium Gaussian and Coarse Gaussian, all implemented in MATLAB. Each model was trained on the data set, and the result of the training for each model was evaluated using the ROC curve, the confusion matrix and the parallel coordinate plot.
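
The six models differ only in their kernel function, which can be sketched in plain Python as follows. The paper does not state the kernel settings. The polynomial orders (2 and 3) and the Gaussian kernel scales (sqrt(P)/4, sqrt(P) and 4*sqrt(P) for P predictors) are assumptions based on the presets documented for MATLAB's Classification Learner, and the function names are hypothetical.

```python
import math


def linear_kernel(x, z):
    """Dot product of two predictor vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, order):
    """Polynomial kernel (1 + x.z)^order; order 2 = Quadratic, 3 = Cubic."""
    return (1.0 + linear_kernel(x, z)) ** order


def gaussian_kernel(x, z, scale):
    """Gaussian (RBF) kernel with the predictors divided by `scale`."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / scale ** 2)


def make_kernels(n_predictors):
    """Map each model name to its kernel (ASSUMED preset settings)."""
    base = math.sqrt(n_predictors)
    return {
        "Linear": linear_kernel,
        "Quadratic": lambda x, z: polynomial_kernel(x, z, 2),
        "Cubic": lambda x, z: polynomial_kernel(x, z, 3),
        "Fine Gaussian": lambda x, z: gaussian_kernel(x, z, base / 4),
        "Medium Gaussian": lambda x, z: gaussian_kernel(x, z, base),
        "Coarse Gaussian": lambda x, z: gaussian_kernel(x, z, base * 4),
    }


kernels = make_kernels(5)  # five predictors, as in this study
x, z = (1, 0, 0, 0, 0), (0, 0, 0, 0, 0)
for name, kernel in kernels.items():
    print(name, kernel(x, z))
```

A smaller kernel scale makes the Gaussian similarity fall off faster with distance, so under these assumptions the Fine Gaussian model draws the most detailed decision boundary and the Coarse Gaussian model the smoothest.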

III. RESULTS
Evaluation results obtained from each model using the ROC curve, the confusion matrix and the parallel coordinate plot are shown in Figs. 2-7. The performance of the models was also compared using accuracy, speed and training time, as shown in Table 1. Therefore, the Medium Gaussian model appears to be the best of all.
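
As an illustration of how accuracy follows from a confusion matrix, a minimal sketch is given here. The class names and labels are made up for the example and are not results from this study; `confusion_matrix` and `accuracy` are hypothetical helper names.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of observations on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Made-up labels for illustration only (NOT data from the paper).
classes = ["low", "medium", "high"]
true_labels = ["low", "low", "medium", "high", "high", "high"]
predicted = ["low", "medium", "medium", "high", "high", "low"]

cm = confusion_matrix(true_labels, predicted, classes)
print(cm)            # [[1, 1, 0], [0, 1, 0], [1, 0, 2]]
print(accuracy(cm))  # 4 of 6 correct
```

The off-diagonal entries show which susceptibility levels a model confuses, which accuracy alone does not reveal.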

V. CONCLUSION
Schistosomiasis is presently one of the diseases spreading all over the world, especially in the Africa sub-region. The disease has in recent times become endemic and is destroying the lives of innocent people all over the world. A lot of research is going on to determine how to manage, reduce or eradicate this scourge.
In this research, efforts were made to look into the behaviour of Support Vector Machine models in determining the likelihood of being infected. Six models were trained and the results obtained in terms of CM, ROC and PCP were interpreted clearly in terms of