STATISTICAL COMPARISON OF MACHINE LEARNING CLASSIFICATION ALGORITHMS USING BIOMEDICAL DATA SETS
Year 2014,
Volume: 16 Issue: 48, 30 - 42, 01.09.2014
Murat Karakoyun, Mehmet Hacıbeyoğlu
Abstract
Today, information technologies are used in almost every field, and one of the fields that uses
them most is the health sector. With digital hospital systems, patient data are stored on computers,
and biomedical data sets are formed in this way. These data sets are very large, which makes them very
difficult for a human to analyze and interpret. Machine learning algorithms, a field of study within
computer engineering, are used to analyze and interpret these data sets. In this study, the performance
of 6 machine learning algorithms was tested on 9 different biomedical data sets, and the results
obtained were compared statistically. According to the experimental and statistical results of this
study, for small and medium-sized data sets the Artificial Neural Network algorithm was more
successful in terms of classification accuracy, and the K-Nearest Neighbor algorithm was more
successful in terms of CPU time. A part of this work was presented at the ASYU 2014/Izmir symposium.
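As an illustration of what a statistical comparison of several classifiers over several data sets can look like, the sketch below ranks the algorithms on each data set and computes the Friedman chi-square statistic from the average ranks. This is an assumption made for illustration only: the abstract does not say which statistical test the study used, and the function names and the accuracy values in the usage example are hypothetical placeholders, not results from the paper.

```python
def average_ranks(scores):
    """Average rank of each algorithm over all data sets.

    scores: list of rows, one row per data set; each row holds one
    accuracy per algorithm. Rank 1 is the highest accuracy, and tied
    algorithms share the mean of the ranks they occupy.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        # Algorithm indices ordered from best to worst on this data set.
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            # Extend j over the run of algorithms tied with position i.
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1
            for t in range(i, j + 1):
                totals[order[t]] += mean_rank
            i = j + 1
    return [total / len(scores) for total in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic for n data sets and k algorithms."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


if __name__ == "__main__":
    # Hypothetical accuracies: 3 data sets (rows) x 3 algorithms (columns).
    demo = [
        [0.90, 0.80, 0.70],
        [0.95, 0.85, 0.75],
        [0.80, 0.70, 0.60],
    ]
    print(average_ranks(demo))
    print(friedman_statistic(demo))
```

With 6 algorithms and 9 data sets, as in the study, the input would be a 9 x 6 table; the statistic is then compared against a chi-square distribution with k - 1 degrees of freedom to decide whether the algorithms differ.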
BİYOMEDİKAL VERİ KÜMELERİ İLE MAKİNE ÖĞRENMESİ SINIFLANDIRMA ALGORİTMALARININ İSTATİSTİKSEL OLARAK KARŞILAŞTIRILMASI
Abstract
Today, information technologies are used in almost every field. One of the fields where they are
used most is the health sector. With the use of digital hospital systems, patient data are now
stored on computers, and biomedical data sets are formed as a result. These data sets are very
large, so it is very difficult for a human to analyze and interpret them. Machine learning
algorithms, one of the fields of study of computer engineering, are used for this purpose. In this
study, the performance of 6 machine learning algorithms was tested on 9 different biomedical data
sets, and the results obtained were compared statistically. When the experimental and statistical
results are examined together, for small and medium-sized biomedical data sets the Artificial Neural
Network algorithm was more successful in terms of classification performance, and the K-Nearest
Neighbor algorithm was more successful in terms of running time. A part of this study was presented
as a paper at the ASYU 2014/İzmir symposium.