Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements

Gür Emre Güraksın; Harun Uğuz

doi:10.22399/ijcesen.374222

Abstract

The support vector machine is a supervised learning algorithm that is recommended for classification and nonlinear function approximation. For large data sets, training a support vector machine requires a considerable amount of memory and time. The main reason is that the algorithm must solve a constrained quadratic programming problem. In this paper, we propose three approaches that identify the non-critical points in the training set and remove them from the original training set in order to speed up the training of the support vector machine. For this purpose, we use principal component analysis, Mahalanobis distance, and Euclidean distance based measurements to eliminate non-critical training instances. We compare the proposed methods with one another and with the conventional support vector machine in terms of computational time and classification accuracy. Our experimental results show that the proposed methods based on principal component analysis and Mahalanobis distance reduce computational time without degrading the classification results, and in this respect they outperform both the proposed Euclidean distance based method and the conventional support vector machine.
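
The abstract does not state the exact rule used to decide which training points are non-critical, so the following is only a minimal sketch of one plausible distance-based reduction, not the authors' procedure. It assumes a two-class problem and assumes that the points of each class lying closest to the centroid of the other class are the likely support vector candidates, so only those are kept. The function names, the `keep_ratio` parameter, and the selection heuristic are all hypothetical, and the principal component analysis variant is not covered.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def covariance(points, mean):
    """Sample covariance matrix (n - 1 denominator) as a list of rows."""
    n, d = len(points), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [x - m for x, m in zip(p, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, d))
        x[i] = (m[i][d] - s) / m[i][i]
    return x


def euclidean(p, mean, cov=None):
    """Euclidean distance; cov is ignored and accepted only for a uniform signature."""
    return math.sqrt(sum((x - m) ** 2 for x, m in zip(p, mean)))


def mahalanobis(p, mean, cov):
    """Mahalanobis distance sqrt(diff^T cov^{-1} diff), via a linear solve."""
    diff = [x - m for x, m in zip(p, mean)]
    quad = sum(d * s for d, s in zip(diff, solve(cov, diff)))
    return math.sqrt(max(quad, 0.0))


def reduce_training_set(X, y, keep_ratio=0.5, distance=mahalanobis):
    """Return sorted indices of the points kept for training.

    Assumed heuristic (not from the paper): for each class, keep the
    keep_ratio fraction of points closest to the other class's centroid.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles two classes only")
    kept = []
    for label, other in (labels, labels[::-1]):
        other_pts = [p for p, t in zip(X, y) if t == other]
        mean = centroid(other_pts)
        cov = covariance(other_pts, mean)
        idx = [i for i, t in enumerate(y) if t == label]
        idx.sort(key=lambda i: distance(X[i], mean, cov))
        kept.extend(idx[: max(1, math.ceil(keep_ratio * len(idx)))])
    return sorted(kept)
```

The returned indices would select the reduced training set that is then passed to whichever support vector machine trainer is in use. Switching `distance` between `euclidean` and `mahalanobis` changes only how closeness to the opposite class is measured; the Mahalanobis version additionally accounts for the spread and correlation of that class.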

Keywords

Support Vector Machines, Principal Component Analysis, Machine Learning

Details

Primary Language

English

Subjects

Engineering

Section

Research Article

Authors

Gür Emre Güraksın*
Türkiye

Harun Uğuz
Türkiye

Publication Date

15 March 2018

Submission Date

3 January 2018

Acceptance Date

6 March 2018

Published in Issue

Year 2018, Volume: 4, Issue: 1

DOI

https://doi.org/10.22399/ijcesen.374222

IZ

https://izlik.org/JA58YE38PD

Cite

APA

Güraksın, G. E., & Uğuz, H. (2018). Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements. International Journal of Computational and Experimental Science and Engineering, 4(1), 1-5. https://doi.org/10.22399/ijcesen.374222

AMA

1. Güraksın GE, Uğuz H. Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements. IJCESEN. 2018;4(1):1-5. doi:10.22399/ijcesen.374222

Chicago

Güraksın, Gür Emre, and Harun Uğuz. 2018. “Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements”. International Journal of Computational and Experimental Science and Engineering 4 (1): 1-5. https://doi.org/10.22399/ijcesen.374222.

EndNote

Güraksın GE, Uğuz H (1 March 2018) Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements. International Journal of Computational and Experimental Science and Engineering 4 1 1–5.

IEEE

[1] G. E. Güraksın and H. Uğuz, “Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements”, IJCESEN, vol. 4, no. 1, pp. 1–5, Mar. 2018, doi: 10.22399/ijcesen.374222.

ISNAD

Güraksın, Gür Emre - Uğuz, Harun. “Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements”. International Journal of Computational and Experimental Science and Engineering 4/1 (1 March 2018): 1-5. https://doi.org/10.22399/ijcesen.374222.

JAMA

1. Güraksın GE, Uğuz H. Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements. IJCESEN. 2018;4:1–5.

MLA

Güraksın, Gür Emre, and Harun Uğuz. “Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements”. International Journal of Computational and Experimental Science and Engineering, vol. 4, no. 1, March 2018, pp. 1-5, doi:10.22399/ijcesen.374222.

Vancouver

1. Gür Emre Güraksın, Harun Uğuz. Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements. IJCESEN. 1 March 2018;4(1):1-5. doi:10.22399/ijcesen.374222

Cited By

Sentiment Analysis of Shared Tweets on Global Warming on Twitter with Data Mining Methods: A Case Study on Turkish Language

Computational Intelligence and Neuroscience

https://doi.org/10.1155/2020/1904172