Research Article

Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements

Year 2018, Volume: 4 Issue: 1, 1 - 5, 15.03.2018
https://doi.org/10.22399/ijcesen.374222

Abstract

The support vector machine is a supervised learning algorithm that is recommended for classification and nonlinear function approximation. For large data sets, training a support vector machine requires a considerable amount of memory and is time consuming. The main reason is that the algorithm must solve a constrained quadratic programming problem. In this paper, we propose three approaches that identify the non-critical points in the training set and remove them from the original training set in order to speed up the training of the support vector machine. For this purpose, we use principal component analysis, Mahalanobis distance and Euclidean distance based measurements to eliminate non-critical training instances. We compare the proposed methods with each other and with the conventional support vector machine in terms of computational time and classification accuracy. Our experimental results show that the proposed methods based on principal component analysis and Mahalanobis distance reduce computational time without degrading the classification results, compared with the proposed Euclidean distance based method and the conventional support vector machine.
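
The abstract does not state the rule used to decide which points are non-critical, so the following is only a minimal sketch under a stated assumption: for each class, the samples closest to the mean of the remaining classes are treated as likely to lie near the decision boundary and are kept, and the rest are dropped before training. The function names, the `keep_fraction` parameter and the selection rule itself are hypothetical and are not taken from the paper. Only the Euclidean and Mahalanobis measurements are sketched here.

```python
import numpy as np


def distances_to_center(X, center, cov=None):
    """Distance of each row of X to `center`.

    Euclidean if `cov` is None, otherwise Mahalanobis with covariance `cov`.
    """
    diff = X - center
    if cov is None:
        return np.sqrt(np.sum(diff ** 2, axis=1))
    # The pseudo-inverse guards against a singular covariance matrix.
    inv_cov = np.linalg.pinv(np.atleast_2d(cov))
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(squared, 0.0))


def reduce_training_set(X, y, keep_fraction=0.5, metric="mahalanobis"):
    """Keep, per class, the samples closest to the mean of the other classes.

    ASSUMPTION: this selection rule is illustrative only; the paper's
    actual criterion for non-critical points is not given in the abstract.
    """
    kept = []
    for label in np.unique(y):
        own = np.flatnonzero(y == label)   # indices of this class
        other = X[y != label]              # samples of all other classes
        center = other.mean(axis=0)
        cov = np.cov(other, rowvar=False) if metric == "mahalanobis" else None
        d = distances_to_center(X[own], center, cov)
        n_keep = max(1, int(np.ceil(keep_fraction * own.size)))
        kept.append(own[np.argsort(d)[:n_keep]])
    idx = np.sort(np.concatenate(kept))
    return X[idx], y[idx], idx
```

The reduced arrays returned by `reduce_training_set` would then be passed to an ordinary support vector machine trainer in place of the full training set, so that the quadratic programming problem is solved over fewer instances.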

References

  • Vapnik V. “The nature of statistical learning theory” (Springer-Verlag, New York, 1995).
  • Cervantes J., X. Li, W. Yu, “Support vector classification for large data sets by reducing training data with change of classes”, International conference on systems, man, and cybernetics, IEEE, 2008, 2609-2614.
  • Javed I., M.N. Ayyaz, W. Mehmood “Efficient training data reduction for SVM based handwritten digits’ recognition”, International conference on electrical engineering, 2007, 1-4.
  • Koggalage R., S. Halgamuge “Reducing the number of training samples for fast support vector machine classification”, Neural information processing - letters and reviews, vol. 2, no. 3, 2004, 57-65.
  • Fortuna J., D. Capson, “Improved support vector classification using PCA and ICA feature space modification”, Pattern recognition, 37, 2004, 1117-1129.
  • Cao L. J., K.S. Chua, W.K. Chong, H.P. Lee, Q.M. Gu “A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine”, Neurocomputing, 55, 2003, 321-336.
  • Subasi A., M.I. Gursoy “EEG signal classification using PCA, ICA, LDA and support vector machines”, Expert systems with applications, vol. 37, no. 12, 2010, 8659-8666.
  • Gertych A., A. Zhang, J. Sayre, S.P. Kurkowska, H.K. Huang “Bone age assessment of children using a digital hand atlas”, Computerized Medical Imaging and Graphics, 31, 2007, 322-331.
  • Güraksın G. E., H. Uğuz, Ö.K. Baykan “Bone age determination in young children from newborn to 6 year-old using support vector machines”, Turkish Journal of Electrical Engineering and Computer Sciences, 24, 1693-1708.
  • Frank A., A. Asuncion “UCI Machine Learning Repository” [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science, 2010.
  • Edla D. R., V. Gondlekar, V. Gauns “HK-Means: A Heuristic Approach to Initialize and estimate the number of clusters in biological data”, ICCESEN2016, ACTA PHYSICA POLONICA A, 130(1), 2016, 78-82.
  • Grimaldi M., P. Cunningham, A. Kokaram “An evaluation of alternative feature selection strategies and ensemble techniques for classifying music”, Workshop in Multimedia Discovery and Mining, ECML/PKDD03, Dubrovnik, Croatia, 2003.
  • Tamura H., K. Tanno “Midpoint validation method for support vector machines with margin adjustment technique”, International Journal of Innovative Computing, Information and Control, 5(11(A)), 2009, 4025-4032.
  • Yüksel A. S., Ş.F. Çankaya, İ.S. Üncü “Design of a machine learning based predictive analytic system for spam problem”, ICCESEN2016, ACTA PHYSICA POLONICA A, 132 (2), 2017, 500-504.
  • Cömert Z., A.F. Kocamaz “Comparison of Machine Learning Techniques for Fetal Heart Rate Classification”, ICCESEN2016, ACTA PHYSICA POLONICA A, 132(3), 2017, 451-454.
  • Uğuz H. “A Biomedical System Based on Artificial Neural Network and Principal Component Analysis for Diagnosis of the Heart Valve Diseases”, J. Med. Syst., 36, 2012, 61-72.
  • Lindsay I., A. Smith “A tutorial on principal components analysis”, < http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial>, 2002.
  • Valle S., W. Li, S.J. Qin “Selection of the number of principal components: The variance of the reconstruction error criterion with a comparison to other methods”, Ind. Eng. Chem. Res. 38, 1999, 4389–4401.
  • Torra V., Y. Narukawa “On a comparison between Mahalanobis distance and Choquet integral: The Choquet–Mahalanobis operator”, Information Sciences, 190, 2012, 56-63.
There are 19 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Research Articles
Authors

Gür Emre Güraksın

Harun Uğuz

Publication Date March 15, 2018
Submission Date January 3, 2018
Acceptance Date March 6, 2018
Published in Issue Year 2018 Volume: 4 Issue: 1

Cite

APA Güraksın, G. E., & Uğuz, H. (2018). Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements. International Journal of Computational and Experimental Science and Engineering, 4(1), 1-5. https://doi.org/10.22399/ijcesen.374222