A single pairwise model for classication using online learning with kernels

Engin Tas

Research Article

A single pairwise model for classication using online learning with kernels

Year 2017, Volume: 46 Issue: 3, 547 - 557, 01.06.2017

Engin Tas

Abstract

Any binary or multi-class classication problem can be transformed into a pairwise prediction problem. This expands the data and brings an advantage of learning from a richer set of examples, in the expense of increasing costs when the data is in higher dimensions. Therefore, this study proposes to adopt an online support vector machine to work with pairs of examples. This modified algorithm is suitable for large data sets due to its online nature and it can also handle the sparsity structure existing in the data. Performances of the pairwise setting and the direct setting are compared in two problems from different domains. Results indicate that the pairwise setting outperforms the direct setting significantly. Furthermore, a general framework is designed to use this pairwise approach in a multi-class classication task. Result indicate
that this single pairwise model achieved competitive classication rates even in large-scaled datasets with higher dimensionality.

Keywords

online learning, pairwise learning, support vector machines, kernel methods, multi-class classification

References

Anlauf, J. and Biehl, M. (2007). The adatron: an adaptive perceptron algorithm. EPL (Europhysics Letters), 10(7), 687.
Basilico, J. and Hofmann, T. (2004). Unifying collaborative and content-based ltering. In Proceedings of the twenty-rst international conference on Machine learning, ICML '04, pages 9, New York, NY, USA, 2004. ACM. ISBN 1-58113-838-5. doi: 10.1145/1015330. 1015394. URL http://doi.acm.org/10.1145/1015330.1015394.
Ben-Hur, A. and Noble, W. (2005). Kernel methods for predicting protein-protein interactions. Bioinformatics, 21(suppl 1), i38i46.
Bordes, A., Ertekin, S., Weston, J., and Bottou, L. (2005). Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6, 15791619.
Boser, B., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fth annual workshop on Computational learning theory, pages 144152. ACM.
Bottou, L. and LeCun, Y. (2004). Large scale online learning. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA. URL http://leon.bottou.org/papers/bottou-lecun-2004.
Dietterich, T. and Bakiri, G. (1995). Solving multiclass learning problems via errorcorrecting output codes. Journal of Articial Intelligence Research, 2(263), 286.
Francois, D., Wertz, V., and Verleysen, M. (2005). About the locality of kernels in highdimensional spaces. In Proceedings of ASMDA 2005, International Symposium on Applied Stochastic Models and Data Analysis, pages 238245. URL http://hdl.handle.net/2078. 1/93830.
Freund, Y. and Schapire, R. (1999). Large margin classification using the perceptron algorithm. Machine learning, 37(3), 277296.
Friedman, J. (1996). Another approach to polychotomous classification. Technical report, Technical report, Stanford University, Department of Statistics.
Gentile, C. (2002). A new approximate maximal margin classification algorithm. J. Mach. Learn. Res., 2, 213242. ISSN 1532-4435. URL http://portal.acm.org/citation.cfm?id= 944790.944811.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2 edition. ISBN 0387927091. URL http: //www-stat.stanford.edu/~tibs/ElemStatLearn/.
Kashima, H., Oyama, S., Yamanishi, Y., and Tsuda, K. (2009). On Pairwise Kernels: An Efficient Alternative and Generalization Analysis. In Proceedings of the 13th Pacic-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD '09, pages 10301037, Berlin, Heidelberg, 2009. Springer-Verlag. ISBN 978-3-642-01306-5. URL http: //dx.doi.org/10.1007/978-3-642-01307-2\_110.
Li, Y. and Long, P. (2002). The relaxed online maximum margin algorithm. Machine Learn- ing, 46(1), 361387.
Mayoraz, E. and Alpaydin, E. (1999). Support vector machines for multi-class classification. Engineering Applications of Bio-Inspired Articial Neural Networks, pages 833842.
Minsky, M. and Seymour, P. (1969). Perceptrons.
Oyama, S. and Manning, C. D. (2004). Using feature conjunctions across examples for learning pairwise classiers. In 15th European Conference on Machine Learning (ECML2004). URL http://ilpubs.stanford.edu:8090/767/.
Platt, J. C. (1999). 12 fast training of support vector machines using sequential minimal optimization.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6), 386.
Schölkopf, B. and Smola, A. (2001). Learning with kernels: Support vector machines, regu- larization, optimization, and beyond. MIT press.
Shalev-Shwartz, S., Singer, Y., and Srebro, N. (2007). Pegasos: Primal estimated subgradient solver for svm. In Proceedings of the 24th international conference on Machine learning, pages 807814. ACM.
Vert, J.-P. and Kanehisa, M. (2003). Graph-driven features extraction from microarray data using diusion kernels and kernel cca. Advances in Neural Information Processing System, 5, 14251432.
Vert, J.-P., Qiu, J., and Noble, W. (2007). A new pairwise kernel for biological network inference with support vector machines. BMC Bioinformatics, 8(Suppl 10), S8.
von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S. G., Fields, S., and Bork, P. (2002). Comparative assessment of large-scale data sets of proteinprotein interactions. Nature, 417(6887), 399403.
Weston, J. and Watkins, C. (1999). Support vector machines for multi-class pattern recognition. In Proceedings of the seventh European symposium on articial neural networks, volume 4, pages 219224.
Xu, W. (2011). Towards optimal one pass large scale learning with averaged stochastic gradient descent. arXiv preprint arXiv:1107.2490.
Rifkin, R. and Klautau, A. (2004). In defense of one-vs-all classication. Journal of machine learning research, volume 5, pages 101141.

Year 2017, Volume: 46 Issue: 3, 547 - 557, 01.06.2017

Engin Tas

Abstract

References

Anlauf, J. and Biehl, M. (2007). The adatron: an adaptive perceptron algorithm. EPL (Europhysics Letters), 10(7), 687.
Basilico, J. and Hofmann, T. (2004). Unifying collaborative and content-based ltering. In Proceedings of the twenty-rst international conference on Machine learning, ICML '04, pages 9, New York, NY, USA, 2004. ACM. ISBN 1-58113-838-5. doi: 10.1145/1015330. 1015394. URL http://doi.acm.org/10.1145/1015330.1015394.
Ben-Hur, A. and Noble, W. (2005). Kernel methods for predicting protein-protein interactions. Bioinformatics, 21(suppl 1), i38i46.
Bordes, A., Ertekin, S., Weston, J., and Bottou, L. (2005). Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6, 15791619.
Boser, B., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fth annual workshop on Computational learning theory, pages 144152. ACM.
Bottou, L. and LeCun, Y. (2004). Large scale online learning. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA. URL http://leon.bottou.org/papers/bottou-lecun-2004.
Dietterich, T. and Bakiri, G. (1995). Solving multiclass learning problems via errorcorrecting output codes. Journal of Articial Intelligence Research, 2(263), 286.
Francois, D., Wertz, V., and Verleysen, M. (2005). About the locality of kernels in highdimensional spaces. In Proceedings of ASMDA 2005, International Symposium on Applied Stochastic Models and Data Analysis, pages 238245. URL http://hdl.handle.net/2078. 1/93830.
Freund, Y. and Schapire, R. (1999). Large margin classification using the perceptron algorithm. Machine learning, 37(3), 277296.
Friedman, J. (1996). Another approach to polychotomous classification. Technical report, Technical report, Stanford University, Department of Statistics.
Gentile, C. (2002). A new approximate maximal margin classification algorithm. J. Mach. Learn. Res., 2, 213242. ISSN 1532-4435. URL http://portal.acm.org/citation.cfm?id= 944790.944811.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2 edition. ISBN 0387927091. URL http: //www-stat.stanford.edu/~tibs/ElemStatLearn/.
Kashima, H., Oyama, S., Yamanishi, Y., and Tsuda, K. (2009). On Pairwise Kernels: An Efficient Alternative and Generalization Analysis. In Proceedings of the 13th Pacic-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD '09, pages 10301037, Berlin, Heidelberg, 2009. Springer-Verlag. ISBN 978-3-642-01306-5. URL http: //dx.doi.org/10.1007/978-3-642-01307-2\_110.
Li, Y. and Long, P. (2002). The relaxed online maximum margin algorithm. Machine Learn- ing, 46(1), 361387.
Mayoraz, E. and Alpaydin, E. (1999). Support vector machines for multi-class classification. Engineering Applications of Bio-Inspired Articial Neural Networks, pages 833842.
Minsky, M. and Seymour, P. (1969). Perceptrons.
Oyama, S. and Manning, C. D. (2004). Using feature conjunctions across examples for learning pairwise classiers. In 15th European Conference on Machine Learning (ECML2004). URL http://ilpubs.stanford.edu:8090/767/.
Platt, J. C. (1999). 12 fast training of support vector machines using sequential minimal optimization.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6), 386.
Schölkopf, B. and Smola, A. (2001). Learning with kernels: Support vector machines, regu- larization, optimization, and beyond. MIT press.
Shalev-Shwartz, S., Singer, Y., and Srebro, N. (2007). Pegasos: Primal estimated subgradient solver for svm. In Proceedings of the 24th international conference on Machine learning, pages 807814. ACM.
Vert, J.-P. and Kanehisa, M. (2003). Graph-driven features extraction from microarray data using diusion kernels and kernel cca. Advances in Neural Information Processing System, 5, 14251432.
Vert, J.-P., Qiu, J., and Noble, W. (2007). A new pairwise kernel for biological network inference with support vector machines. BMC Bioinformatics, 8(Suppl 10), S8.
von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S. G., Fields, S., and Bork, P. (2002). Comparative assessment of large-scale data sets of proteinprotein interactions. Nature, 417(6887), 399403.
Weston, J. and Watkins, C. (1999). Support vector machines for multi-class pattern recognition. In Proceedings of the seventh European symposium on articial neural networks, volume 4, pages 219224.
Xu, W. (2011). Towards optimal one pass large scale learning with averaged stochastic gradient descent. arXiv preprint arXiv:1107.2490.
Rifkin, R. and Klautau, A. (2004). In defense of one-vs-all classication. Journal of machine learning research, volume 5, pages 101141.

There are 27 citations in total.

Details

Primary Language	English
Subjects	Mathematical Sciences
Journal Section	Statistics
Authors	Engin Tas
Publication Date	June 1, 2017
Published in Issue	Year 2017 Volume: 46 Issue: 3

Cite

APA	Tas, E. (2017). A single pairwise model for classication using online learning with kernels. Hacettepe Journal of Mathematics and Statistics, 46(3), 547-557.
AMA	Tas E. A single pairwise model for classication using online learning with kernels. Hacettepe Journal of Mathematics and Statistics. June 2017;46(3):547-557.
Chicago	Tas, Engin. “A Single Pairwise Model for classication Using Online Learning With Kernels”. Hacettepe Journal of Mathematics and Statistics 46, no. 3 (June 2017): 547-57.
EndNote	Tas E (June 1, 2017) A single pairwise model for classication using online learning with kernels. Hacettepe Journal of Mathematics and Statistics 46 3 547–557.
IEEE	E. Tas, “A single pairwise model for classication using online learning with kernels”, Hacettepe Journal of Mathematics and Statistics, vol. 46, no. 3, pp. 547–557, 2017.
ISNAD	Tas, Engin. “A Single Pairwise Model for classication Using Online Learning With Kernels”. Hacettepe Journal of Mathematics and Statistics 46/3 (June 2017), 547-557.
JAMA	Tas E. A single pairwise model for classication using online learning with kernels. Hacettepe Journal of Mathematics and Statistics. 2017;46:547–557.
MLA	Tas, Engin. “A Single Pairwise Model for classication Using Online Learning With Kernels”. Hacettepe Journal of Mathematics and Statistics, vol. 46, no. 3, 2017, pp. 547-5.
Vancouver	Tas E. A single pairwise model for classication using online learning with kernels. Hacettepe Journal of Mathematics and Statistics. 2017;46(3):547-5.

Article Files

Full Text

https://dergipark.org.tr/en/pub/hujms

A single pairwise model for classication using online learning with kernels

Abstract

Keywords

References

Abstract

References

Details

Cite

A single pairwise model for classication using online learning with kernels