The goal of this paper is to demonstrate via extensive simulation that implicit robustness can substantially outperform explicit robustness in the pattern recognition of contaminated high dimension, low sample size data. Our work demonstrates, via extensive computational simulations and applications to real-life data, that random subspace ensemble learning machines, although not explicitly designed as robustness-inducing supervised learning paradigms, outperform the structurally robustness-seeking classifiers on high dimension, low sample size datasets. Random forest (RF), arguably the most commonly used random subspace ensemble learning method, is compared to various robust extensions/adaptations of the discriminant analysis classifier, and our work reveals that RF, although not inherently designed to be robust to outliers, substantially outperforms the existing techniques specifically designed to achieve robustness. Specifically, by exploring different scenarios of the sample size n and the input space dimensionality p, along with the corresponding capacity κ = n/p with κ < 1, we demonstrate through extensive simulations that regardless of the contamination rate ε, RF predictively outperforms the explicitly robustness-inducing classification techniques when the intrinsic dimensionality of the data is large.
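The simulation design described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' actual protocol: it draws an HDLSS sample with κ = n/p < 1, contaminates a fraction ε of the training points with heavy-tailed noise, and compares a random forest against a shrinkage discriminant classifier standing in for the robust discriminant analysis variants studied in the paper. The signal dimension, contamination scale, and chosen classifiers are all assumptions for illustration.

```python
# Hypothetical sketch of the HDLSS simulation: n = 100 samples in p = 500
# dimensions (kappa = n/p = 0.2 < 1), with a fraction eps of points
# contaminated by large-scale Gaussian noise.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p, eps = 100, 500, 0.10

# Two-class data whose signal lives in the first 5 coordinates (an assumption).
X = rng.normal(size=(n, p))
y = (X[:, :5].sum(axis=1) > 0).astype(int)

# Contaminate a fraction eps of the observations with heavy noise (outliers).
out = rng.choice(n, size=int(eps * n), replace=False)
X[out] += rng.normal(scale=10.0, size=(len(out), p))

Xtr, Xte, ytr, yte = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Random subspace ensemble (implicit robustness).
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)

# Shrinkage LDA as a p > n discriminant-analysis stand-in; the paper's
# comparators are robust DA adaptations, not plain shrinkage LDA.
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(Xtr, ytr)

print("RF  test accuracy:", rf.score(Xte, yte))
print("LDA test accuracy:", lda.score(Xte, yte))
```

A full study in this spirit would sweep n, p (hence κ), and ε over a grid and average test accuracy over many replications, rather than reporting a single split.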
| Primary Language | English |
| --- | --- |
| Journal Section | Research Articles |
| Authors | |
| Publication Date | August 1, 2017 |
| Published in Issue | Year 2017 Volume: 66 Issue: 2 |
Communications Faculty of Sciences University of Ankara Series A1 Mathematics and Statistics.
This work is licensed under a Creative Commons Attribution 4.0 International License.