A New Hybrid Regression Model for Undersized Sample Problem
Abstract
Traditional statistics assumes that the
number of samples available for study exceeds the number of well-selected
variables. Nowadays, in many fields, the number of samples is
expressed in tens or hundreds, while a single observation may have thousands or even
millions of dimensions. Classical statistical techniques are not designed to
cope with this kind of data set. Many multivariate statistical
techniques, such as principal component analysis, factor analysis, classification
and cluster analysis, and the estimation of regression coefficients, require an
estimate of the sample variance-covariance matrix or its inverse. When the
number of observations is much smaller than the number of features (or
variables), the usual sample covariance matrix degenerates and cannot be
inverted. This is one of the biggest obstacles for classical
statistical methods. To remedy the singularity of the covariance
matrices in high-dimensional data, Hybrid Covariance Estimators (HCE) were
developed by Pamukçu et al. (2015). HCE overcomes the singularity problem of
the covariance matrix and thus makes multivariate statistical analysis of
high-dimensional data sets possible. One of the most important steps
in statistical analysis using HCE is selecting the appropriate covariance
structure for the data set, since HCE can be obtained with many
different covariance structures. The structure can be selected using
information criteria such as the Akaike Information Criterion (AIC) and the
Information Complexity (ICOMP) criterion,
which are well known as model selection criteria. In this study, we introduce a new regression
model with HCE and information criteria for undersized high-dimensional
data with n << p. We demonstrate our approach in simulation studies with
different scenarios for p/n ratios. We use the AIC, CAIC, and ICOMP criteria to
select the appropriate HCE structure and compare the results with classical
regression analysis.
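The singularity problem described above can be illustrated with a minimal sketch. The data, the dimensions `n` and `p`, the weight `alpha`, and the diagonal target are all illustrative assumptions. The regularization shown is a generic convex combination with a diagonal target, not the HCE of Pamukçu et al. (2015), whose exact form is not given here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                       # undersized sample: n << p (assumed sizes)
X = rng.standard_normal((n, p))      # hypothetical data, rows are observations

# Usual sample covariance (p x p). After centering, its rank is at most n - 1,
# so it is singular whenever n <= p and cannot be inverted.
S = np.cov(X, rowvar=False)
rank_S = np.linalg.matrix_rank(S)

# Generic regularization (an assumption, NOT the paper's HCE): a convex
# combination of S with its own diagonal. S is positive semidefinite and the
# diagonal target is positive definite, so the combination is positive definite.
alpha = 0.5                          # illustrative weight, chosen arbitrarily
target = np.diag(np.diag(S))
S_reg = (1 - alpha) * S + alpha * target
rank_reg = np.linalg.matrix_rank(S_reg)

print("rank of sample covariance:", rank_S, "of", p)
print("rank of regularized covariance:", rank_reg, "of", p)
```

In a setting like the one the abstract describes, the weight and the target structure would be chosen by a model selection criterion rather than fixed by hand as in this sketch.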
References
- 1. Donoho, D.L. High dimensional data analysis: The curses and blessings of dimensionality. statweb.stanford.edu/~donoho/Lectures/AMS2000/Curses.pdf. 2000.
- 2. Cunningham, P. Dimension Reduction. Technical Report UCD-CSI-2007-7. University College Dublin. 2007.
- 3. Fiebig, D.G. On the maximum entropy approach to undersized samples. Applied Mathematics and Computation. 1984; 14, 301-312.
- 4. Stein, C. Estimation of covariance matrix. Rietz Lecture. 39th Annual Meeting IMS. Atlanta, Georgia. 1975.
- 5. Chen, Y. Robust shrinkage estimation of high dimensional covariance matrices. IEEE Workshop on Sensor Array and Multichannel Signal Processing (SAM). 2010.
- 6. Ledoit, O.; Wolf, M. A well conditioned estimator for large dimensional covariance matrices. Journal of Multivariate Analysis. 2004; 88, 365-411.
- 7. Pamukçu, E.; Bozdogan, H.; Çalık, S. A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification. Computational and Mathematical Methods in Medicine. 2015; Article ID 370640, 14 pages.
- 8. Erbaş, Ü. Entropi İlkelerinin Boyut İndirgeme Uygulamaları [Dimension Reduction Applications of Entropy Principles]. Doctoral thesis. Marmara Üniversitesi Sosyal Bilimler Enstitüsü. İstanbul. 2010.
Details
Primary Language
English
Subjects
Engineering
Journal Section
Research Article
Publication Date
September 30, 2017
Submission Date
September 22, 2017
Acceptance Date
May 29, 2017
Published in Issue
Year 2017 Volume: 13 Number: 3