LASSO Estimator in Logistic Regression for Small Data Sets

Aslı Yaman; Mehmet Ali Cengiz

Research Article

Year 2021, Volume: 4 Issue: 1, 69 - 72, 15.01.2021

Abstract

Variable selection is an important subject in regression analysis. The LASSO (Least Absolute Shrinkage and Selection Operator) yields sparse solutions and thereby performs variable selection. It is a useful tool for achieving shrinkage and variable selection simultaneously, because the LASSO penalty term can shrink parameter estimates exactly to zero. The LASSO is generally used with large data sets, but in this article we consider the variable selection problem for multivariate Bernoulli logistic models, adopting several information criteria, with a particular focus on small data sets. Simulation results were compared according to four different model selection criteria.
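To make the shrinkage-to-zero behaviour concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single Bernoulli response (the multivariate Bernoulli model is not reproduced here), fits the L1-penalized logistic likelihood by proximal gradient descent, and ranks penalty values with BIC. The abstract does not name the four criteria, so the use of BIC is an illustrative assumption. The function names, the simulated data, and the penalty grid are all hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.|: moves z toward 0 and sets small entries exactly to 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sigmoid(z):
    # Clipping avoids overflow warnings for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def neg_log_lik(X, y, b0, beta):
    """Bernoulli negative log-likelihood (sum over observations)."""
    eta = b0 + X @ beta
    return np.sum(np.logaddexp(0.0, eta) - y * eta)

def lasso_logistic(X, y, lam, n_iter=5000):
    """Minimise mean negative log-likelihood + lam * ||beta||_1 (intercept unpenalised)."""
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])
    # Step 1/L, where L = ||[1 X]||_2^2 / (4n) bounds the curvature of the mean loss.
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        resid = sigmoid(b0 + X @ beta) - y      # gradient of the loss w.r.t. eta
        b0 = b0 - step * resid.mean()           # plain gradient step for the intercept
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return b0, beta

def bic(X, y, b0, beta):
    """BIC with degrees of freedom = number of nonzero slopes + intercept (an assumption)."""
    df = np.count_nonzero(beta) + 1
    return 2.0 * neg_log_lik(X, y, b0, beta) + np.log(len(y)) * df

# Hypothetical small data set: n = 40 observations, 8 predictors, 2 of them active.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardise so one penalty suits all columns
true_beta = np.array([2.0, -2.0, 0, 0, 0, 0, 0, 0])
y = (rng.random(40) < sigmoid(X @ true_beta)).astype(float)

for lam in [0.3, 0.1, 0.03, 0.01]:
    b0, beta = lasso_logistic(X, y, lam)
    print(f"lam={lam:<5} nonzero={np.count_nonzero(beta)} BIC={bic(X, y, b0, beta):.2f}")
```

Larger penalty values remove more predictors from the model, and the criterion is then used to choose among the resulting candidate models. Other criteria can be swapped in by replacing the `bic` function.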

Keywords

LASSO, Bernoulli distribution, Logistic regression, Feature selection

Project Number

PYO. SCIENCE. 1904.17.002

References

Tibshirani R. “Regression shrinkage and selection via the lasso”. Journal of the Royal Statistical Society. Series B (Methodological), 267-288, 1996.
Tibshirani R. “Regression shrinkage and selection via the lasso: a retrospective”. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73 (3), 273-282, 2011.
Donoho DL, Johnstone IM. “Ideal spatial adaptation by wavelet shrinkage”. Biometrika, 81 (3), 425-455, 1994.
Wu TT, Lange K. “Coordinate descent algorithms for lasso penalized regression”. The Annals of Applied Statistics, 224-244, 2008.
Efron B, Hastie T, Johnstone I, Tibshirani R. “Least angle regression”. The Annals of Statistics, 32 (2), 407-499, 2004.
Friedman J, Hastie T, Höfling H, Tibshirani R. “Pathwise coordinate optimization”. The Annals of Applied Statistics, 1 (2), 302-332, 2007.
Dai B. MVB: Multivariate Bernoulli log-linear model. R package version, 1, 2013.
Dai B. Multivariate Bernoulli distribution models. Technical Report, Department of Statistics, University of Wisconsin, Madison, WI 53706, 2012.
Akaike H. Information theory and an extension of the maximum likelihood principle. Proc. 2nd Inter. Symposium on Information Theory, 267-281, Budapest, 1973.
Schwarz G. “Estimating the dimension of a model”. The Annals of Statistics, 6 (2), 461-464, 1978.
Xiang D, Wahba G. “A generalized approximate cross validation for smoothing splines with non-Gaussian data”. Statistica Sinica, 675-692, 1996.

LASSO Estimator for Small Data Sets in Logistic Regression Models

Abstract

Variable selection is an important topic in regression analysis. In regression analysis, the LASSO (Least Absolute Shrinkage and Selection Operator) yields sparse solutions, in the manner of variable selection. LASSO is a useful tool that performs shrinkage and variable selection at the same time, and the LASSO penalty can shrink parameter estimates exactly to zero. It is generally used with large data sets, but this study addresses the variable selection problem for multivariate Bernoulli logistic models using several information criteria, particularly for small data sets. The simulation results obtained under four different information criteria used for model selection are compared.

Keywords

LASSO, Bernoulli distribution, Logistic regression, Variable selection

Supporting Institution

Ondokuz Mayıs Üniversitesi

There are 11 citations in total.

Details

Primary Language	English
Journal Section	Articles
Authors	Aslı Yaman 0000-0003-2886-6765 Mehmet Ali Cengiz 0000-0002-1271-2588
Project Number	PYO. SCIENCE. 1904.17.002
Publication Date	January 15, 2021
Published in Issue	Year 2021 Volume: 4 Issue: 1

Cite

APA	Yaman, A., & Cengiz, M. A. (2021). LASSO Estimator in Logistic Regression for Small Data Sets. Veri Bilimi, 4(1), 69-72.