Visual research on the trustability of classical variable selection methods in Cox regression

Nihal Ata Tutkun; Yasemin Kayhan Atılgan

doi:10.15672/hujms.630402

Research Article

Year 2020, Volume: 49 Issue: 2, 869 - 886, 02.04.2020

Abstract

Multivariate models such as the Cox regression model, if developed carefully, are powerful tools for prognostic prediction and are frequently used in studies of clinical outcomes. Many applications require a large number of variables to be modelled using a relatively small patient sample. Determining the important variables in a model is critical to understanding the behaviour of the phenomenon under study, because these are the independent variables that contribute most to the outcome. From a practical perspective, a small subset of independent variables is usually selected from a large data set without any loss of predictive efficiency. Automatic variable selection algorithms are commonly used in scientific studies to obtain interpretable and practically applicable models. However, careless use of these methods may lead to statistical problems. The resulting models may perform poorly because of violated assumptions, omission of important variables, overfitting, multicollinearity, and outliers. To enhance the accuracy of a model, it is essential to explore the data and its main characteristics before making any statistical inference. This study suggests an approach to obtaining a trustworthy model selection procedure for survival data by combining classical variable selection methods with a graphical visualization method, namely the robust coplot. The robust coplot makes it possible to investigate, in a single graph, the discrimination of observations, clusters of variables, and clusters of observations that are strongly characterized by a particular variable. We present an application of the combined method, as an integral part of statistical modelling, to survival data on multiple myeloma to show how coplot results are used in automatic variable selection algorithms in Cox regression model building.
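To make the phrase "automatic variable selection algorithm" concrete, the following is a minimal sketch of backward elimination driven by the Akaike information criterion (AIC), one of the classical criteria in this area. It is not the authors' procedure. The function names, the variable names x1 to x3, and the toy log-likelihood are all hypothetical; in a real analysis the criterion would refit a Cox model for each candidate subset and use its maximized partial log-likelihood.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * loglik + 2.0 * n_params


def backward_eliminate(
    variables: Iterable[str],
    criterion: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], List[Tuple[FrozenSet[str], float]]]:
    """Drop one variable at a time while doing so lowers the criterion."""
    current = frozenset(variables)
    best = criterion(current)
    path = [(current, best)]
    while current:
        # Score every model obtained by removing exactly one variable.
        candidates = [(criterion(current - {v}), v) for v in sorted(current)]
        score, drop = min(candidates)
        if score >= best:
            break  # no single removal improves the criterion
        current = current - {drop}
        best = score
        path.append((current, best))
    return current, path


# Hypothetical stand-in for a fitted model: each variable adds a fixed amount
# to the log-likelihood. A real criterion would refit the Cox model instead.
TOY_GAIN = {"x1": 5.0, "x2": 3.0, "x3": 0.4}


def toy_criterion(subset: FrozenSet[str]) -> float:
    loglik = -100.0 + sum(TOY_GAIN[v] for v in subset)
    return aic(loglik, len(subset))


selected, path = backward_eliminate(TOY_GAIN, toy_criterion)
print(sorted(selected))
```

In this toy setting x3 is removed because its small log-likelihood gain does not offset the penalty of one extra parameter. The sketch also shows why such procedures deserve scrutiny: the result depends entirely on the criterion values, which outliers or collinear variables can distort.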

Keywords

Cox regression model, graphical visualization, multidimensional scaling, robust coplot, variable selection

Details

Primary Language	English
Subjects	Statistics
Journal Section	Statistics
Authors	Nihal Ata Tutkun (ORCID 0000-0001-5204-680X); Yasemin Kayhan Atılgan (ORCID 0000-0002-2612-7216)
Publication Date	April 2, 2020
Published in Issue	Year 2020 Volume: 49 Issue: 2

Cite

APA	Ata Tutkun, N., & Kayhan Atılgan, Y. (2020). Visual research on the trustability of classical variable selection methods in Cox regression. Hacettepe Journal of Mathematics and Statistics, 49(2), 869-886. https://doi.org/10.15672/hujms.630402

Cited By

Selection of Temporal Lags for Predicting Riverflow Series from Hydroelectric Plants Using Variable Selection Methods

Energies

https://doi.org/10.3390/en13164236

For more information about the journal, please visit: https://dergipark.org.tr/en/pub/hujms