Research Article

Residual Modelling as a New Approach for Variable Selection

Year 2024, Issue: 10, 86 - 95
https://doi.org/10.52693/jsas.1525029

Abstract

Variable selection remains a challenge in statistical model building as the depth and breadth of research data expand. To help address this challenge, we introduce a new approach to variable selection, called Residual Modelling, which is applicable regardless of the number of predictors. We compare the statistical power and Type-1 error retention of the forward, backward, and stepwise variable selection approaches with those of the proposed modelling strategy, controlling for known predictors. In Residual Modelling, each predictor enters the model as a single predictor; the resulting residuals become the dependent variable for the next predictor, and so on. We compare these methods under different scenarios with varying sample sizes and various combinations of significant and non-significant predictors. When known predictors are available from the literature and the goal is to identify new significant predictors while controlling for them, Residual Modelling shows higher statistical power than the other variable selection methods considered, especially as the number of predictors increases. It also yields less bias in parameter estimation and smaller standard errors. The Type-1 error rate was retained at its nominal level for Residual Modelling, whereas the forward, backward, and stepwise variable selection approaches had slightly reduced Type-1 error rates. When dealing with multiple predictors in the presence of known significant predictors, Residual Modelling offers a practical solution without loss of statistical power or an inflated Type-1 error rate.
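The chaining step described in the abstract (fit one predictor, pass the residuals on as the response for the next) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `simple_ols` and `residual_modelling`, the simulated data, the effect sizes, and the choice to enter the known predictor first are all assumptions made for illustration, and the abstract does not specify how predictors are ordered or tested.

```python
import random
from statistics import mean


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, t_stat, residuals)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    rss = sum(r * r for r in residuals)
    se_b = (rss / (n - 2) / sxx) ** 0.5  # standard error of the slope
    t_stat = b / se_b if se_b > 0 else float("inf")
    return a, b, t_stat, residuals


def residual_modelling(predictors, y):
    """Chain single-predictor fits: each fit's residuals become the next response.

    `predictors` is a list of (name, values) pairs; the order is an assumption
    of this sketch (known predictors first, candidates afterwards).
    """
    response = list(y)
    results = []
    for name, x in predictors:
        _, b, t_stat, response = simple_ols(x, response)
        results.append({"name": name, "slope": b, "t": t_stat})
    return results, response


# Hypothetical simulated data: one known predictor, one new true predictor,
# and one pure-noise predictor, all generated independently.
rng = random.Random(0)
n = 500
x_known = [rng.gauss(0, 1) for _ in range(n)]
x_new = [rng.gauss(0, 1) for _ in range(n)]
x_noise = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * k + 0.5 * w + rng.gauss(0, 1) for k, w in zip(x_known, x_new)]

fits, final_residuals = residual_modelling(
    [("x_known", x_known), ("x_new", x_new), ("x_noise", x_noise)], y
)
for fit in fits:
    print(f"{fit['name']}: slope={fit['slope']:.3f}, t={fit['t']:.2f}")
```

In this sketch each candidate is tested through the t-statistic of its own single-predictor fit. The simulated predictors are independent; with correlated predictors the chained estimates would depend on the order of entry, which the sketch makes no attempt to handle.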

Ethical Statement

Our research protocol was approved by the Istanbul Medipol University Ethics Committee (Application number: 10840098-604.01.01-E.53819).

Supporting Institution

TUBİTAK-BİDEB-2232 International Fellowship for Outstanding Researchers (Award No: 118C306)

Thanks

This study was partially funded by the TUBITAK Directorate of Science Fellowships and Grant Programmes (BİDEB)-2232 International Fellowship for Outstanding Researchers. We also thank the Turkish Republic Ministry of Commerce and the Turkish Statistical Institute for data sharing. The opinions expressed in this article belong solely to its authors and do not represent the position of TUBITAK, the Turkish Republic Ministry of Commerce, or the Turkish Statistical Institute in any shape or form.

References

  • [1] W. Sauerbrei, A. Perperoglou, M. Schmid, M. Abrahamowicz, H. Becher, H. Binder, D. Dunkler, F. E. Harrell, P. Royston, and G. Heinze, "State of the art in selection of variables and functional forms in multivariable analysis—outstanding issues," Diagnostic and Prognostic Research, vol. 4, pp. 1–8, Dec. 2020. doi: 10.1186/s41512-020-00074-3.
  • [2] M. Z. Chowdhury and T. C. Turin, "Variable selection strategies and its importance in clinical prediction modelling," Family Medicine and Community Health, vol. 8, no. 1, 2020. doi: 10.1136/fmch-2019-000262.
  • [3] G. Claeskens, "Statistical model choice," Annual Review of Statistics and Its Application, vol. 3, pp. 233–256, Jun. 2016. doi: 10.1146/annurev-statistics-041715-033659.
  • [4] G. Heinze, C. Wallisch, and D. Dunkler, "Variable selection–a review and recommendations for the practicing statistician," Biometrical Journal, vol. 60, no. 3, pp. 431–449, May 2018. doi: 10.1002/bimj.201700067.
  • [5] Y. Wang, Q. Chen, N. Zhang, and Y. Wang, "Conditional residual modeling for probabilistic load forecasting," IEEE Transactions on Power Systems, vol. 33, no. 6, pp. 7327–7330, Aug. 2018. doi: 10.1109/TPWRS.2018.2819668.
  • [6] P. S. de Mattos Neto, G. D. Cavalcanti, D. S. O. Santos Júnior, and E. G. Silva, "Hybrid systems using residual modeling for sea surface temperature forecasting," Scientific Reports, vol. 12, no. 1, p. 487, Jan. 2022. doi: 10.1038/s41598-021-04342-8.
  • [7] M. Goodwin, "Residual modeling in music analysis-synthesis," in 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, May 1996, vol. 2, pp. 1005–1008. IEEE. doi: 10.1109/ICASSP.1996.543310.
  • [8] X. Weng, Y. Li, L. Chi, and Y. Mu, "High-capacity convolutional video steganography with temporal residual modeling," in Proceedings of the 2019 on International Conference on Multimedia Retrieval, Jun. 2019, pp. 87–95. doi: 10.1145/3323873.3325047.
  • [9] R. Tibshirani, "Regression shrinkage and selection via the Lasso," Journal of the Royal Statistical Society: Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996. doi: 10.1111/j.2517-6161.1996.tb02080.x.
  • [10] A. E. Hoerl and R. W. Kennard, "Ridge regression: biased estimation for nonorthogonal problems," Technometrics, vol. 12, no. 1, pp. 55–67, Feb. 1970. doi: 10.1080/00401706.1970.10488634.
  • [11] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301–320, 2005. doi: 10.1111/j.1467-9868.2005.00503.x.
  • [12] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010. doi: 10.18637/jss.v033.i01.
  • [13] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning with Applications in R. New York: Springer, 2013, ch. 6. doi: 10.1007/978-1-4614-7138-7.
There are 13 citations in total.

Details

Primary Language English
Subjects Biostatistics
Journal Section Research Articles
Authors

Aslı Nurefşan Koçak 0009-0000-5367-7443

Muhammet Furkan Daşdelen 0000-0003-2251-2093

Mehmet Koçak 0000-0002-3386-1734

Early Pub Date December 24, 2024
Submission Date July 30, 2024
Acceptance Date November 24, 2024
Published in Issue Year 2024 Issue: 10

Cite

IEEE A. N. Koçak, M. F. Daşdelen, and M. Koçak, “Residual Modelling as a New Approach for Variable Selection”, JSAS, no. 10, pp. 86–95, December 2024, doi: 10.52693/jsas.1525029.