Determination of Best Variable Set for Count Models by Particle Swarm Optimization
Year 2019,
Volume: 23 Issue: Special, 76 - 83, 01.03.2019
Haydar Koç, Tuba Koç, Emre Dünder
Abstract
Many scientific studies use quantitative data that take non-negative integer
values, known as count data. Count data also appear frequently in regression
analysis, one of the most basic methods of statistical analysis. Regression
models in which the dependent variable takes integer values are called count
models. This study investigates model selection for count models using
classical selection methods and the particle swarm optimization (PSO)
algorithm. Applications were carried out on both simulated and real data. The
results show that, compared with classical methods, the PSO algorithm gives
better results as the number of variables in the model increases and as the
correlations between independent variables grow, and that it can be used as an
alternative method for variable selection in count models.
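The selection task described in the abstract can be sketched in code. What follows is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a sigmoid-based binary PSO, AIC as the selection criterion, a log-link Poisson model fitted by Newton's method, and simulated data in which only the first two predictors matter. The function names (`poisson_aic`, `binary_pso`, `simulate`) and all parameter values are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def poisson_aic(X, y, mask):
    """Fit a Poisson regression on the columns flagged in mask; return its AIC."""
    cols = [j for j, m in enumerate(mask) if m]
    Z = [[1.0] + [row[j] for j in cols] for row in X]  # intercept + chosen columns
    k = len(cols) + 1
    beta = [math.log(sum(y) / len(y) + 1e-9)] + [0.0] * (k - 1)
    for _ in range(25):  # Newton-Raphson iterations
        mu = [math.exp(min(sum(b * z for b, z in zip(beta, r)), 30.0)) for r in Z]
        grad = [sum((yi - mi) * r[i] for yi, mi, r in zip(y, mu, Z)) for i in range(k)]
        hess = [[sum(mi * r[i] * r[j] for mi, r in zip(mu, Z)) for j in range(k)]
                for i in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    eta = [min(sum(b * z for b, z in zip(beta, r)), 30.0) for r in Z]
    loglik = sum(yi * e - math.exp(e) - math.lgamma(yi + 1) for yi, e in zip(y, eta))
    return -2.0 * loglik + 2.0 * k

def binary_pso(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search over 0/1 inclusion masks; lower AIC means a better particle."""
    rng = random.Random(seed)
    p = len(X[0])
    cache = {}
    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # each distinct subset is fitted only once
            cache[key] = poisson_aic(X, y, key)
        return cache[key]
    pos = [[rng.randint(0, 1) for _ in range(p)] for _ in range(n_particles)]
    vel = [[0.0] * p for _ in range(n_particles)]
    pbest = [q[:] for q in pos]
    pbest_f = [fitness(q) for q in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(p):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp the velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))  # sigmoid transfer
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
            if f < gbest_f:
                gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

def simulate(n=150, p=5, seed=1):
    """Simulated counts that depend only on the first two predictors."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = []
    for row in X:
        mu = math.exp(0.5 + 0.8 * row[0] - 0.6 * row[1])
        limit, count, prod = math.exp(-mu), 0, rng.random()
        while prod > limit:  # Knuth's method for Poisson sampling
            count += 1
            prod *= rng.random()
        y.append(count)
    return X, y

X, y = simulate()
best_mask, best_aic = binary_pso(X, y)
print("selected variables:", best_mask, "AIC:", round(best_aic, 2))
```

With only five candidate predictors an exhaustive search over all 32 subsets would be just as easy; a heuristic search of this kind becomes relevant when the number of candidates makes enumeration impractical, which is the setting the abstract points to.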
Parçacık Sürü Optimizasyonu Yöntemi ile Sayım Modelleri için En Uygun Değişken Kümesinin Belirlenmesi
Abstract
In many scientific studies, quantitative data that take non-negative integer
values, called count data, are used. Count data are also used quite often in
regression analysis, one of the most basic analysis methods in statistics.
Regression models in which the dependent variable can be expressed as an
integer are defined as count models. In this study, model selection for count
models was examined. Classical selection methods and the PSO algorithm were
used for model selection in count models. Applications were carried out on both
simulated and real data. As a result, it was shown that, compared with
classical methods, the PSO algorithm gives better results as the number of
variables in the model increases and as the correlations between independent
variables rise, and that the PSO algorithm can be used as an alternative
method for variable selection in count models.