EFFECT OF BOOTSTRAPPING ON GAUSSIAN MIXTURE MODEL
Year 2025, Volume: 11, Issue: 2, 182-193, 30.12.2025
Mehmet Ali Kaygusuz, Maruf Gögebakan, Vilda Purutcuoglu
Abstract
The Gaussian mixture model is a probabilistic model in which all data points are assumed to be generated from a mixture of a finite number of Gaussian distributions with unknown parameters. The model is typically used in unsupervised machine learning and has applications in fields such as bioinformatics, financial econometrics, and deep learning. In this study, we combine bootstrap methods with the Gaussian mixture model to investigate whether they improve model accuracy, specifically when the number of parameters changes relative to the number of observations. To select the optimal Gaussian mixture model, we use the likelihood ratio test because of its computational efficiency and high accuracy, and we compare its performance with the consistent Akaike information criterion with Fisher information matrix under distinct simulation scenarios.
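As an illustration only, the idea of a bootstrap likelihood ratio test for the number of mixture components can be sketched as follows. This is not the authors' implementation. It assumes univariate data, a hand-written EM routine, and a parametric bootstrap from the fitted null model. All function names (`fit_gmm`, `sample_gmm`, `bootstrap_lrt`, `caic`) are hypothetical. The criterion shown is the plain consistent AIC, CAIC = -2 log L + p (ln n + 1), used as a simpler stand-in for the Fisher-information variant named in the abstract.

```python
import math
import random


def _log_mix(x, weights, means, variances):
    """Return (log density of x under the mixture, per-component log terms)."""
    logs = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs)), logs


def fit_gmm(data, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture by EM (assumed helper, not from the paper)."""
    n = len(data)
    ordered = sorted(data)
    mean_all = sum(data) / n
    var_all = max(sum((x - mean_all) ** 2 for x in data) / n, 1e-6)
    weights = [1.0 / k] * k
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # quantile start
    variances = [var_all] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point
        resp = []
        for x in data:
            total, logs = _log_mix(x, weights, means, variances)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: weighted updates, with small floors to avoid degenerate components
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            weights[j] = nj / (n + k * 1e-10)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, 1e-6)
    loglik = sum(_log_mix(x, weights, means, variances)[0] for x in data)
    return weights, means, variances, loglik


def sample_gmm(n, weights, means, variances, rng):
    """Draw n points from a fitted mixture (parametric bootstrap sample)."""
    comps = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[j], math.sqrt(variances[j])) for j in comps]


def bootstrap_lrt(data, k0, n_boot=19, seed=0):
    """Test H0: k0 components against H1: k0 + 1 components."""
    rng = random.Random(seed)
    null = fit_gmm(data, k0)
    observed = 2.0 * (fit_gmm(data, k0 + 1)[3] - null[3])
    exceed = 0
    for _ in range(n_boot):
        sim = sample_gmm(len(data), null[0], null[1], null[2], rng)
        stat = 2.0 * (fit_gmm(sim, k0 + 1)[3] - fit_gmm(sim, k0)[3])
        exceed += stat >= observed
    return observed, (1 + exceed) / (n_boot + 1)


def caic(loglik, k, n):
    """Plain CAIC; a 1-D k-component mixture has 3k - 1 free parameters."""
    return -2.0 * loglik + (3 * k - 1) * (math.log(n) + 1.0)


# Synthetic example: two well-separated Gaussian clusters (made-up data).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(75)] + [rng.gauss(6.0, 1.0) for _ in range(75)]
lrt_stat, p_value = bootstrap_lrt(data, k0=1)
caic_by_k = {k: caic(fit_gmm(data, k)[3], k, len(data)) for k in (1, 2, 3)}
print("LRT statistic:", lrt_stat, "bootstrap p-value:", p_value)
print("CAIC by number of components:", caic_by_k)
```

A small bootstrap p-value argues for the larger model, while CAIC is minimized over the candidate numbers of components. The bootstrap replaces the usual chi-squared reference distribution, whose regularity conditions do not hold when testing the number of mixture components.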
Ethical Statement
The authors declare that this document does not require ethics committee approval or any special permission. Our study does not cause any harm to the environment and does not involve the use of animal or human subjects.
Supporting Institution
None
Project Number
No financial support exists.