Vine copula graphical models in the construction of biological networks
Year 2021, pp. 1172-1184, 06.08.2021
Hajar Farnoudkia
Vilda Purutcuoglu
Abstract
The copula Gaussian graphical model (CGGM) is one of the major mathematical models for high-dimensional biological networks, and it provides a graphical representation that is particularly suited to sparse networks. In essence, the model relies on the regression representation of the Gaussian graphical model (GGM): the precision matrix, which describes the conditional dependence between the variables, determines the coefficients of the linear regression of each variable on the others. Bayesian inference of the model parameters is used to overcome the dimensionality limitation of GGM for sparse networks with small sample sizes. However, applications to benchmark data sets show that, although CGGM is successful for certain systems, it may fit non-normal multivariate observations poorly. In this study, we propose vine copulas to relax the strict normality assumption of CGGM and to describe networks through a variety of copula families besides the Gaussian copula. Accordingly, we determine the best-fitting bivariate copula for every pair of genes and compute the estimated adjacency matrix, whose entries indicate the presence of an edge between the corresponding genes. We assess the performance of the proposed approach on three network data sets with several accuracy measures and compare the outputs with the results of CGGM.
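The pairwise step described in the abstract (choose a bivariate copula for every gene pair, then read off an adjacency matrix and score it) can be illustrated with a minimal, hypothetical sketch; it is not the authors' implementation. It assumes rank-based pseudo-observations, a deliberately small candidate set (independence, Gaussian and Clayton copulas), parameters obtained by inverting Kendall's tau rather than by maximum likelihood, and an AIC-type score; an edge is drawn whenever a family other than the independence copula wins. Precision, recall and F-measure serve only as example accuracy measures, since the abstract does not list the measures used. All function names are illustrative.

```python
import math
from statistics import NormalDist

# Hypothetical sketch, not the authors' code: every gene pair gets a bivariate
# copula chosen by an AIC-type score from {independence, Gaussian, Clayton},
# with parameters obtained by inverting Kendall's tau (not by maximum
# likelihood). An edge is drawn unless the independence copula wins.

_STD_NORMAL = NormalDist()


def pseudo_obs(x):
    """Rank-transform a sample to pseudo-observations rank / (n + 1) in (0, 1)."""
    n = len(x)
    u = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda k: x[k]), start=1):
        u[i] = rank / (n + 1)
    return u


def kendall_tau(x, y):
    """O(n^2) Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            d = (x[i] - x[j]) * (y[i] - y[j])
            s += (d > 0) - (d < 0)
    return 2.0 * s / (n * (n - 1))


def gaussian_loglik(u, v, rho):
    """Log-likelihood of the bivariate Gaussian copula with correlation rho."""
    r2 = rho * rho
    total = 0.0
    for s, t in zip(u, v):
        a, b = _STD_NORMAL.inv_cdf(s), _STD_NORMAL.inv_cdf(t)
        quad = r2 * (a * a + b * b) - 2 * rho * a * b
        total += -0.5 * math.log(1 - r2) - quad / (2 * (1 - r2))
    return total


def clayton_loglik(u, v, theta):
    """Log-likelihood of the Clayton copula (theta > 0, positive dependence)."""
    total = 0.0
    for s, t in zip(u, v):
        ls, lt = math.log(s), math.log(t)
        p, q = -theta * ls, -theta * lt  # log(s^-theta), log(t^-theta)
        m = max(p, q)
        # log(s^-theta + t^-theta - 1) computed without overflow
        log_sum = m + math.log(math.exp(p - m) + math.exp(q - m) - math.exp(-m))
        total += math.log1p(theta) - (1 + theta) * (ls + lt) - (2 + 1 / theta) * log_sum
    return total


def select_pair(x, y):
    """Return (family, score) of the lowest-AIC copula for one variable pair."""
    u, v = pseudo_obs(x), pseudo_obs(y)
    tau = max(-0.99, min(0.99, kendall_tau(u, v)))  # keep parameters finite
    scores = {"independence": 0.0}  # log-likelihood 0 and no parameters
    scores["gaussian"] = -2 * gaussian_loglik(u, v, math.sin(math.pi * tau / 2)) + 2
    if tau > 0:  # the unrotated Clayton copula only covers positive dependence
        scores["clayton"] = -2 * clayton_loglik(u, v, 2 * tau / (1 - tau)) + 2
    best = min(scores, key=scores.get)
    return best, scores[best]


def estimate_adjacency(columns):
    """Symmetric 0/1 adjacency matrix; columns[i] holds the samples of gene i."""
    p = len(columns)
    adj = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            if select_pair(columns[i], columns[j])[0] != "independence":
                adj[i][j] = adj[j][i] = 1
    return adj


def accuracy_measures(true_adj, est_adj):
    """Precision, recall and F-measure of the estimated edges (upper triangle)."""
    tp = fp = fn = 0
    for i in range(len(true_adj)):
        for j in range(i + 1, len(true_adj)):
            t, e = true_adj[i][j], est_adj[i][j]
            if t and e:
                tp += 1
            elif e:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}
```

For instance, `estimate_adjacency([gene_a, gene_b, gene_c])` would return a symmetric 0/1 matrix that can be compared with a reference network through `accuracy_measures`. Tau inversion keeps the sketch short; a setting closer to vine copula practice would fit a much larger family set (including rotated families) by maximum likelihood, as dedicated packages such as the R package CDVine do.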
Supporting Institution
European Union 7th Framework Project
Thanks
The second author thanks the COSTNET Project (No: CA15109) for its support.