Year 2022,
Volume: 1 Issue: 1, 7 - 32, 31.12.2022
Hakan Demirtaş, Kübra Coşar, Mutlu Altuntaş
Concurrent Generation of Binary, Ordinal, and Count Data with Specified Marginal and Associational Quantities in Pharmaceutical Sciences
Abstract
This manuscript is concerned with establishing a unified framework for concurrently generating data
sets that include three major kinds of variables (i.e., binary, ordinal, and count) when the marginal distributions and
a feasible association structure are specified for simulation purposes. The simulation paradigm has been commonly
utilized in pharmaceutical practice. A central aspect of every simulation study is the quantification of the model
components and parameters that jointly define a scientific process. When this quantification goes beyond the
deterministic tools, researchers often resort to random number generation (RNG) in finding simulation-based
solutions to address the stochastic nature of the problem. Although many RNG algorithms have appeared in the
literature, a major limitation is that most of them were not devised to simultaneously accommodate all variable types
mentioned above. Thus, these algorithms provide only an incomplete solution, as real data sets include variables of
different kinds. This work augments the existing methods by offering a systematic and comprehensive treatment of
mixed data generation. We provide an algorithm designed for generating data with mixed marginals; illustrate its
operational, logistical, and computational details; and present ideas on how it can be extended to more
sophisticated distributional settings with a broader range of marginal features and associational quantities.
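As a rough illustration of what generating binary, ordinal, and count variables with specified marginals can look like, the sketch below draws correlated standard normal variates and maps each one to its target marginal by thresholding or by an inverse-CDF transform. It is a generic latent-normal construction written under our own assumptions, not the algorithm of this manuscript: the function names (`cholesky`, `poisson_quantile`, `generate_mixed`), the three-variable layout, and all parameter values are hypothetical. The correlations among the generated variables generally differ from the latent ones, and calibrating the latent correlations so that specified targets are met is not shown here.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for quantiles and CDF values


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def poisson_quantile(u, lam):
    """Smallest k with P(Poisson(lam) <= k) >= u (intended for small lam)."""
    u = min(u, 1.0 - 1e-12)  # guard against u == 1.0, which would never be reached
    k, pmf = 0, math.exp(-lam)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def generate_mixed(n, p_binary, ord_cum_probs, lam, latent_corr, seed=0):
    """Return n rows of (binary, ordinal, count) built from correlated normals.

    latent_corr is the 3x3 correlation matrix of the underlying normals
    (an assumption of this sketch), not the correlation of the outputs.
    """
    rng = random.Random(seed)
    low = cholesky(latent_corr)
    bin_cut = STD_NORMAL.inv_cdf(1.0 - p_binary)        # P(binary = 1) = p_binary
    ord_cuts = [STD_NORMAL.inv_cdf(p) for p in ord_cum_probs]
    rows = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _k in range(3)]    # independent normals
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(3)]
        binary = 1 if z[0] > bin_cut else 0             # dichotomize
        ordinal = sum(z[1] > c for c in ord_cuts)       # category 0..K-1
        count = poisson_quantile(STD_NORMAL.cdf(z[2]), lam)  # inverse-CDF map
        rows.append((binary, ordinal, count))
    return rows
```

For example, `generate_mixed(1000, 0.3, [0.2, 0.7], 3.0, [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]])` returns 1000 rows whose binary mean should be near 0.3, whose ordinal categories should have probabilities near 0.2, 0.5, and 0.3, and whose count mean should be near 3. The marginals are matched by construction, which is why the remaining difficulty in this kind of scheme lies in the association structure.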