A Short Note on Obtaining Item Parameter Estimates of IRT Models with Bayesian Estimation in Mplus

Parameter estimation of Item Response Theory (IRT) models can be applied using both Bayesian and nonBayesian methods. Although maximum likelihood estimation (MLE), a non-Bayesian method, has predominated since the 1970s, there is an increasing use of Bayesian methods, due to their capability for estimating complex models and for their implementation in commercially available software. In view of the recent increase in the popularity of these methods, a comparison between model parameter estimates from the two types of methods would be useful for practitioners. In this study, we compare MLE and Bayesian estimation, two popular methods for obtaining parameter estimates for dichotomous IRT models, using the MLE and Bayes estimator options as implemented in the Mplus software package. Results indicated Bayesian and MLE estimates differed only slightly, clearly demonstrating the consistency between estimates from the two methods. Further, Bayes estimator option in Mplus can be a viable and relatively easy to use tool for calibrations of IRT models.


INTRODUCTION
Item response theory (IRT) models have been used for testing over the last half-century (van der Linden & Hambleton, 2013). Parameter estimation is considered one of the important processes of IRT modeling. Estimates of IRT model parameters have typically been done using methods such as maximum likelihood estimation (MLE;  and Markov chain Monte Carlo estimation (MCMC; Patz & Junker, 1999a). MLE methods are based on a frequentist approach, and MCMC is a Bayesian method. MLE-based estimation methods have been widely used in IRT modeling since the development of software such as BILOG , MULTILOG (Thissen, 1991) and PARSCALE (Muraki & Bock, 1996). Implementations of MCMC methods for estimation of IRT models began to be reported in the early 1990s (e.g., Albert, 1992;Albert & Chib, 1993). MLE generally requires large samples to produce reliable results (e.g., Asparouhov & Muthén, 2010a;Meuleman & Billiet, 2009), a condition not necessarily required by Bayesian methods.
In this regard, there are often advantages with Bayesian methods that can overcome some of the problems associated with MLE methods. Lee and Song (2004) note that Bayesian methods can provide asymptotically distribution-free estimates, as well as more accurate results with smaller samples with non-normal ability distribution (see also Gao, & Chen, 2005). Fox (2010) suggests that, unlike MLE, Bayesian methods enable the use of additional information for estimating model parameters in addition to providing smaller standard errors than those of marginal maximum likelihood estimate, when reasonable prior information is available. Thus, MCMC implementations, such as Metropolis-Hastings and Gibbs sampling, became increasingly popular for IRT modeling after 1990s, particularly for and Lee (2002, p.340), "a single sample from the joint posterior distribution, π(ω|Y), is approximated, by sampling each parameter from its conditional posterior distribution." A cycle is completed after sampling the first set of parameters. The parameters of the previous cycle serve as conditional values for the next cycle. We need to iterate the cycles t times until convergence is achieved. Mean, mode, or median of each parameter's marginal posterior distribution can be used as the final estimates (Baker, 1998). Readers are referred to Albert (1992) and Baker (1998) for more details on Gibbs Sampling.
One characteristic of the implementation of MCMC algorithms is that they can be quite technically complex. Fortunately, implementations of Bayesian estimation algorithms are often available from authors of published articles (Curtis, 2010). Early applications of Bayesian IRT estimation were mainly implemented in WINBUGS (Spiegelhalter, Thomas, Best, & Lunn, 2003). Thus, most of the Bayesian estimation programs have been written in the BUGS language (Curtis, 2010). Over the years, other software packages have also been developed for Bayesian estimation of IRT including JAGS (Plummer, 2003), STAN (Stan Development Team, 2016), and OPENBUGS (Spiegelhalter, Thomas, Best, & Lunn, 2010), which is the more recent version of WinBUGS. These packages are designed primarily for Bayesian estimation. Bayesian algorithms have become available in several general purpose software packages including SAS (e.g., Proc MCMC; SAS Institute, 2017), S-PLUS, R packages including DPpackage , mlirt (Fox, 2007), MCMCpack (Martin, Quinn, & Park, 2011), pscl (Jackman et al., 2017) and MATLAB (The MathWorks Inc., 2010).
Mplus (Muthén, & Muthén, 1998-2019 recently implemented a Bayesian MCMC algorithm option for latent variable models (Asparouhov & Muthén, 2010). Mplus is a general latent variable modeling program that implements the estimation of several statistical model families, including structural equation models, latent class analysis, factor analysis, mixture models with both single-and multi-level data structures. Mplus also estimates a range of IRT models (Embretson, & Reise, 2000) including the one-, two-and three-parameter logistic models (1PL, 2PL, and 3PL; Birnbaum, 1968), the fourparameter logistic model (4PL; Barton & Lord, 1981), the partial credit model (PCM; Masters & Wright, 1997), and the generalized partial credit model (GPCM; . Multi-level and mixture extensions of these models are also possible within Mplus. A relatively small number of studies have reported results use of Mplus for Bayesian estimation of IRT models (e.g., Luo & Dimitrov, 2019). Luo and Dimitrov (2019) have shown how to obtain estimates for MCMC/EAP of the ability parameters when using Bayes estimator in Mplus. Their study has focused on ability parameters. However, studies investigating the Bayesian estimation of item parameters of IRT models using Mplus do not yet appear to have been reported. As the popularity of the Mplus software package increases, the user may want to learn how to create syntaxes to be able to estimate item parameters of IRT models using Mplus software package. A didactic paper on this issue would be very helpful for practitioners. To this end, the purpose of this study is to introduce Bayes estimator of Mplus and to demonstrate its application in obtaining item parameter estimates of dichotomous IRT models.

Empirical Example
Estimation of dichotomous IRT models on a sample data set using the Bayesian estimation algorithm implemented in Mplus will be demonstrated in this part. In addition, Bayesian estimates were compared with ML based estimates as implemented in Mplus.

Data
Section 6 of the Law School Admission Test data set (LSAT6; Bock & Lieberman, 1970) was analyzed in this study. The LSAT6 data set consists of the item responses of 1,000 examinees to five multiple choice five-choice items on Figure Classification. Multiple choice items are coded as 0 and 1 for wrong and right answers, respectively. LSAT6 data set was preferred in this study as it has been studied in many IRT studies and it is available with most of the IRT software packages. This data set is known to be useful example of testing out IRT procedures and showing the use of Bayesian estimation.

IRT Models
Only Rasch, 1PL, and 2PL dichotomous models were considered in the present study as the Bayes estimator is not currently available in Mplus for other IRT models (e.g., 3PL, 4PL, and polytomous models). Under the 2PL IRT model, the probability of a correct response to an item can be given as follows: where θs is the person ability parameter for examinee s, and βi and αi are the difficulty and discrimination parameters, respectively, for item i. The 1PL model can be written from Equation 1 by setting the item discrimination parameters to a constant value (e.g., the average item discrimination). Similarly, the Rasch model equation may be written by setting the item discrimination parameters to 1. Given that conditional independence assumption, the conditional likelihood of a response pattern would be given as where ( ) = 1 − ( ).

Estimation
Parameter estimates of the model in Equation 1 can be obtained with either MLE or Bayesian estimation. The MLE algorithm focuses on finding the ability-level estimates that maximize the log-likelihood function with iterative procedures like Newton-Raphson (Embretson & Reise, 2000). Bayesian estimation integrates the prior distribution into the likelihood function. In Mplus, Bayesian estimation is on the probit scale, and MLE is on the logistic scale. When using the probit model, the posterior distribution of θs can be defined as where p(θs) is the prior. As can be seen in Equation 3 (right side), prior distributions required to define the posterior distributions. These are used with the specified model in determining the posterior. The central tendency statistics of the posterior distribution can be reported as the final estimates.
The ESTIMATOR option should be defined as "ESTIMATOR=BAYES;" to obtain Bayes estimates in Mplus. Other Bayesian related options in the ANALYSIS command include ALGORITHM, BCONVERGENCE, BITERATIONS, BSEED, CHAINS, FBITERATIONS, POINT, PREDICTOR, PRIOR, STVALUES, and THIN. Descriptions of these options can be found in Mplus User's Guide (Muthén & Muthén, 1998-2019. The non-informative priors are the default for Bayes estimator in Mplus. However, users can specify informative priors such as inverse Gamma, Inverse Wishart, and Dirichlet (Muthén & Asparouhov, 2013).

Analyses
Three dichotomous IRT models (i.e., Rasch, 1PL, and 2PL models) were fit to the LSAT6 data set using MLE and Bayesian options in Mplus. For comparison, the MLE estimator was used as implemented in the Mplus software. Bayesian estimation with Mplus was applied with non-informative default priors N(0, 5) (Muthén & Muthén, 1998-2019 for item difficulty and discrimination parameters. A detailed list of the alternative prior distributions can be found in Mplus User's Guide (Muthén & Muthén, 1998-2019. The Mplus syntaxes for Bayesian estimation of the three models are presented in Figures 1 to 3, respectively. All model parameters were estimated from the posterior distribution. LSAT6 data have been previously analyzed with MCMC by Kim (2003). In that study, 5,000 iterations were used as burn-in based on Gelman and Rubin diagnostic information. In this study, the same number of iterations was used for burn-in stage. Thus, a total of 10,000 MCMC iterations were run; the first 5,000 iterations were used as burn-in. Posterior means of the sampled values for each parameter were taken as parameter estimates.
In the Mplus syntax for the Bayesian estimation of the Rasch model (see Figure 1), the TITLE command was used to describe the problem. The FILE option was added for the data set (FILE = LSAT6.txt;). As the LSAT6 data set consists of five dichotomous items, the "NAMES = item1-item5;" option was used to label these five items under the VARIABLE command. The CATEGORICAL option was used to specify these items as categorical (CATEGORICAL = item1-item5;). USEVAR option was used to specify to use all of the items in the analysis (USEVAR = item1-item5;). The ANALYSIS command, ESTIMATOR = BAYES; option was specified to obtain Bayesian estimates. The Bayes estimator in Mplus employs Gibbs sampling (line 10) with two processors (line 11) to run two parallel chains (line 12). FBITERATIONS option was used to set the number of iterations to be 10,000 (line 13). That is, the posterior is based on the last half (i.e., 5,000). STVAL=ML; option was used to specify starting value information as ML-based values. In the syntax, we specified the mean of the posterior distribution to be reported in the output (POINT=MEAN;). THIN indicates the number of iterations to be used for estimating the posterior. In this example, thinning was set to be 5 (see line 16), which means every 5 th iteration is used for estimating the posterior. The MODEL command (see line 17) indicates a general factor f1 (see line 18) and use the command "by" to indicate the relationship between items and factor f1. As can be seen in Figure 1, f1 by item1@0.587544 item2-item5@0.587544; (line 18) was used to link five items (items1-item5) with factor f1. The @0.587544 part was used to fix item discrimination parameters at 1, which enable us to obtain a model based on Rasch framework. f1@1 option was used fix factor variance to one. In addition, the mean of f1 was fixed at zero by writing [f1@0];. Specifications between lines 17 and 20 were used to obtain Rasch model. The PLOT command was also used to create plots after estimation. As mentioned in the Mplus User's Guide, TYPE = PLOT3; can be used to request graphical displays of several results. STANDARDIZED option was used to obtain standardized solution, and TECH8 was added to request diagnostic information regarding model convergence in the OUTPUT section (line 24). TECH8 also shows the total number of iterations, including the discards. Mplus reports the Potential Scale Reduction (PSR) computed based on Gelman and Rubin's convergence diagnostic (Gelman & Rubin, 1992) when included TECH8 under the OUTPUT command.

270
In addition, a posterior predictive checking (PPC) statistics and its p-value (PPP) can be obtained for model fit assessment in Mplus. This statistics is based on the usual χ2 test, which is computed using the replicated and the observed data in MCMC iteration t. The χ2 fit function difference between these two values is calculated using every 5 th iteration. For an excellent-fitting model, 95% CI for the difference in χ2 value should include zero (Wang & Wang, 2020). Poor fit is observed with low PPP values (e.g. <0.05).   Figure 1 and Figure 2 is that the MODEL section was modified to be able to estimate 1PL model. The f1 by item1-item5* (loading); option (line 18) was used to fix all discrimination parameters to be equal, and the [item1$1-item5$1*] option (line 19) was used to freely estimate the item difficulty parameters.  The Mplus syntax created for Bayesian estimation of 2PL model is presented in Figure 3. Since most parts of this syntax are similar to the syntax of the previous model, only the different parts are explained here. The most noticeable difference between the syntaxes in Figure 1 and Figure 3 is is that the MODEL section was modified to be able to estimate 2PL model. Lines 18 and 19, f1 by item1-item5; and [item1$1-item5$1*] options were used to freely estimate the item discrimination and difficulty parameters, respectively.

272
ML estimates of item parameters of three IRT models with LSAT6 data set were also obtained with Mplus software package for comparison. In order to transform the Mplus derived parameter estimates (i.e., location and thresholds) to typical IRT estimates (i.e., discrimination and difficulty), we followed the transformation formula provided by Asparouhov and Muthén (2016, p.6). Given that µ represents factor mean and denotes the factor variance, item discrimination parameter ( ) and the item difficulty parameter ( ) can be calculated as below using the location ( ) and threhold ( ) parameters from Mplus output: Forero and Maydeu-Olivares (2009) use the normal and logistic versions of the IRT models to place parameter estimates and standard errors in the same metric (within .01 units). The constant D (i.e., 1.702) proposed by Haley (1952) was used in this study as well to convert normal ogive parameters to the logistic parameters.

RESULTS
Tables 1-3 contain approximate posterior means using the MCMC algorithm for the five item parameters for the three IRT models fit the LSAT6 data set. Bayesian estimation and the MLE estimates of item parameters for Rasch, 1PL, and 2PL models were compared. In order to rescale parameters obtained from the Bayesian estimation, all of the parameters were multiplied by 1.7, since the Bayesian estimation in Mplus uses the probit link function.
Traditional fit indices cannot be used when IRT models are estimated with Bayesian estimation. In this case, the convergence of the Markov chain should be checked using Bayesian diagnostic statistics. Mplus reports the potential scale reduction (PSR) for convergence assessment (Asparouhov & Muthén, 2010b). PSR values between 1 and 1.1 indicate good convergence (Wang & Wang, 2020). In this study, PSR values were found to be 1.024, 1.018, and 1.042 for Rasch, 1PL, and 2PL models, respectively. Under the MCMC estimation, the PPC statistics can be used to assess model fit. Mplus provides a χ2 fit function difference between observed values and replicated estimates, a p-value (PPP), scatter plot and histogram. χ2 fit function differences were found to be [-14.585, 22.300], [-18.232, 16.381], and [-17.844, 17.061] for Rasch, 1PL, and 2PL models, respectively. Associated p-values were 0.346, 0.545, and 0.505, respectively. All of the 95% CI's include zero, and associated p-values are high indicating that three IRT models fit the data very well. In addition, plots based posterior predictive checking (see Figures A3 and A6) also indicate that the fit of the model is reasonable.  Table 3 presents the item parameter estimates of the 2PL IRT model in Mplus with the two estimators.
As can be seen, the Bayes estimator in Mplus also yielded similar estimates to those obtained with MLE (see Table 3). For item discrimination parameters, the Pearson correlation value between Bayes estimates and ML estimates was .87. For item difficulty parameters, the Pearson correlation value between Bayes estimates and ML estimates was .999. As shown in Table 3, the transformed Bayesian estimates differed in the first and second decimal places compared to the MLE estimates.
Several plots are possible in the "View plots" submenu under the "Plots" menu of Mplus screen. A screen opens and shows several options that can be used to request graphical displays of several results.
As an example, Bayesian related plots for item 1 estimated with the 1PL IRT model are presented in the Appendix (see Figures A1-A7).

DISCUSSION and CONCLUSION
Dichotomous IRT models are typically estimated with both Bayesian and MLE estimation algorithms. Bayesian estimation of IRT models is sometimes preferable to MLE as MLE needs numerical integration, which can be slow or prohibitive depending on the numbers of dimensions of integration as a function of the numbers of latent variables. Tutorials do not yet appear to be presented in the psychometric literature for estimating dichotomous IRT models in Mplus using Bayesian estimation. In this paper, we provide a simplified step-by-step method for the estimation of dichotomous IRT models with Bayesian estimation.
Specifically, three dichotomous IRT models were analyzed using the five-item LSAT6 data set. Parameter estimates of these five items were compared for MLE and Bayesian estimations. Results suggested that there were some differences between item parameter estimates from MLE and Bayesian estimation. This is consistent with the results of previous research that showed comparable estimation results for MLE and MCMC (e.g., Kieftenbeld & Natesan, 2012;Luo, 2018;Paek, Cui, Öztürk Gübeş, & Yang, 2018;Wollack, Bolt, Cohen, & Lee, 2002). For instance, Luo (2018) found very close estimates between MCMC and robust ML (MLR) estimation of 2PL testlet model in Mplus. Kieftenbeld and Natesan (2012) have found little difference between the estimates obtained from MML and MCMC estimation of the graded response model. Similarly, Wollack et al. (2002) have also demonstrated that item parameter estimates from MMLE and MCMC methods were very similar under the nominal response model.
Bayesian estimation with non-informative priors should give asymptotically similar estimates as MLE.
As shown in this study, non-informative priors are the default for the Bayes estimator in Mplus, although several informative priors can be defined. The prior distribution is one of the most important aspects of Bayesian estimation. This is because prior distributions can substantially affect the posterior distribution especially for small sample sizes (Natesan, Nandakumar, Minka, & Rubright, 2016). Only noninformative priors were used in this study. However, it is possible to add informative or slightly informative priors to the estimation in Mplus software. Further studies may reveal the differential effect of using different priors in Mplus.  Figure A1. Bayesian posterior parameter trace plots.