Investigating The Effect of Exposure-Control Strategies on Item Selection Methods in MCAT

This study investigates the effect of different item exposure control strategies on item selection methods in the context of multidimensional computerized adaptive testing (MCAT). It also examines to what extent the restrictive threshold (RT) and the restrictive progressive (RPG) exposure methods suppress the maximum item exposure rates and increase the exposure rates of underexposed items without losing psychometric precision in MCAT. For this purpose, the performance of four item selection methods with and without exposure control is evaluated and compared in a Monte Carlo simulation, so as to determine how the results differ when item exposure control strategies are applied. The four item selection methods employed in this study are D-optimality, posterior expected Kullback–Leibler information (KLP), the minimized error variance of the linear combination score with equal weight (V1), and the minimized error variance of the composite score with optimized weight (V2). Three exposure control methods are adopted: the maximum priority index (MPI) method proposed for unidimensional CAT, and the RT and RPG methods proposed for cognitive diagnostic CAT. The results show that: (1) KLP, D-optimality, and V1 performed better in recovering domain scores, and all outperformed V2 with respect to precision; (2) although V1 and V2 offer improved item bank usage rates, KLP, D-optimality, V1, and V2 all produced an unbalanced distribution of item exposure rates; (3) all exposure control strategies greatly improved the exposure uniformity with very little loss in psychometric precision; (4) RPG and MPI performed similarly in exposure control, and both outperformed the RT exposure control method.


INTRODUCTION
The fact that test items are chosen sequentially and adaptively in computerized adaptive testing (CAT) has broken the traditional testing mode in which thousands of people respond to the same items at the same time. Nowadays, CAT is increasingly favored by test practitioners and researchers for its higher efficiency, shorter test time, and lower pressure compared to paper and pencil (P&P) testing. Another appealing characteristic of CAT is that different item response models can be applied, including unidimensional, multidimensional, and cognitive diagnostic models.
Multidimensional computerized adaptive testing (MCAT) possesses the advantages of both multidimensional item response theory (MIRT) and CAT. On the one hand, a large number of studies based on different test conditions have shown that MCAT provides higher efficiency than unidimensional CAT. For example, Segall (1996) employed simulated data based on nine adaptive power tests of the Armed Services Vocational Aptitude Battery (ASVAB) to show that MCAT reduced by about one-third the number of items required to generate equal or higher reliability with similar precision to unidimensional CAT. Luecht (1996) demonstrated that MCAT can reduce the number of items for tests with content constraints by 25-40%. Further, Wang and Chen (2004) illustrated the higher efficiency of MCAT compared with unidimensional CAT under different latent trait correlations, numbers of latent traits, and scoring levels. On the other hand, the fact that several ability profiles are estimated simultaneously means that MCAT can offer detailed diagnostic information regarding domain scores and overall scores. The advantages of multidimensionality and high efficiency make MCAT better suited to real tests than unidimensional CAT. Hence, many studies on MCAT have considered real item banks, such as Terra Nova (Yao, 2010), American College Testing (ACT) (Veldkamp & van der Linden, 2002), and ASVAB (Segall, 1996; Yao, 2012, 2014a).
Since Bloxom and Vale (1987) extended unidimensional CAT to MCAT, it has received increasing attention, and several breakthroughs have been reported in the last decade. Among the topics studied, which include ability estimation methods, test stopping rules, and item replenishment, item selection rules have become popular because of their important role in affecting the test quality and psychometric precision. Thus, most researchers focus on proposing new item selection indices to decrease errors in ability estimation. However, Yao (2014a) pointed out that most item selection methods tend to select a particular type of item, leading to the problem of unbalanced item utility. She gave the example of the Kullback-Leibler index, which prefers items that have either a high discrimination on each dimension or significantly different discriminations among dimensions. As another example, the D-optimality index tends to select items with a high discrimination in only one dimension (Wang, Chang, & Boughton, 2011). Nowadays, CAT is increasingly used in many kinds of tests. Hence, item exposure control is important in the application of MCAT, especially for high-stakes tests. However, few studies have investigated this problem in MCAT. The goal of the present study is therefore to examine the performance of some exposure control techniques along with item selection methods in MCAT.
To date, many of the exposure control methods used in unidimensional CAT have been generalized to MCAT. For example, Finkelman, Nering and Roussos (2009) extended the Sympson-Hetter (S-H) (Sympson & Hetter, 1985) and Stocking-Lewis (S-L) (Stocking & Lewis, 1998) methods to MCAT. They found that the S-H, generalized S-H, and generalized S-L methods all do well in controlling the maximum item exposure rates. However, the simulation experiments needed to create the exposure control parameters are time-consuming. Furthermore, there still exist some underexposed items. In addition, Yao (2014a) compared S-H with the fix-rate procedure. The fix-rate procedure is similar to the maximum priority index (MPI) method proposed by Cheng and Chang (2009) for unidimensional CAT. She showed that the S-H method performs better in terms of test precision, whereas the fix-rate procedure gives a higher item bank usage and controls the maximum item exposure rate well.

The stratification method of Lee, Ip, and Fuh (2008) is based on the principle of the a-stratification method (Chang & Ying, 1999). The item bank is stratified according to the absolute value of a functional of the item discrimination parameters. This stratification method is effective in combating overused items and increasing the item bank usage. However, it cannot guarantee that no items are overexposed. Thus, Huebner, Wang, Quinlan, and Seubert (2015) combined this stratification with the item eligibility method (van der Linden & Veldkamp, 2007) with the aim of enhancing the balance of item exposure. This combination method improves the exposure rates of underused items and suppresses the observed maximum item exposure rate. However, these two methods are restricted to tests with two dimensions. Constructing a suitable functional of the discrimination parameter for tests with more than two dimensions remains an important research problem.
It is well known that the uniformity of item exposure rates is affected by the numbers of overexposed and underexposed items. Of the above-mentioned exposure control methods used in MCAT, the S-H, generalized S-H, generalized S-L, fix-rate, and item eligibility methods perform well in suppressing the maximum item exposure rates, and the stratification method improves the utility of underexposed items. Although the combination method used by Huebner et al. (2015) performs well in both aspects, it is only suitable for tests with two dimensions.
The uniformity of item exposure rates and measurement precision are the two most important considerations when MCAT is applied to practical tests, especially high-stakes tests. Because the two always trade off against one another, practitioners hope to find an item selection method that not only guarantees test precision, but also decreases the maximum item exposure rate while increasing the exposure rate of underexposed items. However, there are no methods that can effectively balance item exposure rates for tests with more than two dimensions. In addition, there are two other exposure control methods that have not been studied for MCAT: the restrictive threshold (RT) method and the restrictive progressive (RPG) method. It has been reported that they perform well in balancing the item exposure rates of cognitive diagnostic CAT (Wang, Chang, & Huebner, 2011). Therefore, the focus of the present study is whether RT and RPG can simultaneously suppress the maximum item exposure rates and increase the exposure rates of underexposed items without losing psychometric precision in MCAT. Further, their performance is compared with that of the MPI method.

METHOD
A Monte Carlo simulation study was conducted to evaluate and compare the effectiveness of the above exposure control methods. MATLAB (version 7.10.0.499) was used to write the MCAT code and run the simulation conditions.

Design of Simulation Study
Item bank construction: Although Stocking (1994) suggests that the pool should contain at least 12 times as many items as the test length, many simulation studies on MCAT have used a more restrictive item bank. For example, the item bank used by van der Linden (1999) contained 500 items while the test length was 50; Lee et al. (2008) used an item bank of 480 items with test lengths of 30 and 60; and the item banks described in Veldkamp and van der Linden (2002) and Mulder and van der Linden (2009) contained fewer than 200 items while the test length was greater than 30. Thus, it is reasonable to construct an item bank of 450 items for a test length of 30.
To simplify the experimental conditions, most simulation studies generate item parameters and item responses according to M-2PL or M-3PL with the assumption that there are two or three dimensions (van der Linden, 1999; Veldkamp & van der Linden, 2002; Lee et al., 2008; Mulder & van der Linden, 2009; Finkelman et al., 2009; Wang, Chang, & Boughton, 2013; Wang & Chang, 2011). Hence, without loss of generality, the items in our simulation contained three dimensions, and the item parameters of the M-2PL model were generated in a similar way to those of Yao and Richard (2006) and Wang and Chang (2011).

Examinees and item responses: All 5000 examinees were simulated from a multivariate normal distribution, as in previous research (Wang & Chang, 2011; Yao, Pommerich, & Segall, 2014; Wang et al., 2013). Three levels of correlation were considered in the experiments. The mean ability was [0, 0, 0] and the variance-covariance matrix was:

Estimation of ability: The initial abilities were drawn from the standard multivariate normal distribution. MAP was used to update the domain abilities during the test, with the standard multivariate normal distribution as the prior distribution.
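Evaluation criteria: The bias and mean square error (MSE) of each dimension were used to evaluate the precision of the ability estimations. The formulas for bias and MSE are as follows:

$$\mathrm{Bias}_d = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{\theta}_{id} - \theta_{id}\right), \qquad \mathrm{MSE}_d = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{\theta}_{id} - \theta_{id}\right)^2,$$

where $\theta_{id}$ and $\hat{\theta}_{id}$ are the true and estimated abilities of examinee $i$ on dimension $d$, and $N$ is the number of examinees.

To assess the effect of exposure rates, we used (a) the number of items never administered and the number of items with exposure rates greater than 0.2, (b) the $\chi^2$ statistic, and (c) the test overlap rate. The formula for the $\chi^2$ statistic is as follows:

$$\chi^2 = \sum_{j=1}^{M}\frac{\left(er_j - L/M\right)^2}{L/M},$$

where $er_j$ is the observed exposure rate of item $j$, $M$ is the size of the item bank, and $L/M$ is the expected exposure rate for a test of length $L$.

In the following sections, we first introduce the MIRT model employed in this study and the ability estimation method. Then, some item selection indices and exposure control strategies are described. The performance of four item selection indices with and without each of the three exposure control strategies under different latent trait correlation levels is examined through a series of simulation experiments. The results, conclusions, and discussion are given in the final two sections.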
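The evaluation criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names and argument layout are hypothetical, the test overlap rate uses the Chen, Ankenmann, and Spray (2003) expression cited in this paper, and the use of the population variance of the exposure rates is an assumption.

```python
from statistics import fmean, pvariance


def bias_and_mse(true_theta, est_theta):
    """Per-dimension bias and MSE across examinees.

    true_theta, est_theta: lists of ability vectors, one per examinee.
    """
    n = len(true_theta)
    dims = len(true_theta[0])
    bias, mse = [], []
    for d in range(dims):
        errors = [est_theta[i][d] - true_theta[i][d] for i in range(n)]
        bias.append(fmean(errors))
        mse.append(fmean(e * e for e in errors))
    return bias, mse


def exposure_statistics(admin_counts, n_examinees, test_length, r_max=0.2):
    """Exposure summaries from per-item administration counts."""
    m = len(admin_counts)                      # item bank size M
    rates = [c / n_examinees for c in admin_counts]
    expected = test_length / m                 # expected exposure rate L / M
    chi2 = sum((r - expected) ** 2 / expected for r in rates)
    # Test overlap rate: (M / L) * variance of exposure rates + L / M.
    # Population variance is an assumption made for this sketch.
    overlap = (m / test_length) * pvariance(rates) + test_length / m
    return {
        "never_used": sum(1 for c in admin_counts if c == 0),
        "overexposed": sum(1 for r in rates if r > r_max),
        "chi2": chi2,
        "test_overlap": overlap,
    }
```

With perfectly uniform exposure, the $\chi^2$ statistic is zero and the test overlap rate reduces to $L/M$, which is the lower bound that the exposure control methods aim to approach.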

Multidimensional Two-Parameter Logistic (M-2PL) Model
MIRT models are usually classified as compensatory or non-compensatory based on whether a strong ability can compensate for other weak profiles. Bolt and Lall (2003) reported that both types are able to fit the data generated by non-compensatory models, but non-compensatory models cannot match the data generated from compensatory models. Thus, because of the advantages of compensatory models and the wide usage of MCAT in dealing with dichotomous items (van der Linden, 1999; Veldkamp & van der Linden, 2002; Mulder & van der Linden, 2010), the M-2PL model was adopted to simulate item parameters and generate item responses.
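For item $j$, the M-2PL model includes a scalar difficulty parameter $b_j$ and a vector of discrimination parameters $\mathbf{a}_j = (a_{j1}, \ldots, a_{jD})^T$. The item response function can then be described as:

$$P_j(\boldsymbol{\theta}) = P(u_j = 1 \mid \boldsymbol{\theta}) = \frac{\exp\left(\mathbf{a}_j^T\boldsymbol{\theta} - b_j\right)}{1 + \exp\left(\mathbf{a}_j^T\boldsymbol{\theta} - b_j\right)},$$

where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_D)^T$ is the ability vector of the examinee, $T$ denotes the transpose, and $D$ is the number of dimensions.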
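As a small illustration of the compensatory M-2PL response function, the following Python sketch computes the probability of a correct response and draws a simulated response. The function names are hypothetical, and the parameterization (logistic of the weighted ability sum minus a scalar difficulty) is an assumption about the form used in the study.

```python
import math
import random


def m2pl_probability(theta, a, b):
    """P(correct) under a compensatory M-2PL: logistic of a'theta - b."""
    z = sum(a_d * t_d for a_d, t_d in zip(a, theta)) - b
    return 1.0 / (1.0 + math.exp(-z))


def simulate_response(theta, a, b, rng=random):
    """Draw a 0/1 response by comparing P(correct) with a uniform draw."""
    return 1 if rng.random() < m2pl_probability(theta, a, b) else 0
```

Because only the weighted sum of abilities enters the model, an examinee with abilities (1, -1, 0) and an examinee with abilities (0, 0, 0) have the same success probability on an item with equal discriminations, which is the compensatory property.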

Ability Estimation Method: Maximum a Posteriori (MAP) Estimation
In this study, MAP is adopted for its competitive precision and easier computation compared to the expected a posteriori (EAP) ability estimation method in MIRT. Yao (2014b) compared MAP, EAP, and maximum likelihood estimation (MLE) in a simulation experiment using item parameters estimated from the ASVAB Armed Forces Qualification Test. She pointed out that MLE generates smaller bias and larger root mean square error (RMSE), whereas MAP and EAP using strong prior information or standard normal priors produce higher precision in the recovery of ability, and that EAP estimation takes longer than MAP. Recently, Huebner et al. (2015) compared EAP with MLE in MCAT, and showed that EAP always produces more stable results and lower mean square error in the ability estimators than MLE.
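Let $f(\boldsymbol{\theta})$ denote the prior density of ability and $\mathbf{u}$ the vector of observed responses. By Bayes' theorem, the posterior density function satisfies $f(\boldsymbol{\theta} \mid \mathbf{u}) \propto L(\mathbf{u} \mid \boldsymbol{\theta})\, f(\boldsymbol{\theta})$, where $L(\mathbf{u} \mid \boldsymbol{\theta})$ denotes the likelihood function. Hence, the goal of MAP is to find the mode that maximizes the posterior density function:

$$\hat{\boldsymbol{\theta}}^{MAP} = \arg\max_{\boldsymbol{\theta}} f(\boldsymbol{\theta} \mid \mathbf{u}).$$

The mode satisfies the equation obtained by setting the gradient of the log posterior density to zero. Furthermore, Newton-Raphson iteration can be used to solve this equation (for more details, see Yao, 2014b).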
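The Newton-Raphson search for the posterior mode can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the procedure of Yao (2014b): the items follow a compensatory M-2PL with scalar difficulty, the prior is multivariate normal with mean zero, and the names `solve`, `map_estimate`, and `prior_precision` (the inverse of the prior covariance matrix) are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def map_estimate(responses, a_params, b_params, prior_precision,
                 n_iter=25, tol=1e-8):
    """MAP ability estimate for M-2PL items with a N(0, Sigma) prior."""
    D = len(prior_precision)
    theta = [0.0] * D
    for _ in range(n_iter):
        # Gradient of the log posterior: sum_j a_j (u_j - P_j) - Sigma^{-1} theta
        grad = [-sum(prior_precision[d][e] * theta[e] for e in range(D))
                for d in range(D)]
        # Negative Hessian: sum_j P_j (1 - P_j) a_j a_j' + Sigma^{-1}
        neg_hess = [row[:] for row in prior_precision]
        for u, a, b in zip(responses, a_params, b_params):
            z = sum(a[d] * theta[d] for d in range(D)) - b
            p = 1.0 / (1.0 + math.exp(-z))
            for d in range(D):
                grad[d] += a[d] * (u - p)
                for e in range(D):
                    neg_hess[d][e] += p * (1.0 - p) * a[d] * a[e]
        # Newton step: theta_new = theta + (negative Hessian)^{-1} gradient
        step = solve(neg_hess, grad)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta
```

Because the prior contributes a positive definite term to the negative Hessian, the linear system in each step is well defined even after a single response, which is one practical reason MAP is convenient early in an adaptive test.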

Item Selection Methods and Exposure Control Strategies
To simplify the description, we first introduce some notation.
$N$ represents the number of examinees, and $L$ is the test length. Set $R$ refers to the item bank, which has a capacity of $M$. $R_{k-1}$ and $\hat{\boldsymbol{\theta}}^{k-1}$ denote the remainder of the item bank and the temporary estimator after administering the first $k-1$ items, respectively.

Item Selection Methods
The following four indices are chosen as item selection criteria in consideration of their computational complexity and running time.

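D-optimality: The Fisher information of each item in MIRT is no longer a number, but a matrix. Specifically, the Fisher information for the $j$th item in M-2PL is

$$I_j(\boldsymbol{\theta}) = P_j(\boldsymbol{\theta})\left[1 - P_j(\boldsymbol{\theta})\right]\mathbf{a}_j\mathbf{a}_j^T,$$

where $P_j(\boldsymbol{\theta})$ is the probability of a correct response to item $j$ and $\mathbf{a}_j$ is its vector of discrimination parameters. After $k-1$ items have been administered, the estimators form an ellipse or sphere $V$. To reduce the volume of $V$ as quickly as possible, Segall (1996) proposed that the $k$th item should maximize the determinant of the posterior test Fisher information matrix. Thus, the Bayesian item selection rule is expressed as

$$D_k = \det\left[I_{S_{k-1}}\left(\hat{\boldsymbol{\theta}}^{k-1}\right) + I_j\left(\hat{\boldsymbol{\theta}}^{k-1}\right) + \Sigma_0^{-1}\right],$$

where $I_{S_{k-1}}$ is the sum of the information matrices of the $k-1$ administered items, $\hat{\boldsymbol{\theta}}^{k-1}$ is the temporary ability estimate, and $\Sigma_0$ is the prior covariance matrix. The $k$th item is the item in the remainder of the item bank, $R_{k-1}$, with the largest $D_k$.

Posterior expected Kullback-Leibler information (KLP): This method is obtained by weighting the KL information according to the posterior distribution of ability. That is, the $k$th item is selected according to

$$j_k = \arg\max_{j \in R_{k-1}} \int KL_j\left(\hat{\boldsymbol{\theta}}^{k-1}, \boldsymbol{\theta}\right) f\left(\boldsymbol{\theta} \mid \mathbf{u}_{k-1}\right) d\boldsymbol{\theta}, \tag{9}$$

where $KL_j$ is the KL information of item $j$ and $f(\boldsymbol{\theta} \mid \mathbf{u}_{k-1})$ is the posterior density after $k-1$ responses. The integral interval is generally narrowed to simplify the computation, and (9) is replaced with the same integral taken over the narrowed interval.

Minimum error variance of the linear combination score with optimized weight (V2): The weight that minimizes the SEM of the composite ability is named the optimal weight. Yao (2012) proved the existence of the optimized weight, and derived its formula in terms of the matrix elements located on the $o$th row and $l$th column. The procedure of V2 involves finding the optimal weight vector, then calculating the SEM for each candidate item according to the optimal weight. Finally, the item with the lowest SEM is selected from the remainder pool. Note that the optimal weight is updated after administering each item. Thus, the only difference between V2 and V1 is in the determination of the weight used to compute the SEM of the composite score.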
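Two of the selection ideas in this subsection can be illustrated with a short Python sketch. The first function applies the D-optimality rule (maximize the determinant of the updated posterior information matrix). The second computes variance-minimizing composite weights under the assumption that the weights sum to one, in which case minimizing $w^T V w$ with $V$ the inverse of the posterior information matrix gives weights proportional to the row sums of that information matrix. This is a sketch under those assumptions, not the formula of Yao (2012), and all function names are hypothetical.

```python
import math


def item_information(theta, a, b):
    """M-2PL Fisher information matrix P(1 - P) a a' at ability theta."""
    z = sum(x * y for x, y in zip(a, theta)) - b
    p = 1.0 / (1.0 + math.exp(-z))
    w = p * (1.0 - p)
    return [[w * ai * aj for aj in a] for ai in a]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return det


def d_optimality_select(theta_hat, posterior_info, remaining, items):
    """Pick the remaining item maximizing det(posterior_info + I_j).

    posterior_info: information of administered items plus prior precision.
    items: dict item id -> (discrimination vector, difficulty).
    """
    def criterion(j):
        a, b = items[j]
        info_j = item_information(theta_hat, a, b)
        size = len(a)
        total = [[posterior_info[r][c] + info_j[r][c] for c in range(size)]
                 for r in range(size)]
        return determinant(total)
    return max(remaining, key=criterion)


def optimal_weights(posterior_info):
    """Weights summing to one that minimize w' V w, V = posterior_info^{-1}."""
    row_sums = [sum(row) for row in posterior_info]
    total = sum(row_sums)
    return [s / total for s in row_sums]
```

For a diagonal information matrix, the resulting weights are proportional to the information on each dimension, so better-measured dimensions receive more weight in the composite score.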

Item Exposure Controlling Methods
The RT and RPG methods proposed by Wang et al. (2011) are two exposure control methods used in cognitive diagnostic CAT. Both can be easily generalized to MCAT.
The RT method: In the RT method, a shadow item bank is constructed at the beginning of each test by removing all overexposed items from the original item bank. Each item is then selected at random from the candidate item set constructed beforehand. Let "Index" denote the value of the item selection indices. The length of the information interval that defines the candidate item set is governed by a parameter $\beta$. Larger values of $\beta$ give a shorter information interval length. As a result, the measurement precision is improved at the cost of a less uniform item exposure distribution. In summary, $\beta$ is used to balance the requirements of item exposure rate control and measurement precision. In this study, $\beta$ = 0.5 was adopted.
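The RT selection step can be sketched as below. The exact interval is not recoverable from the text here, so the sketch assumes the form used for the restrictive threshold method in cognitive diagnostic CAT: candidates are the items whose index lies within $\delta$ of the maximum, with $\delta = (\max - \min)(1 - x/L)^{\beta}$ and $x$ the number of items already administered. The function name and arguments are hypothetical, and the index is assumed to be one that is maximized.

```python
import random


def rt_select(index_values, exposure_rates, n_administered, test_length,
              r_max=0.2, beta=0.5, rng=random):
    """Restrictive threshold selection for an index that is maximized.

    index_values: dict item id -> selection index for items still available
    to this examinee; exposure_rates: dict item id -> exposure rate recorded
    when the test started.
    """
    # Shadow bank: drop items whose exposure rate already reached r_max.
    shadow = {j: v for j, v in index_values.items()
              if exposure_rates[j] < r_max}
    top, bottom = max(shadow.values()), min(shadow.values())
    # The interval shrinks as the test progresses; a larger beta shrinks it
    # faster. This functional form is an assumption of the sketch.
    delta = (top - bottom) * (1 - n_administered / test_length) ** beta
    candidates = [j for j, v in shadow.items() if v >= top - delta]
    return rng.choice(candidates), candidates
```

At the start of the test the whole shadow bank is eligible, so selection is close to random; near the end only items close to the maximum index remain, so selection is close to purely information-driven.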
The RPG method: The $k$th ($k = 1, 2, \ldots, L$) item is selected according to formula (12) for D-optimality and KLP, and according to formula (13) for V1 and V2. Note that the SEM is always very large for the first several items, and decreases rapidly to less than 1000. Thus, it is better to set the constant $C$ in formula (13) to be greater than 1000.
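A possible form of the RPG rule is sketched below. Since formulas (12) and (13) are not reproduced in the text here, the sketch assumes the restrictive progressive index used in cognitive diagnostic CAT: a restriction factor $(1 - er_j / r_{max})$ multiplied by a progressive mix of a random component drawn from $(0, H)$ and the selection index scaled by $\beta x / L$. For V1 and V2 the index is assumed to be $C - SEM_j$, so that smaller SEMs score higher. The function name and arguments are hypothetical.

```python
import random


def rpg_select(index_values, exposure_rates, n_administered, test_length,
               r_max=0.2, beta=0.5, minimize=False, C=10000.0, rng=random):
    """Restrictive progressive selection (assumed functional form).

    index_values: dict item id -> D-optimality or KLP index (minimize=False)
    or SEM of the composite score (minimize=True).
    """
    # Convert SEM-type criteria (smaller is better) to "larger is better".
    info = {j: (C - v if minimize else v) for j, v in index_values.items()}
    H = max(info.values())                  # maximum item information
    progress = n_administered / test_length

    def score(j):
        restrict = 1 - exposure_rates[j] / r_max
        random_part = (1 - progress) * rng.uniform(0, H)
        return restrict * (random_part + info[j] * beta * progress)

    return max(info, key=score)
```

Early in the test the random component dominates, which spreads exposure over the bank; later the information component dominates. An item whose exposure rate has reached $r_{max}$ receives a score of zero or less and is therefore effectively blocked.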
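The maximum priority index method (MPI): According to Cheng and Chang (2009), the priority index (PI) of item $j$ with the requirement of the maximum exposure rate is expressed as

$$PI_j = \frac{1}{r_{max}}\left(r_{max} - \frac{n_j}{N}\right) \times \mathrm{index}_j,$$

where $n_j$ represents the administration frequency of item $j$, $N$ is the number of examinees, $r_{max}$ is the allowed maximum exposure rate, and "index" refers to the D-optimality or KLP index. Finally, the task of the MPI method is to identify the item with the largest PI. For V1 and V2, $PI_j$ should be changed accordingly, that is,

$$PI_j = \frac{1}{r_{max}}\left(r_{max} - \frac{n_j}{N}\right) \times \left(C - SEM_j\right),$$

where the role of $C$ is similar to that in RPG.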
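The MPI rule with a single maximum-exposure constraint can be sketched as follows. The priority index is assumed to be the selection index multiplied by the scaled remaining exposure quota $(r_{max} - n_j / N) / r_{max}$, with $C - SEM_j$ in place of the index for V1 and V2; the function names and arguments are hypothetical.

```python
def priority_index(index_value, n_admin, n_examinees, r_max=0.2,
                   minimize=False, C=10000.0):
    """Priority index of one item under a maximum exposure rate constraint."""
    # For SEM-based criteria, C - SEM turns "smaller is better" into
    # "larger is better" (assumed form).
    value = C - index_value if minimize else index_value
    return (r_max - n_admin / n_examinees) / r_max * value


def mpi_select(index_values, admin_counts, n_examinees, **kwargs):
    """Return the item id with the largest priority index."""
    return max(
        index_values,
        key=lambda j: priority_index(index_values[j], admin_counts[j],
                                     n_examinees, **kwargs),
    )
```

For example, with $r_{max} = 0.2$ and 1000 examinees, an item with index 10 that has been administered 150 times has a priority index of about 2.5, so a less informative item with index 6 and no administrations is selected instead.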

Results of Ability Estimation
The ability estimations obtained from different MCAT algorithms were compared with respect to the bias and MSE statistics. Figure 1 depicts the mean bias of the three ability dimensions under each item selection method and item exposure control method at different levels of correlation between dimensions.

Figure 1. Mean Bias of the Three Ability Dimensions Under Each Item Selection Method
Figure 1 shows that the differences in bias between any two dimensions were negligible for each method, regardless of the item selection and exposure control methods. Moreover, the biases associated with D-optimality, V1, and V2 were similar to one another and greater than the bias produced by KLP. This indicates that KLP outperformed the other item selection methods, and that the effect of the item exposure control methods on the bias of KLP and of the other item selection methods was negligibly small.
Figure 2 presents the distribution of the MSEs of each ability dimension across the different item selection and exposure control methods at each correlation level. Figure 2 shows that, for each dimension, KLP produced the smallest MSE, followed by D-optimality, V1, and V2. Generally, the item selection methods can be sorted in descending order of measurement precision as KLP, D-optimality, V1, and V2. All three item exposure strategies led to an increase in MSE, except for the V2 item selection method. The MSE of V2 was larger than that of V2-RT in most cases. The decreased measurement precision may result from the characteristics of V2 in improving the item bank utility.
Overall, measurement precision tends to decrease when an exposure control method is employed. The effects of the item exposure control methods on the psychometric precision were checked through three aspects. First, from Figure 1, the item exposure strategies had no significant effect on the bias, since the biases produced by the same item selection method using different exposure control methods were similar. Furthermore, when the item exposure control methods were combined with D-optimality, KLP, or V2, their performance differed considerably in terms of the measurement precision. However, all the item exposure control methods yielded similar measurement precision when combined with V1. In addition, a higher level of ability correlation seems to narrow the gap in the precision generated by different exposure control methods when combined with the same item selection method.
Finally, the RT exposure control method always produced the lowest MSE values, thus giving higher measurement precision than RPG and MPI. Although their precision under different item selection indices varied to some degree, RPG and MPI performed similarly. The performance of RT and RPG was in accordance with that reported by Wang et al. (2011). Overall, the general order of the exposure control methods sorted by decreasing measurement precision was RT, RPG, and MPI.

Results of Item Exposure Rates
The item exposure rates and chi-square statistics associated with each item selection method with and without exposure control are presented in Table 1, and the distributions of these statistics across different conditions are depicted in Figure 3 and Figure 4, respectively.
First, it is easy to infer from Table 1 that the exposure rates were distributed unevenly for D-optimality, KLP, V1, and V2. For instance, D-optimality and KLP yielded the largest test overlap and overexposed item rates and the lowest item bank usage rates, as depicted in Figure 3.
Although the number of never-reached items in V1 and V2 was close to 0, and their test overlap rates and $\chi^2$ values were smaller than those of D-optimality and KLP, these item selection methods still produced unsatisfactory item exposure rate distributions. These characteristics can be clearly observed in Figure 4(a), where the exposure rates are depicted in ascending order for each of the four item selection indices. In addition, the results for V1 and V2 obtained from this study coincide with those reported by Yao (2014a). Second, all the exposure control methods improved the uniformity of exposure rates substantially in terms of increasing item bank usage and decreasing the overexposed item rates, test overlap rates, and $\chi^2$ statistics. Although MPI performed similarly, RPG outperformed the other methods in most cases. It is apparent that all the item exposure distributions followed the same pattern when different item selection indices were combined with the same exposure control method. Hence, Figure 4(b) only illustrates the exposure rate distributions of the exposure control strategies combined with KLP.
In addition, different characteristics of the item exposure rate distribution were observed for different item exposure control methods. One can observe from Figure 3 that the item bank usage rate reaches 100% for all methods except the KLP-MPI condition. In other words, all item exposure methods improve the item bank usage substantially. As for overexposed items, both RPG and MPI produced more overexposed items than RT under most test conditions. Generally, RT was able to keep the item exposure rates below the allowable maximum value, whereas both RPG and MPI resulted in some items with exposure rates greater than 0.2. Further, some special findings regarding particular exposure control methods are worth pointing out. First, compared to D-MPI, V1-MPI, and V2-MPI, KLP-MPI generated a more unbalanced item exposure rate distribution. Second, when RPG was used with V1 or V2, there were always one or two items exposed to everyone taking the test. The internal results of V1-RPG and V2-RPG revealed that many error variance values in MATLAB were labeled "NaN" when choosing the first or second item. In other words, it can be inferred that the overexposed items in V1-RPG and V2-RPG were mainly due to the non-distinctive item information matrix in V1 and V2. Furthermore, the test overlap rate and $\chi^2$ of V1-RPG and V2-RPG were affected accordingly by the first one or two administered items. Overall, although the item exposure control strategies produced different patterns of item exposure rates, they all considerably improved the balance of the item exposure distribution. This can be seen by comparing Figure 4(a) and 4(b). In addition, one can infer from the results that there appears to be a trade-off between measurement precision and the use of item exposure control methods.

CONCLUSIONS AND DISCUSSIONS
Many studies have acknowledged the advantages of CAT over P&P tests and computer-based tests with respect to the decrease in test length, increase in measurement precision, and better model fits. Along with the obvious advantages of MCAT, choosing the most appropriate item selection rule is a vital step for a successful application (Wang & Chang, 2011). Although the proposed item selection methods yield good results in precision, they are vulnerable to the issue of overexposed items (those that are used too often) and underexposed items (those used too rarely). As a solution to this problem, different item exposure control methods have been adopted and used together with different item selection methods.
This study has examined the performance of four item selection methods combined with different exposure control methods in MCAT. Simulations showed that V2 outperformed D-optimality, KLP, and V1, with higher item bank usage rates, fewer overexposed items, and lower test overlap rates. Generally, the results of all item selection methods without item exposure control were unsatisfactory with respect to item exposure statistics. The results also indicate that, without item exposure control, the item selection indices could be sorted in order of psychometric precision as KLP, D-optimality, V1, and V2. In addition, when item exposure control methods were used, the measurement precision tended to decrease for all item selection methods.
When the item exposure rate distributions obtained from different item exposure control methods were compared, RPG and MPI outperformed RT in most cases. Furthermore, each item exposure control method yielded the same exposure rate pattern under different item selection methods. In terms of measurement precision, the exposure control methods could be ordered as RT, RPG, and MPI. This kind of trade-off between measurement precision, utility of the item bank, and evenness of item exposure rates has been observed in many studies (Chang & Twu, 1998). In other words, the measurement precision needs to be sacrificed, to some extent, to keep the exposure rate at the desired value.
Both the present study and the work of Wang et al. (2011) showed that the measurement precision of the RT method was higher than that of the RPG method under the same test conditions, and that the RT method performed slightly worse than RPG in the evenness of the item exposure distribution. In conclusion, among the three exposure control methods examined in this study, both RT and RPG offer balanced precision and item exposure control, whereas MPI performed well in controlling the item exposure rate with a noticeable loss in precision.
Several issues regarding item selection methods for MCAT deserve further investigation. First, although D-optimality, V1, and V2 are much faster than KLP, the run-time usually increases with the number of test dimensions. As a consequence, time-consuming methods can hinder the practice of MCAT in dealing with complex test conditions. In fact, the benefits of MCAT over unidimensional CAT mainly lie in the detailed cognitive information obtained based on multiple dimensions. Hence, there is a need for more work on algorithms that reduce the computation time of the item selection methods, or on simplified and valid item selection methods based on existing rules, such as the two simplified KL indexes provided by Wang et al. (2011). Second, the measurement precision of each dimension can be guaranteed automatically by most MCAT item selection methods, but thousands of other constraints are encountered in real tests. Hence, it would be useful to examine how to deal with non-statistical constraints in MCAT.

The $\chi^2$ statistic measures the discrepancy between the observed and expected item exposure rates. Finally, the test overlap rate was computed according to the expression proposed by Chen, Ankenmann, and Spray (2003),

$$\hat{T} = \frac{M}{L} S_{er}^2 + \frac{L}{M},$$

where $S_{er}^2$ denotes the variance of item exposure rates. Generally, smaller values of $\hat{T}$ demonstrate more balanced item utility.

The M-2PL model follows Reckase (1982), where $T$ denotes the transpose and $D$ is the number of dimensions. For an examinee with ability $\boldsymbol{\theta}$, the equation $\mathbf{a}_j^T\boldsymbol{\theta} - b_j = c$ with $c$ constant denotes a straight line in $D$-dimensional space. The compensatory features of M-2PL originate from the fact that all examinees giving equal values of $\mathbf{a}_j^T\boldsymbol{\theta}$ have the same probability of answering item $j$ correctly.

The posterior density function of $\boldsymbol{\theta}$ is denoted by $f(\boldsymbol{\theta} \mid \mathbf{u})$.

This method was called D-optimality by Mulder and van der Linden (2009), and the item with the largest $D_k$ is chosen from the remainder pool.

Minimum error variance of the linear combination score with equal weight (V1): From the perspective of error variance, van der Linden (1999) suggested that the $k$th item should minimize the error variance of the composite score. Yao (2012) derived the formula for the standard error of measurement (SEM) of the composite score.

In the RPG formulas, $er_j$ denotes the observed exposure rate of item $j$ and $r_{max}$ denotes the allowed maximum exposure rate. Let $H$ be the maximum item information in the remainder of the item bank. The parameter $\beta$ plays the same role and takes the same value as in the RT method. The constant $C$ should be greater than all the SEMs; in this study, we set $C$ = 10000.

Figure 3. Item Bank Usage and Overexposed Item Rates for Each Method Under Different Correlations.

Figure 4. Item Exposure Rates of Different Methods Under the Correlation of 0.6: (a) the four item selection indices without item exposure control; (b) the three item exposure control methods combined with KLP.

Mao, X., Özdemir, B., Wang, Y., Xin, T. / Investigating The Effect of Exposure-Control Strategies on Item Selection Methods in MCAT. ISSN: 1309-6575. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi / Journal of Measurement and Evaluation in Education and Psychology.



In the RT method, the candidate item set includes all items whose information values lie in a restricted information interval.



Table 1. Item Exposure Statistics Associated with Each Method

