Silent Predictors of Test Disengagement in PIAAC 2012

Although the effects of test disengagement on the validity of the scores obtained from a data set have been examined in many studies, the predictors of disengaged behaviors have received relatively limited scholarly attention in low-stakes assessment, particularly in international comparison studies. Accordingly, the present study had a twofold purpose: to determine the best-fitting explanatory item response theory model and to examine the predictors of test disengagement. The data were collected using items measuring the literacy and numeracy skills of adults from seven countries (Norway, Austria, Ireland, France, Denmark, Germany, and Finland) that participated in PIAAC 2012. The results of the model with item and person characteristics demonstrated that adults tended to be disengaged on very difficult items. Similarly, age had a negative effect on test-taking engagement in several countries, such as France and Ireland, while predictors such as educational attainment, readiness to learn, and the use of ICT skills at home and at work had positive effects on test engagement. In addition, females exhibited a higher level of engagement in Norway. Overall, the findings suggested that the effect of the predictors on disengagement depended on the domain and the country. This study therefore draws further attention to the need to account for test disengagement before drawing conclusions from international large-scale assessments.


INTRODUCTION
Examinees are not always motivated to put their full effort into responding to test items, especially in low-stakes settings such as the Programme for the International Assessment of Adult Competencies (PIAAC) (e.g., Finn, 2015; Wise & DeMars, 2010). The reason why low test motivation is often seen in low-stakes assessments can be explained by expectancy-value models (e.g., Eccles & Wigfield, 2002). More specifically, as indicated by these models, achievement motivation is closely affected by two factors: expectancy and value. The former is defined as the individual's expectation of success in responding to the test items and will be low if the item is too difficult relative to the ability of the individual. In the most general sense, the latter is related to the perceived importance and usefulness of the test. However, there is no straightforward explanation, since the value component has several aspects, such as attainment value, intrinsic value, utility value, and perceived costs (Eccles & Wigfield, 2002). Both these aspects in combination and each aspect separately are considered to be low in low-stakes assessments. This is because, although sufficient effort is needed to respond to the test items correctly, the intrinsic motivation of some respondents is low, and the results obtained from the test are not consequential for them. This creates a contradiction: serious problems arise when individuals' low motivation gives rise to low test effort (Wise & DeMars, 2010). The resulting invalid responses cause construct-irrelevant variance and distortion of psychometric features (e.g., Rios, Guo, Mao, & Liu, 2017), leading to the misinterpretation of the results obtained from the data set (Nagy, Nagengast, Becker, Rose, & Frey, 2018).
To put it differently, the true scores of individuals are contaminated by a systematic source of error stemming from their level of engagement in the test (Braun, Kirsch, Yamamoto, Park, & Eagan, 2011). In addition, disengagement gives rise to (a) inflated item difficulties as well as deflated item discriminations (e.g., van Barnevald, 2007) and (b) biased item and test information estimates (e.g., Asseburg & Frey, 2013; Sundre & Kitsantas, 2004; Wise, 2009). According to the expectancy-value theory, individuals will attribute the same value to the areas measured in these low-stakes practices; consequently, there would be no individual differences in terms of test engagement. However, individuals' perceived expectations about their ability to answer items correctly vary from one person to another, depending on their characteristics. In this regard, gender can affect individuals' perception of their capability, and thus their engagement. Several studies in the literature indicate that males exhibit disengaged behaviors more frequently than females (e.g., DeMars, Bashkov, & Socha, 2013), and females tend to spend more time answering the items (Setzer, Wise, van den Heuvel, & Ling, 2013).
Although the education level and age of individuals may have a significant effect on the time they spend responding to test items, the literature has not focused on this issue sufficiently. Investigating this effect would help answer some open questions in education. For example, highly educated individuals are committed to completing tasks and thus have sufficient competency (Organisation for Economic Co-operation and Development [OECD], 2016b); therefore, they may spend more time responding to an item. In addition, older adults may have the necessary knowledge and skills but, owing to factors such as fatigue and boredom, may tend to respond to items faster so that they can complete the assessment as soon as possible (Xie, 2003).
Individuals' readiness to learn also has an effect on their disengagement levels. It is closely related to whether adults have sufficient motivation, cognitive skills, and learning strategies to learn a task; feel curious about it; are interested in learning; look for associations among ideas; and believe that they can cope with a problem they face (Smith, Rose, Smith, & Ross-Gordon, 2015). Although the extent to which individuals possess the characteristic measured by a test plays an important role in responding to its items, in some cases other factors also have a critical effect on responding behaviors. When these factors are not taken into account, invalid interpretations can be drawn from test scores alone (Nagy et al., 2018). At this point, considering that the test items in the PIAAC are administered in a computer environment regardless of which domain is measured, individuals' familiarity with technological elements such as computers and the internet will also affect their responding behavior, acting as insidious, silent factors. In other words, as a source of variation in respondents' engagement levels, familiarity with information and communications technology (ICT) can also affect engagement. The frequent use of ICT skills makes individuals familiar with computers, which increases their motivation, concentration, and achievement in computer-based assessments (Mastuti & Handoyo, 2017). In addition, the extent to which individuals use various skills at home and at work can affect how much effort they apply when responding to tests.
In the literature, several item-level variables have been reported to affect individuals' disengagement levels. According to the expectancy-value theory, if individuals perceive an item as difficult relative to their competence, their engagement in the testing situation will be negatively affected. Some studies have revealed that individuals put more effort into items of moderate difficulty relative to their ability (Asseburg & Frey, 2013).
In conclusion, the importance of addressing these variables can be explained by analogy with an iceberg. Only a small part of the total mass lies above the waterline; the much larger part below the surface controls all the movements of the iceberg. The same logic can be used to explain the disengagement behaviors of individuals: in this study, these variables make up the hidden mass beneath disengagement and play an important role in explaining it. More pointedly, when the effect of these person- and item-level variables on disengagement is ignored, the difference in test scores due to disengagement cannot be determined correctly (Braun et al., 2011). Thus, investigating to what extent these variables explain disengagement behavior is crucial.
There has, however, been extensive research on the topic of test-taking effort, and many of these endeavors share a limitation: they focus on relatively homogeneous populations based in a single country (Goldhammer, Martens, & Lüdtke, 2017). To date, very few studies have examined potential differences in test-taking effort between countries in international assessments (Rios & Guo, 2020), although examinees' personal characteristics differ considerably by culture and country (Brown & Harris, 2016). Also, regardless of the number of response categories, studies using traditional IRT models provide information on individual or item characteristics, such as respondents' abilities, cognitive levels, and achievement, or item difficulty and discrimination. Still, they are insufficient to identify systematic effects resulting from the design of the measurement process. In other words, they do not reveal common variability across items or individuals that depends on the design of the measurement process or the measurement tool. Yet this information is very important in determining construct-irrelevant variance arising from cognitive, cultural, and biological factors, among others (AERA, APA, & NCME, 2014). Since the PIAAC data were collected in a nested design, analyses were carried out using explanatory item response theory (EIRT) models, which allow several item and person characteristics to be included as first-level and second-level units, respectively. Thus, this study begins to close this gap in the literature by taking a closer look at the predictors of the test disengagement of adults from different countries. Examining these predictors provides the opportunity to obtain more detailed and appropriate conclusions about the factors behind examinees' disengagement.

Purpose of the Study
The aim of this study was to examine the role of several item- and person-level variables in engaged responses in the domains of literacy and numeracy assessed in PIAAC 2012. Investigating examinees' responses in these domains is crucial because, in the most basic sense, numeracy and literacy skills contribute to the development of various higher-order thinking skills, such as analytical thinking and understanding information in a particular field. In particular, numeracy means more in everyday life than the mathematics we learn at school. In addition, the skills in these areas are used in many settings, from everyday life to education, business, and communication with officials (OECD, 2013c). Thus, in order to investigate examinees' engagement in tests requiring numeracy and literacy skills, answers to two related research questions were sought: 1. Which of the explanatory item response theory (EIRT) models (a baseline model, a model with person characteristics, a model with an item characteristic, and a model with all person and item characteristics and the interaction between them) fits the PIAAC 2012 subdata best?
2. To what extent can the engagement of adults in responding to items included in PIAAC 2012 be explained by person and item characteristics?

Sample and Population
The target population of this study included all non-institutionalized adults aged 16 to 65 who resided in their country at the time of data collection and participated in Round 1 of PIAAC 2012. Round 1 was selected because of the high number of participating countries, which increases the representativeness and generalizability of the results. Another reason for choosing Round 1 is that the t-disengagement rates of the countries participating in this round are examined in relation to each other in the official report (OECD, 2019), which ensures that the selection of data sets is based on evidence.
In PIAAC, probability sampling was used (OECD, 2013b). In the present study, countries were selected according to their rates of t-disengagement, which represents situations in which a respondent spends less time on an item than an item-specific threshold; the "t" in "t-disengagement" stands for threshold (OECD, 2019). If a country's rate exceeds the average percentage of t-disengagement (15.70%), it is classified as a country with a high percentage of t-disengagement. Accordingly, in addition to the two countries with the highest percentages of t-disengaged individuals, France (21.50%) and Ireland (20.40%), two countries in which the percentage of t-disengaged individuals is close to the average, Denmark (14.50%) and Germany (12.30%), were selected. Also, the three countries with the lowest percentages of individuals showing t-disengagement were selected (OECD, 2017) to better represent the pattern observed across the countries that participated in PIAAC 2012. Of these examinees, those who took the computer-based assessment of PIAAC 2012 were included as participants. As a result, the sample of the current study comprises 29,959 adults from seven countries in total. The frequencies of these participants by the variables of interest and by country are presented in Table 1.

Data Collection Instruments
In PIAAC 2012, whether the surveys used as data collection tools are administered in a computer environment or in paper-and-pencil form is determined by the respondents' success in two tests measuring their ICT skills. If respondents fail to reach a certain level in the first stage, they are redirected to the paper-based core section. Furthermore, respondents who pass the first task but fail the subsequent short test participate only in the paper-based assessment. To participate in the computer-based assessments, respondents must pass both tests.
The data collection instrument of the present study comprised the literacy and numeracy surveys administered in the computer-based assessment of PIAAC 2012 (Round 1). The literacy survey included 58 items assessing adults' ability to read digital texts as well as traditional print-based texts. The numeracy survey included 56 items assessing adults' ability to use, apply, interpret, and communicate mathematical information. For each domain, the distribution of items by context is presented in Table 2 (OECD, 2016a). To obtain evidence for the reliability of the test scores, the proportion of variance explained by the model was computed for each cognitive domain. The resulting reliability coefficients for the literacy and numeracy domains range from .86 to .90 (OECD, 2013b). These values are acceptable because they exceed .60, the minimum cut-off criterion in the social sciences (Zikmund, Babin, Carr, & Griffin, 2010).

Explanatory item-level and individual-level variables
Studies examining the factors that influence the time individuals spend responding to a test item (Bridgeman & Cline, 2000; Masters, Schnipke, & Connor, 2005; Yang, O'Neill, & Kramer, 2002) have considered item difficulty, item type, content area, degree of abstraction, and similar features as item-level variables. However, since not all items, and thus not all of their characteristics, are released by the OECD, this study considers only item difficulty (OECD, 2013b) as the item-level variable, following the similar study by Goldhammer et al. (2017).
The cognitive pre-test is a short test given to examinees to determine whether they are directed to the full computer-based assessment of PIAAC. It includes three literacy and three numeracy items of low difficulty. Examinees who fail this test are given the reading components of the assessment; those who pass take the full assessment (OECD, 2013b).
In PIAAC, several demographic variables about examinees are collected. One of them is gender: examinees are required to provide information about their gender. There is also an item that records examinees' age in 10-year bands: 24 or less, 25-34, 35-44, 45-54, and 55 or over. Another demographic variable assessed in PIAAC is educational attainment, which refers to the highest level of schooling completed. This categorical variable includes the categories less than high school, high school, and above high school.
In PIAAC, examinees' readiness to learn is also measured. Specifically, six items focus on the extent to which examinees deal with problems and tasks they encounter. These items ask how often the examinees relate new ideas to real-life situations and to what they learned before, are willing to learn something new, try to learn hard things in detail, and search for additional information when there is something they do not understand (Perry, Helmschrott, Konradt, & Maehler, 2017).
One of the variables measured in PIAAC is the use of ICT at work. A set of questions asks about the frequency of using computers or the internet as part of the examinee's job. More precisely, these questions cover the use of e-mail, the use of the internet to understand job-related issues, conducting transactions on the internet, participating in real-time discussions on the internet, the use of spreadsheets and word processing, and the use of a programming language to write computer code. For measuring the use of ICT at home, the same questions were administered, this time focusing on the frequency of doing these activities in everyday life. Examinees are then divided into subcategories according to their frequency of ICT use, from those who use it least to those who use it most (OECD, 2015).

Data Analysis
The following procedure was used to identify disengaged behaviors. If the time taken to respond to an item is below an item-specific threshold, it is considered that insufficient effort was made on that item. To compute the item-specific thresholds, the proportion-correct-greater-than-zero (P+>0%) method was used. Before seeking answers to the research questions, the time spent on each item was converted to a dichotomous engagement indicator (0 = disengaged, 1 = engaged) serving as the item response variable, depending on whether the response time was below or above the threshold. The cognitive pre-test score and item difficulty variables were centered and scaled to allow a more meaningful interpretation of the interaction effects.
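As an illustration only (the exact PIAAC operationalization of the P+>0% rule may differ, and the function and variable names below are our own), the thresholding and dichotomization steps described above can be sketched as follows:

```python
def p_plus_threshold(response_times, correct, step=1.0):
    """One plausible reading of the P+>0% rule: raise the candidate cut in
    fixed steps and keep the largest time t such that the proportion correct
    among responses faster than t is still zero (chance level, since PIAAC
    items are effectively open-ended)."""
    threshold = 0.0
    t = step
    max_rt = max(response_times)
    while t <= max_rt:
        faster = [c for rt, c in zip(response_times, correct) if rt < t]
        if faster and sum(faster) / len(faster) > 0:
            break  # a correct answer appears below t: stop raising the cut
        threshold = t
        t += step
    return threshold

def engagement_indicator(response_times, threshold):
    """Dichotomous item response variable: 0 = disengaged, 1 = engaged."""
    return [1 if rt >= threshold else 0 for rt in response_times]
```

For example, with three rapid incorrect responses at 1-3 seconds and three correct responses at 10-15 seconds, the threshold settles just below the fastest correct response, and the rapid responses are flagged as disengaged.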

Validity checks
In the present study, two validity checks were used to ensure that the threshold procedure accurately identified disengaged responses. In the first check, engaged and disengaged response behaviors were compared in terms of their proportion correct (e.g., Wise & Kong, 2005; Wise & Ma, 2012). For the threshold determination process to be valid, the proportion correct for engaged behavior should be higher than the chance level, and the proportion correct for disengaged behavior should be at the chance level. Considering that the items measuring the verbal and numerical skills of adults in PIAAC have many response options, the probability of finding the correct answer by chance is zero or very close to zero. In the present study, the distributions of the observed proportion correct for responses classified as engaged or disengaged using the proportion-correct conditional method (P+>0%) were examined for each domain and country. It was found that the proportion correct for disengaged response behavior was zero or close to zero, whereas the proportion correct for engaged response behavior was much higher. As an example, the distribution of the proportion correct scores of engaged and disengaged individuals in Norway for each domain is presented in Figure 1. In the upper part of Figure 1, the red line shows the proportion correct for engaged response behavior, while the lower green line represents the corresponding proportion correct for disengaged response behavior. Figure 1 clearly shows that the proportion correct scores of engaged individuals were higher than those of disengaged individuals in Norway. A similar pattern was also observed in the other selected countries.
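The first validity check amounts to a simple comparison, sketched below with hypothetical names: mean correctness is computed separately within each engagement class, and a disengaged proportion near zero supports the threshold.

```python
def proportion_correct_by_class(correct, engaged):
    """Split scored responses (0/1) by the engagement indicator (0/1) and
    return the proportion correct within each class (None if a class is
    empty)."""
    groups = {"engaged": [], "disengaged": []}
    for c, e in zip(correct, engaged):
        groups["engaged" if e == 1 else "disengaged"].append(c)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}
```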
Another validity check for each item and domain was the examination of the association between individuals' proficiency scores and the proportion correct of engaged and disengaged behaviors (e.g., Lee & Jia, 2014). According to their proficiency scores, individuals are divided into different groups referred to as score groups. For the threshold determination process to be valid, a positive relationship between the proportion correct of engaged responses and proficiency scores is expected for each item; no such relationship is expected for disengaged behaviors.
In the current study, the participants were divided into six score groups, ranging from low to high competency as defined by the PIAAC competency levels (OECD, 2013a), for both domains. Regardless of which plausible value is taken for an examinee, individuals fall at the same competency level defined by PIAAC. Furthermore, the plausible values were not used in the main analysis, but only for this validity check. Therefore, for ease of calculation and interpretation, the mean of each adult's 10 plausible values in each domain was used to assign people to score groups. For each item, the relationship between the participants' proficiency scores (i.e., the average of the plausible values) and the proportion correct scores of engaged and disengaged response behaviors was examined. Figure 2 shows the related findings for the selected literacy and numeracy items.
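The grouping step can be sketched as follows; the cut scores in the example are illustrative placeholders, not the official PIAAC level boundaries.

```python
def assign_score_groups(plausible_values, cuts):
    """plausible_values: one list of 10 plausible values per adult.
    Each adult is assigned to a score group (0 = lowest) by comparing the
    mean of their plausible values against ordered cut scores; five cuts
    yield six groups."""
    groups = []
    for pvs in plausible_values:
        mean_pv = sum(pvs) / len(pvs)
        groups.append(sum(mean_pv >= c for c in cuts))
    return groups
```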

Journal of Measurement and Evaluation in Education and

Figure 2. Association between the Score Groups and Proportion Correct Scores in Selected Literacy (C301C05) and Numeracy Items (C605508)
In both panels of Figure 2, the upper and lower lines show the association between the score groups and proportion correct scores for engaged and disengaged response behaviors, respectively. As expected, the association between score group (based on plausible values) and the proportion correct for engaged response behavior was positive for all items in both domains.
Once the validity of the threshold determination procedure was established, a one-parameter logistic (1PL) item response model was fitted for each domain with the dichotomous engagement indicators (0 = disengaged, 1 = engaged) as the item response variable. 1PL models assume unidimensionality and equal discrimination across items. To determine item fit, information-weighted (infit) and unweighted (outfit) mean-squared residual-based item fit statistics were inspected. Infit and outfit values between .5 and 1.5 indicate that an item fits the data (de Ayala, 2009). Accordingly, for each country and domain, the few items that did not fit the data were removed from the data set, which did not distort the representativeness of the items. Specifically, for Norway, Austria, Denmark, Germany, and Ireland, nine items were removed from the literacy survey and seven items from the numeracy survey. For Finland, three items were excluded from the analysis of the literacy survey and seven from the numeracy survey. Lastly, for France, six and four items were excluded from the literacy and numeracy data sets, respectively.
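The item-screening rule can be expressed compactly; this is a sketch assuming the fit statistics have already been estimated elsewhere (the names below are our own):

```python
def retain_items(fit_stats, lower=0.5, upper=1.5):
    """fit_stats maps an item id to its (infit, outfit) mean-square values.
    An item is retained only if both statistics fall within [lower, upper]
    (de Ayala, 2009)."""
    return [item for item, (infit, outfit) in fit_stats.items()
            if lower <= infit <= upper and lower <= outfit <= upper]
```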
Different EIRT models were constructed because of their flexibility in including the effects of item- and person-level variables simultaneously (Briggs, 2008). These models can be used for both measurement and explanatory purposes. The EIRT approach defines individuals as clusters, items as repeated observations, and item responses as the dependent variable within a multilevel structure. In other words, EIRT models are multilevel models in which individuals' item responses are treated as the first-level units nested within persons. Accordingly, after testing the baseline model (Model 0), Model 1 was tested with personal characteristics such as educational attainment, gender, age group, cognitive skill, readiness to learn, and use of ICT skills at home and at work. Model 2 included only item difficulty, since an item characteristic was being tested. Finally, the full Model 3 was tested with item- and person-level variables and the interaction of item difficulty with cognitive skill. After running all models, likelihood-based fit statistics, namely the log-likelihood (LL), Akaike's information criterion (AIC), and the Bayesian information criterion (BIC), were obtained. All models were estimated in the R environment (R Core Team, 2016). The TAM package (Kiefer et al.) and the "lme4" package (Bates, Maechler, Bolker, & Walker, 2015) were used to test the explanatory item response models. The intra-class correlation (ICC) for each domain and country was computed to determine the proportion of variance in the dependent variable, test-taking engagement, that is attributable to personal differences. The ICC is calculated by dividing the random-effect variance by the total variance (Hox, 2002).
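As a worked illustration of the ICC computation (Hox, 2002): for a logistic model such as the 1PL-type models used here, the level-1 residual variance is conventionally fixed at π²/3 ≈ 3.29, an assumption made explicit below.

```python
import math

def icc(person_variance, residual_variance=math.pi ** 2 / 3):
    """Intra-class correlation: the share of total variance attributable to
    between-person differences. The default residual variance pi^2/3 is the
    conventional value for a logistic model."""
    return person_variance / (person_variance + residual_variance)
```

With a random-intercept variance of about 3.29, roughly half of the variation in engagement lies between persons, consistent with the ICC values of approximately 50% reported in the Results.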

Model-Fit
For both the literacy and numeracy domains, the four explanatory IRT models were tested, and the LL, BIC, and AIC values were examined to determine the most appropriate model for the PIAAC 2012 data. There is no general rule about whether the most complex or a simpler model will fit the data best. Therefore, although it was not assumed in advance that Model 3 would fit best, it was expected that item- and individual-level variables may influence individuals' engagement levels. When the results were examined, Model 3 was found to fit the PIAAC 2012 data best, as indicated by the lowest values of these indices. Therefore, the results of Model 3 were taken into consideration in this study. The model-fit results are presented in Table 3.
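The comparison rests on standard information criteria; a minimal sketch follows, using hypothetical numbers rather than the values in Table 3:

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC) from a model's log-likelihood, parameter count,
    and number of observations; lower values indicate better fit."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, bic

def best_model(models):
    """models: name -> (log_likelihood, n_params, n_obs). Pick the model
    with the lowest AIC; BIC can be used analogously."""
    return min(models, key=lambda m: information_criteria(*models[m])[0])
```

For instance, a richer model whose log-likelihood gain outweighs its extra parameters wins on both criteria.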

Differences in Test Engagement
For each country, the results regarding the effects of the item- and person-level factors on test-taking engagement are presented in Tables 4 and 5. As shown in Table 4, the difficulty of the items measuring literacy had a negative effect on the engagement of participants in France (-.93) and Finland (-2.68), showing that as item difficulty increased, adults tended not to give sufficient time to the items. On the other hand, the difficulty of the items measuring numeracy had no significant effect on the adults' engagement. In addition to the main effect of item difficulty on engagement, the interaction between item difficulty and cognitive skill was also significant. Specifically, the effect of item difficulty on engagement was stronger among strong test-takers, who put more effort into solving items, than among poor test-takers, who did not put sufficient effort into the items.
Age had a statistically significant effect on engagement with the literacy items in France and Norway. Specifically, as the age of the French participants increased, they tended to be disengaged. Additionally, there was a particularly strong decrease in the engagement of the oldest group, participants aged 55 or above, in Norway (-.23). A similar pattern was found for the numeracy domain. Moreover, a significant negative effect of age on the engagement of adults taking the numeracy items was observed in Ireland and Finland.
The highest level of educational attainment was associated with higher engagement in Germany (.22) and Finland (.35); in other words, individuals with a high level of education in these countries spent more time on the items. For the numeracy domain, as shown in Table 5, educational attainment had no significant effect on engagement.
As clearly seen in Table 4, for Austria, Germany, France, and Ireland, the adults' readiness to engage in learning activities had a positive effect on their engagement with the items addressing literacy skills. However, this was not the case for the participants from Finland: their readiness to engage in learning activities had a negative effect on their engagement, indicating that the Finnish adults who were highly ready to learn put insufficient effort into answering the items. For the numeracy domain, as presented in Table 5, the pattern observed for the literacy domain was also found in France and Ireland. That is, as the adults' level of readiness to learn increased, so did their test engagement when responding to the items assessing numeracy skills.
For the literacy domain, Table 4 shows that the use of ICT skills at home had a positive and significant effect on the engagement levels of individuals in every category in Austria, suggesting that test-takers who used ICT skills at home more frequently exhibited a higher level of engagement. In contrast, the use of ICT skills at home was negatively associated with the adults' engagement in numeracy in Ireland (-.79) and Finland (-.79), whereas in Norway the use of ICT skills at home in every category was positively related to the adults' engagement in numeracy.
When the effect of the use of ICT skills at work was examined across all countries, Table 4 shows that in Finland, those who used ICT skills at work more frequently tended to be more engaged while responding to the items measuring literacy. This was not the case for numeracy: a negative and significant effect (-.35) of the use of ICT skills at work on engagement was found in Austria, suggesting that adults who frequently used ICT skills at work tended to be disengaged when answering the numeracy items. Regarding gender, only in the numeracy domain in Norway was being female (.15) positively related to test-taking engagement.
For each country and domain, as presented in Tables 4 and 5, the ICC values reflecting differences in the adults' test-taking engagement at the person level were similar to each other. Specifically, approximately 50% of the variation in individuals' engagement levels was attributable to differences between subjects.

DISCUSSION and CONCLUSION
This study aimed to determine which explanatory IRT model best fits the PIAAC sub-data. In addition, the present study aimed to investigate the effects of person- and item-level factors based on the best-fitting model. To achieve these aims, predictions were generated using different models for the literacy and numeracy domains.
The study found increasing disengagement on more difficult items measuring literacy skills, indicating that individuals spend little time on very difficult items (OECD, 2013a). When individuals perceive an item to be very difficult, they may quickly stop trying to understand and respond to it. Considering that the data in this study came from a low-stakes assessment, the participants' low motivation may have played a role in this outcome. Furthermore, whether a particular item is perceived as "too difficult" depends on the cognitive level of the adult: the significant and positive interaction between the cognitive pre-test and item difficulty on test engagement (Wise & Kingsbury, 2015) shows that individuals' engagement varies with their cognitive skills.
Older adults tended to exhibit a higher propensity to disengage in both domains. Their increased disengagement on items administered in technology-rich environments may be related to their lower levels of ICT experience and skills (OECD, 2013a). Older adults have more difficulty than their younger counterparts in using computers due to age-associated changes in visual, perceptual, psychomotor, and cognitive abilities (Xie, 2003), which may lead to disengaged behaviors in testing.
Additionally, the present study revealed that more educated individuals were more engaged with the items assessing literacy. This finding is supported by Goldhammer, Martens, Christoph, and Lüdtke (2016), who investigated the effect of educational attainment on individuals' disengagement. There may be several reasons for this result. First, compared to less educated individuals, highly educated individuals are relatively more proficient and more likely to attempt more difficult items. Second, because those with higher education are more accustomed to testing and assessment environments, they may tire less quickly than test-takers with lower education levels and therefore keep trying to answer an item. Last, people with a high level of education may have a stronger sense of commitment to completing the assessment, leading them to invest more effort in solving the items. In contrast, people with a low level of education may have difficulty understanding the items and may lack sufficient literacy and numeracy skills (OECD, 2019), which can result in a tendency to respond to items quickly.
Individuals who are more ready to learn tended to exhibit more engagement with the items. This may be related to the composite nature of readiness to learn, which consists of attitudinal or emotional, cognitive, behavioral, and, to a lesser extent, personality or dispositional components (Smith, Rose, Ross-Gordon & Smith, 2015). Individuals who are more ready to learn are therefore more attentive, willing, and motivated to learn, and can more easily concentrate on the items and complete them without getting bored (Eccles & Wigfield, 2002).
The current study concluded that adults who frequently used ICT skills at home and at work were more engaged than adults who rarely used them. This finding is in line with the literature suggesting that individuals with strong ICT skills engage more in technology-enriched environments (Bergdahl, Nouri & Fors, 2019), and can be explained by familiarity with ICT, which affects individuals' motivation and engagement (OECD, 2019).
Gender had a significant effect on adults' engagement with the items assessing numeracy skills, suggesting that engagement can be seen as a domain-specific construct (Goldhammer et al., 2016); for example, in Norway, females exhibited a higher level of engagement. This finding is also supported by Marrs and Sigler (2012), who found that females tended to engage with the material at a deeper level, whereas males tended to display minimal effort.
Interpreting the literacy results by country group according to t-disengagement percentages shows that the use of ICT skills had no effect on test-taking engagement except in countries with a low t-disengagement percentage. For the numeracy domain, on the other hand, the effects of person-level factors showed several similarities within the same country groups. For example, the effects of age and readiness to learn were similar in countries with a high t-disengagement percentage: age had a negative effect on test-taking engagement for adults in both France and Ireland, whereas readiness to learn had a positive effect. Additionally, some person-level variables (age, gender, readiness to learn, and use of ICT skills at home and work) had no effect on test-taking engagement in countries with a relatively moderate t-disengagement percentage.
To make more accurate evaluations, assessment practitioners are advised to manage disengagement by identifying disengaged responses and filtering them from the data before obtaining test scores. Additionally, adults can be provided with valuable feedback regarding their performance (DeMars et al., 2013). One or more of these methods can be used to support the validity of results obtained from low-stakes assessments. Underestimating disengaged responses may have significant negative consequences given the potentially high-stakes nature of international assessments for educational stakeholders and policymakers. By demonstrating that the predictors of disengaged responses differ by country, this study revealed the potential for educational stakeholders to make inaccurate inferences when comparing subgroup performance across countries. For example, score differences observed between males and females across countries may reflect differences in test-taking effort rather than true differences in proficiency. Since such effects may serve as a basis for national education policy reform, it is crucial that disengaged responses are identified and filtered before operational analyses (e.g., item analyses) and research analyses are performed (Rios & Guo, 2020). These recommendations illustrate how the results of this study can be used and how they can benefit practitioners. Above all, the most important message of this study is that differences in individuals' scores on low-stakes assessments may stem from their disengagement levels. Future research can explore the extent to which these factors influence disengagement under low-stakes conditions.
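One common way to operationalize the identify-and-filter recommendation above is a response-time threshold: responses faster than a per-item threshold are flagged as rapid (disengaged) responses and excluded before scoring. The sketch below is illustrative only; the threshold rule (a fixed fraction of each item's median response time) and all response-time values are hypothetical assumptions, not the procedure used with the PIAAC data:

```python
from statistics import median

def flag_disengaged(times_by_item, fraction=0.10):
    """Flag a response as disengaged when its response time falls below
    a threshold set at `fraction` of the item's median response time
    (a normative-threshold heuristic; the fraction is an assumption)."""
    flags = {}
    for item, times in times_by_item.items():
        threshold = fraction * median(times)
        flags[item] = [t < threshold for t in times]
    return flags

# Hypothetical response times (in seconds) for two items; the third
# response to item1 and the second to item2 are implausibly fast.
rt = {"item1": [40, 55, 2, 48, 60], "item2": [30, 1, 35, 28, 33]}
flags = flag_disengaged(rt)
print(flags["item1"])  # → [False, False, True, False, False]
```

Flagged responses would then be removed (or treated as not administered) before item analyses and score estimation, in line with the filtering recommendation of Rios and Guo (2020).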
The findings of this study have practical implications; however, the study is limited in several ways. First, countries with different levels of disengagement were selected, but not all countries participating in PIAAC 2012 were included; the findings therefore cannot be generalized to all adults, and further similar research is required. Second, this study used only one method to determine response-time thresholds. Since there are many other methods for detecting disengaged behaviors, future research could compare their effectiveness. Despite these limitations, the study draws further attention to the role of test-taking effort in international assessments and contributes to the discussion of investigating test-takers' effort as part of standard operational practice.