The Effect of the Item’s Type and Cognitive Level on Its Difficulty Index: The Sample of TIMSS 2015

Accepted: 26.03.2020
In this research, the effect of an item's type and cognitive level on its difficulty index was investigated. The data source of the study consisted of the responses of the 12535 students in the Turkey sample of TIMSS 2015 (6079 eighth-grade and 6456 fourth-grade students) to the mathematics test. These responses covered a total of 215 items at the eighth-grade level (115 multiple-choice and 100 open-ended) and 178 items at the fourth-grade level (93 multiple-choice and 85 open-ended). First, the difficulty indices of items of different types and cognitive levels were calculated. Then, item type was defined as a dummy variable, and multiple regression analysis was applied to test the predictive effect of the type and cognitive level of the items on their difficulty indices. Both the type and the cognitive level of an item were found to have a statistically significant effect on its difficulty index. Students had more difficulty with open-ended items than with multiple-choice items. The effect of item type on the difficulty index was greater for eighth-grade items than for fourth-grade items. At both grade levels, the difficulty index moved toward zero as the cognitive level of the item increased. However, the effect of cognitive level on the difficulty index was greater for fourth-grade items than for eighth-grade items.


Introduction
International exams play an important role in monitoring students' success and shaping countries' educational policies. These exams make it possible to compare participating countries, and they provide important data on how a given country's success changes over the years. Moreover, because international exams include student and school questionnaires in addition to achievement tests, they are a crucial data source for studies that aim to identify the variables that influence success.
One of the international exams in which Turkey participates is TIMSS (Trends in International Mathematics and Science Study). The IEA (International Association for the Evaluation of Educational Achievement) conducts TIMSS every four years. TIMSS assesses student achievement in mathematics and science and focuses on the basic skills included in the curriculum (MoNE, 2015). In this respect, TIMSS results provide valuable feedback to education policy makers, curriculum developers and teachers on the functioning of the educational system. Because TIMSS is conducted at the fourth and eighth grades, its data make it possible to carry out quasi-longitudinal studies (see https://www.iea.nl/studies/iea/timss).
One of the most important aspects of TIMSS is that its achievement tests contain items of different types and cognitive levels. TIMSS includes items at three cognitive levels: knowing, applying and reasoning. These levels parallel the cognitive levels in Bloom's taxonomy (Delil, 2006). The first level covers the concepts and principles that students should know. The second level focuses on students' ability to apply what they know when solving problems and answering questions. The third level concerns students' ability to find solutions to unfamiliar, multistep and complex problems that go beyond routine tasks. These cognitive levels are common to both grades (Mullis & Martin, 2013). When the TIMSS tests are constructed, strict attention is paid to ensuring that they include items at all three levels for each subject field.
TIMSS items vary not only in cognitive level but also in format. TIMSS tests include multiple-choice and open-ended questions. Multiple-choice items require students to choose the correct answer from the given alternatives, whereas open-ended questions require students to construct their own answers. Both formats have advantages as well as limitations. The main advantages of multiple-choice items are that they can be scored objectively, easily and quickly (Haladyna, 2004). In addition, content validity and reliability are easy to ensure because a relatively large number of items can be administered in a short time (Linn, 2006). Their limitations can be listed as follows: they are not suitable for assessing high cognitive levels, they take a considerable amount of time to prepare, and examinees can choose the correct answer by guessing (Bennett, Ward, Rock & LaHart, 1990). Furthermore, factors unrelated to the ability being measured, such as reading speed and reading comprehension, can interfere with the assessment results (Doğan, 2009). Open-ended questions, on the other hand, are suitable for assessing high cognitive levels (Bahar, Nartgün, Durmuş & Bıçak, 2010), cannot be answered correctly by guessing alone (Turgut & Baykul, 2012) and can be prepared in a short time (Başol, 2013). Their limitations are that asking only a few open-ended questions decreases content validity (Özçelik, 2011) and that scoring them is difficult, time-consuming and dependent on rater judgement (Tekin, 2009). One of the most effective ways to benefit from the advantages of multiple-choice and open-ended questions while minimizing their limitations is to use the two item types together.
Accordingly, multiple-choice and open-ended items are used together in TIMSS. The same is true of other national and international examinations such as PISA (Programme for International Student Assessment) and ABİDE [Monitoring and Assessing Academic Skills (Akademik Becerilerin İzlenmesi ve Değerlendirilmesi)]. In TIMSS, multiple-choice items are worth 1 point (scored 0-1). Open-ended questions are worth either 1 point (0-1) or 2 points (0-1-2), depending on the content and the skill being measured (see https://timssandpirls.bc.edu/timss2015/international-database/assessment-items.html).

Aim of the Study
This study aims to identify the effect of an item's type and cognitive level on its difficulty index using data from TIMSS 2015. To this end, the difficulty indices of items of different types and cognitive levels were first calculated. Then, the effect of item type and cognitive level on the difficulty index was examined. The literature contains studies that analyse Turkey's success in international exams in terms of item type (Demir, 2010; Köklü, 2017; Özer Özkan & Özaslan, 2018; Yılmaz Koğar & Koğar, 2019). This study differs from these earlier studies in that it tests the predictive effect of item type on the difficulty index rather than describing students' success on different item types. It also examines the effect of the item's cognitive level, not only its type, on the difficulty index. During the literature review, the researchers did not come across any study that analyses Turkey's success in international exams in terms of the cognitive level of the items. Moreover, this study examines whether the effect of item type and cognitive level on the difficulty index differs between fourth- and eighth-grade TIMSS items. Taken together, the results of the study are expected to shed light on the ongoing discussions about multiple-choice and open-ended questions in high-stakes exams.

Research Design
This study aims to describe the existing situation without any intervention. Therefore, it can be considered descriptive research (Coe, 2017).

Source of Data and Arranging the Data
The data source of this study consists of the responses of the 12535 students in the Turkey sample of TIMSS 2015 (6079 eighth-grade and 6456 fourth-grade students) to the items in the mathematics test. The data were accessed via the official TIMSS international database (https://timssandpirls.bc.edu/timss2015/international-database/). This study used two parts of the database, the "Item Information Table" and the "Item Percent Correct Statistics", both located under the "Items" section.
The item information tables give each item's number (ID), the topic or content it relates to, its type and its cognitive level. The item percent correct statistics give information on the scores that students obtained on each item. For multiple-choice items and for open-ended items scored 0-1, these statistics give the percentage of all students taking the exam who answered the item correctly. For open-ended items scored 0-1-2, they give three percentages: students who gave a fully correct answer and received 2 points, students who gave a partially correct answer and received 1 point, and students who gave an incorrect answer or no answer and received 0 points.
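For each item, these percentages have to be combined into a single difficulty index. This text does not reproduce Formula 1 (İlhan, 2019), which the analysis applies to open-ended items. The Python sketch below therefore assumes the common form p = (mean score) / (maximum possible score). Under that assumption, the index of a 0-1 item is simply its percentage correct divided by 100. For a 0-1-2 item, full credit counts twice as much as partial credit. The function name and the dictionary layout are illustrative and are not part of the TIMSS database.

```python
def difficulty_index(pct_by_score, max_score):
    """Difficulty index of one item, ASSUMED to be mean score / maximum score.

    pct_by_score maps each score (0, 1 or 2) to the percentage of students who
    received it; max_score is 1 for 0-1 items and 2 for 0-1-2 items.
    """
    total = sum(pct_by_score.values())
    if total <= 0 or max_score <= 0:
        raise ValueError("percentages and max_score must be positive")
    # Dividing by the sum (rather than by 100) absorbs small rounding errors
    mean_score = sum(score * pct for score, pct in pct_by_score.items()) / total
    return mean_score / max_score


# Hypothetical 0-1-2 item: 30% full credit, 20% partial credit, 50% zero
print(difficulty_index({2: 30.0, 1: 20.0, 0: 50.0}, max_score=2))  # 0.4
# Hypothetical 0-1 item answered correctly by 62% of students
print(difficulty_index({1: 62.0, 0: 38.0}, max_score=1))  # 0.62
```

An index near 1 marks an easy item and an index near 0 a difficult one. This is the sense in which the results describe indices moving toward zero.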
The official TIMSS website provides the item information tables and the item percent correct statistics as two separate data sets, so the two were merged for the analysis. To that end, the type and cognitive level variables were taken from the item information tables. The percentages of correct answers for the students in the Turkey sample were taken from the item percent correct statistics. The two were then merged into a single file by item number. The merged data set showed that the item percent correct statistics did not include the percentages of correct answers for five eighth-grade items, so these items were excluded. In addition, an examination of the number of alternatives of the multiple-choice items showed that four eighth-grade items and seven fourth-grade items had only two alternatives. As these items made up a very small proportion of all items, they were also removed from the data set. Consequently, 393 items in the TIMSS 2015 mathematics test were included in the data analysis. The distribution of these items by type and cognitive level is given in Table 1.
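The merging and exclusion steps above can be sketched in Python as a join on item ID followed by two filters. The field names (`type`, `cognitive`, `n_options`, `pct`) and the type codes "MC" and "CR" are assumptions made for illustration. The actual column names in the TIMSS files may differ.

```python
def build_item_table(item_info, pct_correct):
    """Join item metadata with Turkey's percent-correct values on item ID.

    item_info:   {item_id: {"type": "MC" or "CR", "cognitive": str, "n_options": int or None}}
    pct_correct: {item_id: percent-correct value for the Turkey sample}
    Returns (merged, dropped); dropped lists (item_id, reason) pairs.
    """
    merged, dropped = {}, []
    for item_id, info in item_info.items():
        if item_id not in pct_correct:
            # Rule 1: the item has no percent-correct statistics
            dropped.append((item_id, "no percent-correct statistics"))
        elif info["type"] == "MC" and info.get("n_options") == 2:
            # Rule 2: multiple-choice item with only two alternatives
            dropped.append((item_id, "two-option multiple-choice item"))
        else:
            merged[item_id] = {**info, "pct": pct_correct[item_id]}
    return merged, dropped


# Hypothetical example with four items (IDs and values are made up)
info = {
    "A": {"type": "MC", "cognitive": "Knowing", "n_options": 4},
    "B": {"type": "MC", "cognitive": "Applying", "n_options": 2},
    "C": {"type": "CR", "cognitive": "Reasoning", "n_options": None},
    "D": {"type": "CR", "cognitive": "Knowing", "n_options": None},
}
merged, dropped = build_item_table(info, {"A": 61.0, "B": 70.0, "C": 22.5})
print(sorted(merged))  # ['A', 'C']
print(dropped)  # [('B', 'two-option multiple-choice item'), ('D', 'no percent-correct statistics')]
```

The sketch records a reason for every excluded item, which makes it easy to check the exclusion counts against those reported in the text.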

Data Analysis
The analysis began by calculating the difficulty indices of the open-ended items with Formula 1 (İlhan, 2019), given below. For multiple-choice items, the percentage of correct answers in the TIMSS data set, expressed as a proportion, already represents the item's difficulty index. These indices therefore did not need to be calculated again.
Formula 1
After the difficulty index of each item had been obtained, mean difficulty indices were calculated for items of each type and cognitive level, separately for the eighth and fourth grades. Figures displaying these means were also created. Stepwise multiple regression analysis was then applied to test the predictive effect of the item's type and cognitive level on its difficulty index. Because item type is a nominal variable with two categories, it was transformed into a dummy variable before the regression analysis. After this transformation, the assumptions of regression analysis were checked.
The first assumption tested was that the data are normally distributed. For this purpose, the skewness and kurtosis coefficients of the eighth- and fourth-grade items were examined; they are given in Table 2. As Table 2 shows, all skewness and kurtosis coefficients fall within ±1. According to Büyüköztürk, Çokluk and Köklü (2011), skewness coefficients within ±1 indicate that the data do not deviate substantially from the normal distribution. A second criterion considered acceptable in the literature is based on z values, obtained by dividing each skewness and kurtosis coefficient by its standard error. Under this criterion, every z value should lie within ±2. The values in Table 2 also meet this criterion. Therefore, the study data can be said to meet the normality assumption, the first assumption of regression analysis.
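Both normality criteria can be computed directly from the item difficulty indices. The sketch below uses the bias-adjusted skewness and kurtosis estimates and the standard-error formulas that SPSS reports. Whether Table 2 uses exactly this variant is an assumption. The ±1 and ±2 limits are the ones given in the text, and the function name is hypothetical.

```python
import math


def normality_check(values, coef_limit=1.0, z_limit=2.0):
    """Skewness/kurtosis, their z values, and the two rules of thumb in the text."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 values are needed")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("values are constant")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1, g2 = m3 / m2 ** 1.5, m4 / m2 ** 2 - 3
    # Bias-adjusted estimates (the form reported by SPSS and by Excel's SKEW/KURT)
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    # Standard errors of skewness and kurtosis
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    z_skew, z_kurt = skew / se_skew, kurt / se_kurt
    return {
        "skewness": skew,
        "kurtosis": kurt,
        "z_skew": z_skew,
        "z_kurt": z_kurt,
        "coefficients_within_limit": abs(skew) <= coef_limit and abs(kurt) <= coef_limit,
        "z_within_limit": abs(z_skew) <= z_limit and abs(z_kurt) <= z_limit,
    }


# Toy data, not TIMSS values: symmetric but flatter than normal
print(normality_check([1, 2, 3, 4, 5]))
```

With only five toy values, the kurtosis coefficient is about -1.2, so the ±1 rule fails while the z rule passes. The two criteria can therefore disagree in small samples.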
The second assumption of regression analysis is a linear relationship between the dependent variable and the independent variables. This assumption can be tested through correlation coefficients. When the correlation coefficients between the independent variables and the dependent variable exceed .30, the assumption can be considered met (Pallant, 2005). In the current study, the dependent variable is the item difficulty index, and the independent variables are the type and cognitive level of the item. The correlation coefficients between these variables are presented in Table 3. As Table 3 shows, the correlations of item type and cognitive level with the difficulty index are statistically significant for both the eighth and fourth grades. In light of this, the assumption of a linear relationship between the dependent variable and the independent variables can be considered met.
The third assumption of multiple regression is that there is no multicollinearity between the independent variables. A correlation above .80 between independent variables indicates a possible multicollinearity problem, while a correlation of .90 or above indicates a serious one (Büyüköztürk, 2010). When the correlation coefficients in Table 3 are evaluated against these criteria, the data set of this study clearly has no multicollinearity problem. After confirming that the necessary assumptions were met, the multiple regression analysis was carried out. Independent samples t-tests and one-way ANOVA were also performed to show the effect of the item's type and cognitive level on its difficulty index more clearly. The data analysis was carried out using Microsoft Excel and SPSS 21.0.
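The two correlation rules of thumb above can be sketched as follows. The linearity check uses correlations with the dependent variable, and the multicollinearity check uses correlations between predictors. When item type is dummy-coded 0/1, its Pearson correlation with the difficulty index is the point-biserial correlation. The function names are hypothetical. The sketch compares |r| rather than r with each limit, on the assumption that the rules concern the strength of a relationship regardless of its sign.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson r; with a 0/1 dummy variable this equals the point-biserial r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_predictors(y, predictors, linear_min=0.30, possible_at=0.80, serious_at=0.90):
    """Linearity: |r| with y above .30. Multicollinearity: |r| above .80 or at least .90."""
    linearity = {}
    for name, x in predictors.items():
        r = pearson(x, y)
        linearity[name] = (r, abs(r) > linear_min)
    collinearity = {}
    for (a, xa), (b, xb) in combinations(predictors.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= serious_at:
            level = "serious"
        elif abs(r) > possible_at:
            level = "possible"
        else:
            level = "none"
        collinearity[(a, b)] = (r, level)
    return linearity, collinearity


# Hypothetical items: difficulty, item type (1 = MC, 0 = open-ended), cognitive level (1-3)
difficulty = [0.72, 0.55, 0.48, 0.41, 0.60, 0.35]
predictors = {"item_type": [1, 1, 1, 0, 0, 0], "cognitive": [1, 2, 3, 1, 2, 3]}
linearity, collinearity = screen_predictors(difficulty, predictors)
print(linearity)
print(collinearity)
```

The sketch applies only the thresholds. It does not test whether a correlation is statistically significant, which the text reports separately from Table 3.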

Results
This section begins with figures showing the difficulty indices by item type and cognitive level. There are two separate figures, one for the eighth-grade and one for the fourth-grade TIMSS items. Figure 1 shows the difficulty indices of the eighth-grade TIMSS items.
Table 4
As Table 4 indicates, the regression model created to predict the difficulty indices of TIMSS items is statistically significant at both the eighth [F = 42.45, p < .01] and fourth [F = 22.24, p < .01] grade levels. In the first step of the regression analysis for the eighth-grade TIMSS items, item type entered the model and predicted 25% of the variance in item difficulty [R² = .25]. In the second step, cognitive level was added to the model, and the predicted variance increased to 28% [R² = .28]. Cognitive level therefore contributes 3% to the model. Table 4 also shows that the β coefficients are positive for item type and negative for cognitive level. Item type was transformed into a dummy variable in which multiple-choice items were coded 1 and open-ended items, coded 0, served as the reference category. The results must therefore be interpreted with respect to the multiple-choice items. Accordingly, the positive β coefficient for item type means that the difficulty indices of multiple-choice items are closer to 1 than those of open-ended items. The negative β coefficient for cognitive level shows that the higher the cognitive level of an item, the closer its difficulty index is to zero. In other words, the item is more difficult for students.
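With two predictors, the step-wise quantities reported in Table 4 can be obtained from the three zero-order correlations alone. The sketch below computes the R² of each step, the R² change and the standardized β weights. It assumes that item type is coded 1 for multiple-choice and 0 for open-ended items. It also assumes that cognitive level is coded 1 = knowing, 2 = applying, 3 = reasoning; this numeric coding is an assumption. Like forward stepwise selection, the sketch enters the predictor with the larger squared correlation first. It omits the F-to-enter significance test that SPSS applies. For the reported values, the R² change is .28 - .25 = .03 for cognitive level at the eighth grade and .19 - .12 = .07 for item type at the fourth grade.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_predictor_stepwise(y, x1, x2, names=("x1", "x2")):
    """R^2 per step, R^2 change and standardized betas for a two-predictor model."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1 - r_12 ** 2
    if denom <= 1e-12:
        raise ValueError("predictors are (almost) perfectly collinear")
    # R^2 of the model containing both predictors
    r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / denom
    # Standardized regression weights (beta) of the full model
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    # Step 1: the predictor with the larger squared correlation enters first
    if r_y1 ** 2 >= r_y2 ** 2:
        first, r2_step1 = names[0], r_y1 ** 2
    else:
        first, r2_step1 = names[1], r_y2 ** 2
    return {
        "first_entered": first,
        "r2_step1": r2_step1,
        "r2_step2": r2_full,
        "delta_r2": r2_full - r2_step1,
        "beta": {names[0]: beta1, names[1]: beta2},
    }


# Hypothetical items (not TIMSS data)
difficulty = [0.72, 0.55, 0.48, 0.41, 0.60, 0.35]
item_type = [1, 1, 1, 0, 0, 0]  # 1 = multiple choice, 0 = open-ended (reference)
cognitive = [1, 2, 3, 1, 2, 3]  # assumed coding: knowing, applying, reasoning
print(two_predictor_stepwise(difficulty, item_type, cognitive, ("item_type", "cognitive")))
```

The signs of the returned β weights are interpreted in the same way as in Table 4. A positive weight for the item-type dummy means that multiple-choice items have indices closer to 1. A negative weight for cognitive level means that indices move toward zero as the level rises.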
The order in which the predictor variables entered the regression model differs between the fourth and eighth grades. According to Table 4, cognitive level entered first in the regression analysis for the fourth-grade TIMSS items and predicted 12% of the variance in the item difficulty index [R² = .12]. In the second step, item type was added to the model alongside cognitive level, and the predicted variance increased to 19% [R² = .19]. Item type therefore contributes 7% to the regression model. The β coefficients also show that the higher the cognitive level of an item, the closer its difficulty index is to zero. They further show that difficulty indices are closer to zero for open-ended items than for multiple-choice items. That is, like eighth-grade students, fourth-grade students had more difficulty with open-ended items and with items at a high cognitive level.

Discussion and Conclusion
In this study, the effect of an item's type and cognitive level on its difficulty index was analysed with data obtained from the TIMSS 2015 mathematics test. The findings clearly reveal that students are more successful on multiple-choice questions than on open-ended questions, regardless of the cognitive level of the item. The regression analysis showed that item type was a significant predictor of the difficulty index at both the eighth and fourth grades. These findings are consistent with the results of Demir (2010) and Yılmaz Koğar and Koğar (2019). Demir (2010) compared the success of students in Turkey on the different item types in the cognitive tests of PISA 2003 and PISA 2006. In the sub-fields of reading skills and science literacy, students in Turkey were most successful on multiple-choice items. In the sub-fields of mathematics literacy and problem-solving skills, students were more successful on structured item types (multiple choice, complex multiple choice and semi-structured questions). Their success on these types was higher than on item types in which students had to construct the answer themselves (short answer, open-ended). Similarly, Yılmaz Koğar and Koğar (2019) calculated the difficulty indices of the items in the PISA 2015 science literacy cognitive test and found that open-ended questions were more difficult than multiple-choice questions.
Several other studies compared students' success on multiple-choice and open-ended questions and obtained similar findings: Pepple, Young and Carroll (2009), Temizkan and Sallabaş (2011), Temel, Dinçol Özgür and Yılmaz (2012), Gayef, Oner and Telatar (2014), Delaram and Safiri (2014), Thawabieh (2016) and Öksüz and Güven Demir (2019). These studies used tests administered in class rather than data from national or international exams.
Some studies in the literature, however, report results that differ from those of this study regarding the effect of item type on the difficulty index. Hastedt and Sibberns (2005) analysed differences in success between open-ended and multiple-choice questions using data from TIMSS 1995 and TIMSS 1999. They found no statistically significant difference between students' success on the different item types in the mathematics test. This difference between their results and the present findings may be explained by the influence of a country's education system on how its students perform on different item types. Indeed, Hastedt and Sibberns (2005) pointed out that the effect of item type on students' success differs from one country to another. They also reported that, in Eastern European countries, students' academic achievement is lower on open-ended questions than on multiple-choice questions.
One remarkable finding of this study is that item type explained 25% of the variance in the difficulty index at the eighth grade but only 7% at the fourth grade, although its effect was statistically significant at both grade levels. This difference might reflect the high-stakes exams that are a crucial part of the Turkish education system. Almost all of the high-stakes exams held in Turkey consist of multiple-choice items and do not include open-ended questions. Such exams inevitably affect the educational system as a whole (Çıkrıkçı Demirtaşlı, 2010). In particular, the high-stakes exams used for the transition to high school make classroom activities, homework and students' study habits increasingly test-focused as students move from primary school to secondary school. Consequently, students at higher grade levels encounter open-ended items less often, and questions in open-ended format become more difficult for them. Hence, the effect of item type on the difficulty index may be more limited at lower grade levels and stronger at higher grade levels. Consistent with this, O'Leary (2001) investigated countries' rankings in TIMSS in terms of item type. O'Leary found that students' success on different item types depends on the item type they encounter most often. According to O'Leary (2001), students in countries where multiple-choice items are frequently used are more successful on multiple-choice questions. Conversely, students who mostly encounter open-ended items achieve more on that type of question. Item type thus emerges as the most significant determinant of the item difficulty index at higher grade levels, where students mostly encounter multiple-choice questions.
In this respect, the Ministry of National Education and the Student Selection and Placement Centre have taken steps to include open-ended questions alongside multiple-choice items in high-stakes exams. These steps are important for making student achievement less dependent on item type.
The study also found that the cognitive level of an item is a significant predictor of its difficulty index at both the eighth and fourth grades. The difficulty index approaches zero as the cognitive level of the item increases; that is, students had more difficulty with items at high cognitive levels. This finding is consistent with the results of Kim, Patel, Uchizono and Beck (2012), Nevid and McClelland (2013), Veeravagu, Muthusamy, Marimuthu and Subrayan (2010), and Koçdar, Karadağ and Şahin (2016). However, it contradicts the results of Kibble and Johnson (2011) and Momsen et al. (2013). Studies on the effect of an item's cognitive level on its difficulty index report inconsistent findings. This inconsistency can be attributed to differences in the content of the items included in these studies. In other words, an item at the remembering/knowing level can sometimes be more difficult than an item at the applying level or higher because of its content. Momsen et al. (2013) illustrated this with an item from the study by Nehm and Schonfeld (2008). In that study, an item about natural selection was expected to be easy for the participants because it was at the remembering level. Nevertheless, the participants had difficulty with it. In this sense, different studies may reach contradictory conclusions about the effect of an item's cognitive level on its difficulty index. This can be explained by other qualities that also affect the item difficulty index, such as the item's content, the subject field it relates to and the way it is worded.
This may also explain why the predictive effect of an item's cognitive level on its difficulty index differs between the eighth- and fourth-grade TIMSS data sets.

Recommendations
Given that students have more difficulty with open-ended questions than with multiple-choice questions, measurement and evaluation activities within the educational process should include more open-ended questions. Encountering open-ended items more often may help students experience less difficulty with this item type. Students also have more difficulty with items at the reasoning level. Learning and teaching should therefore employ methods and techniques that contribute to the development of higher-order mental processes. Measurement and evaluation activities should likewise include items that assess higher-order cognitive skills. Lastly, this study used the items in the TIMSS 2015 mathematics test. A similar study could be carried out with the items in the science test or with the items in ABİDE, which is regarded as the national counterpart of TIMSS and PISA.