Developing an Achievement Test about 7th Grade “Solar System and Beyond” Unit: Analysis of Validity and Reliability

The aim of this research is to develop a reliable and valid achievement test to measure pupils' academic achievement in the 7th-grade “Solar System and Beyond” unit. To this end, 42 multiple-choice questions were prepared based on the objectives of the “Solar System and Beyond” unit in the middle school science curriculum published in 2018. The clarity of the questions, their alignment with the objectives, and their scientific accuracy were evaluated through the opinions of experts in science education, using the technique suggested by Lawshe (1975). On this basis, the content validity index was calculated as .94. The test was piloted with 254 students enrolled in the 7th grade in the 2018-2019 academic year. Using the students' answers to each question, item difficulty and item discrimination indices were calculated for every item. As a result of this item analysis, 8 items were excluded, and the final form of the “Solar System and Beyond Academic Achievement Test” was designed with 35 questions. The KR-20 reliability coefficient of the final form was calculated as .87. The average item difficulty index was .61 and the average item discrimination index was .48. Accordingly, average item difficulty was identified as medium and average item discrimination as high.


Introduction
Assessment and evaluation instruments are used at every level of the education system.
These include questionnaires, oral examinations, true/false items, multiple-choice tests, matching questions, fill-in-the-blank questions, written examinations, and open-ended questions, among others (Kempa, 1986). Each of these instruments has weaknesses as well as strengths, depending on the aim and the student group. A review of the literature shows that the multiple-choice test is one of the most widely used instruments because of its advantages (Özçelik, 2011). There are also many such tests in astronomy. Trumper (2000) designed a 19-item multiple-choice test by collecting and adapting questions previously used in the literature, and administered it to pre-service teachers (Trumper, 2001c), middle school students (Trumper, 2001a), and high school students (Trumper, 2001b). In this way, he sought to identify students' misconceptions about basic astronomy. Zeilik (2002), working with a group of astronomy lecturers, developed the Astronomy Diagnostic Test (ADT), which consists of standardized conceptual multiple-choice questions and identifies the most common astronomy misconceptions. One of the strengths of the ADT is that, although teachers regard its questions as easy, students do not. Lindel & Sommer (2004) developed the Lunar Phases Concept Inventory (LPCI) to help teachers assess students' cognitive models of the phases of the Moon. Keller (2006) conducted two studies, one on the surface composition of Mars and the other in planetary science education research; in this context, he developed the Greenhouse Effect Concept Inventory (GECI) to evaluate conceptual change about the greenhouse effect. Bardar (2006) developed various teaching materials to help students understand light and spectroscopy concepts, and also designed the Light and Spectroscopy Concept Inventory (LSCI).
Türk (2010), on the other hand, aimed to evaluate students' readiness regarding the basic astronomy concepts in the “Solar System and Beyond” unit and the effects of planetariums and observatories on teaching the unit's basic concepts; Küçüközer, Bostan and Işıldak (2010) examined second-year elementary mathematics education students' ideas about basic astronomy concepts before and after instruction, and the effect of instruction on conceptual change; Düşkün (2011) examined the development of a Sun-Earth-Moon model and its effect on the achievement of science education students; and Gündoğdu (2014) analyzed the relationship between 8th graders' achievement, their level of conceptual understanding, and their attitudes towards science lessons. Çeliker and Balım (2012) analyzed the effects of project-based teaching techniques on students' achievement in the “Solar System and Beyond: Space Riddles” unit. Arıcı (2013) investigated the effects of virtual reality programs on students' achievement and the permanence of their learning in astronomy. Williamson (2013) developed the Newtonian Gravity Concept Inventory (NGCI), a multiple-choice instrument for characterizing students' cognitive models of gravity. Slater (2014) developed the Test of Astronomy Standards (TOAST), a comprehensive instrument for evaluating students' knowledge of astronomy content. Gülen and Demirkuş (2014) determined the effects of using visual materials in the “Solar System and Beyond: Space Riddles” unit on students' achievement. Çepni and Çoruhlu (2014) analyzed the effects of learning environments designed according to the 5E model on students' achievement in the “Solar System and Beyond: Space Riddles” unit. Ürün (2015) analyzed the effects of process evaluation on students' academic achievement and attitudes in the same unit.
Kaya (2015) examined the effectiveness of technology-enhanced guidance materials, developed on the basis of cognitive load theory, in the “Solar System and Beyond: Space Riddles” unit. Taşcan and Ünal (2016) analyzed science teachers' knowledge of the fundamentals of astronomy according to demographic variables. Şahin (2016) analyzed the effects of computer-assisted instruction on students' achievement and attitudes in the “Solar System and Beyond: Space Riddles” unit. Demirçalı (2016) analyzed the effects of modelling-based science education on students' academic achievement, science process skills, and mental model development. Albayrak, Yalçın and Yalçın (2017) examined the effects of the station technique on students' academic achievement. Çoban (2017) analyzed the effects of 3D computer models on academic achievement in science education. Kırıkkaya and Şentürk (2018) analyzed the effects of augmented reality techniques in the “Solar System and Beyond: Space Riddles” unit on students' academic achievement. Coşkun (2018) analyzed the effects of instruction supported by augmented reality and mobile applications on students' academic achievement in this unit. Kalkan (2018) analyzed the effectiveness of teaching the objectives of the “Solar System and Beyond: Space Riddles” unit through material- and model-supported activities. Uçar and Aktamış (2019) developed an achievement test and an attitude scale on astronomy for the 7th-grade “Solar System and Beyond: Space Riddles” unit. Şekercioğlu and Akkuş (2019) analyzed the effects of drama techniques on students' academic achievement in this unit. Gökçe (2019) analyzed the effects of STEM-based instruction on academic achievement and retention in the same unit. Demir and Armağan (2019) developed an astronomy achievement test. Nevertheless, achievement tests developed specifically for the objectives of the “Solar System and Beyond” unit remain limited, and this research is intended to remedy this deficiency in the literature.
In this research, a reliable and valid achievement test was developed, following the steps of the test development process, to evaluate students' achievement in the “Solar System and Beyond” unit.

Research Design
In this research, the survey design, one of the quantitative research methods, was used.
Fraenkel, Wallen and Hyun (2012) define survey research as research carried out on an entire population, or on a representative sample drawn from it, with the aim of generalizing the findings to that population.

Participants
The pilot study was conducted with 254 seventh-grade students (128 female, 126 male) attending three different secondary schools in Gelibolu, Çanakkale, Turkey. These state schools, run by the Ministry of National Education, were selected randomly without regard to academic achievement. The distribution of the sample by school is shown in Table 2.

Data Collection Tools
In this research, an academic achievement test for the “Solar System and Beyond” unit was used as the data collection tool. A multiple-choice achievement test was used to evaluate the sub-concepts of the unit and how well the unit had been learned. The test items were prepared according to the objectives of the “Solar System and Beyond” unit in the science curriculum of the Ministry of National Education. While writing the items, the researchers took into consideration the 7th-grade science textbooks prepared by the Ministry, practice worksheets, questions asked in various exams, and exams taken by secondary school students. In this way, 42 multiple-choice items (each with four options) suited to the readiness level of 7th graders were created.

General Description of the Unit
The “Solar System and Beyond” unit has two sub-topics: space research, and beyond the solar system: celestial bodies. The Ministry of National Education curriculum allocates a total of 16 class hours to the unit's 10 objectives (MoNE, 2018). The sub-topics and their contents are shown in Table 3.

Data Analysis
To analyze the data collected from the participants, the following statistics were calculated: the arithmetic mean and standard deviation, item difficulty and item discrimination for each item, skewness and kurtosis to test normality, point-biserial item-total correlations, and the KR-20 reliability coefficient. These analyses were performed with the Test Analysis Program (TAP, version 4.2.5).

Findings
After the literature on achievement tests for the “Solar System and Beyond” unit was reviewed, the achievement test development process proposed by Hanson and colleagues (1980) was applied. Hanson and colleagues (1980) describe test development in three stages: analysis of instruction (description), test preparation (application), and test validation (analysis). Accordingly, the academic achievement test for the “Solar System and Beyond” unit was developed through design, item writing, item analysis, and item selection. The test development process used in this study is shown in Table 4.

Analysis of Validity
In this study, at least three items were prepared for each objective while the achievement test for the “Solar System and Beyond” unit was being developed. The content validity of the items was calculated using the method developed by Lawshe (1975), which requires the opinions of at least 5 and at most 40 experts to compute the content validity ratio and the content validity index. For this purpose, 2 academics from Çanakkale Onsekiz Mart University, 2 doctoral students, and 4 science teachers served as experts.
The expert evaluation form offered three options for each item of the achievement test: “appropriate”, “needs to be edited”, and “should be excluded”. The content validity ratio of each item was calculated from the experts' ratings on this form. Based on the critical values (Veneziano & Hooper, 1997), all items retained in the test were found to be statistically significant (Lawshe, 1975).
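Lawshe's content validity ratio for an item is CVR = (n_e - N/2)/(N/2), where n_e is the number of panelists rating the item essential and N is the panel size; the content validity index is the mean CVR of the retained items. The sketch below assumes that an “appropriate” rating on the expert form counts as “essential”, which the text does not state explicitly. The panel size of 8 comes from the expert list above, but the per-item counts are hypothetical, and the critical value of .75 for eight experts is an assumed figure to be checked against Lawshe's table.

```python
from typing import Sequence


def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe (1975): CVR = (n_e - N/2) / (N/2); ranges from -1 to +1."""
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("require n_experts > 0 and 0 <= n_essential <= n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


def content_validity_index(retained_cvrs: Sequence[float]) -> float:
    """CVI taken as the mean CVR of the items kept in the test."""
    if len(retained_cvrs) == 0:
        raise ValueError("no retained items")
    return sum(retained_cvrs) / len(retained_cvrs)


# Hypothetical data: how many of the 8 experts rated each item "appropriate".
N_EXPERTS = 8
appropriate_counts = [8, 8, 7, 6]
CRITICAL_CVR = 0.75  # assumed critical value for N = 8; check Lawshe's table

cvrs = [content_validity_ratio(n, N_EXPERTS) for n in appropriate_counts]
retained = [c for c in cvrs if c >= CRITICAL_CVR]
print("CVR per item:", cvrs)
print("CVI of retained items:", round(content_validity_index(retained), 3))
```

An item rated appropriate by exactly half the panel gets a CVR of 0, so the index rewards agreement beyond chance rather than a simple majority.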
In addition, for the face validity of the achievement test, an academic from a science education department, a science teacher, and a language expert were consulted, and the pilot study began after corrections were made according to their feedback. The table of specifications prepared to establish content validity, which includes at least three items for each objective, is shown in Table 5. The last rows of Table 5 are:

Students will be able to explain the forms of galaxies: items 36, 37, 38, 39
Students will be able to explain the concept of the universe: items 40, 41*, 42*
Total: 42
* Items excluded from the Solar System and Beyond Academic Achievement Test according to the pilot study.

The objectives of the test and each item were prepared according to Haladyna's (1997) taxonomy. Care was taken to write a question for each objective, and a wide variety of sources was used in preparing the questions. Accordingly, the pilot form covered the unit's two sub-topics: space research (26 items) and beyond the solar system: celestial bodies (16 items). The results of the pilot study are given in Table 6.

Results of the Normality Test
The arithmetic mean of the scores on the draft form of the Solar System and Beyond Academic Achievement Test was 26.0 (65.5%) and the median was 25.28 (58.8%), out of a maximum possible score of 42. Because the mean and median are close, the distribution can be considered approximately normal, that is, not markedly skewed to the right or left. The skewness and kurtosis of the draft-form scores were also examined. The skewness and kurtosis values were -.320 (SE = .163) and -.743 (SE = .304), respectively; since both lie within [-1, +1], the scores were interpreted as not deviating meaningfully from normality (Clements, 1999). Consequently, the skewness and kurtosis results show that the students' achievement test scores did not deviate significantly from a normal distribution. The skewness coefficient of -.320 indicates a slight negative (left) skew, with scores somewhat concentrated toward the upper end of the scale, and the kurtosis coefficient of -.743 indicates a distribution somewhat flatter than the normal (Gaussian) distribution.
Similarly, the median, which lies slightly below and close to the arithmetic mean, supports the conclusion that the distribution is approximately normal.
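The skewness and kurtosis values and their standard errors can be computed from the students' total scores along the lines of the sketch below. It assumes the bias-adjusted sample estimators (skewness G1 and excess kurtosis G2) with the standard-error formulas commonly reported by statistics packages; the text does not say which estimators TAP uses, so results may differ slightly. The [-1, +1] screen follows the rule cited from Clements (1999), and the five-score dataset is purely illustrative.

```python
import math
from typing import Sequence


def skewness_kurtosis(scores: Sequence[float]) -> tuple[float, float, float, float]:
    """Return (skewness, SE, excess kurtosis, SE) using the bias-adjusted
    sample estimators G1 and G2 (an assumption about the estimator)."""
    n = len(scores)
    if n < 4:
        raise ValueError("need at least 4 scores")
    mean = sum(scores) / n
    # Central moments about the mean (divide by n).
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    if m2 == 0:
        raise ValueError("scores have zero variance")
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew, se_skew, kurt, se_kurt


def within_range(value: float, bound: float = 1.0) -> bool:
    """Screen used in the text: a value in [-1, +1] is read as no marked deviation."""
    return -bound <= value <= bound


skew, se_s, kurt, se_k = skewness_kurtosis([1, 2, 3, 4, 5])
print(f"skewness {skew:.3f} (SE {se_s:.3f}), kurtosis {kurt:.3f} (SE {se_k:.3f})")
print(within_range(skew), within_range(kurt))
```

A negative skewness value signals a longer left tail (more high scores), and a negative excess kurtosis signals a flatter-than-normal distribution, which is how the draft-form values above are read.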

Item Difficulty and Item Discrimination Score
While the data from the draft form of the achievement test were being analyzed, each correct answer was given 1 point.
Wrong answers, unanswered questions, and questions for which two options were marked were given 0 points. Since there are 42 questions, the maximum score is 42. A total score was calculated for each student, and the scores were ranked from highest to lowest with the TAP program. The top 27% of students formed the upper group and the bottom 27% the lower group, and the item statistics were computed from these two groups. The item analysis of the draft form, based on the numbers of correct answers in the upper and lower groups, is given in Table 6. The item difficulty index is the ratio of the number of students answering an item correctly to the number of students who answered (Tan, 2005). It takes values between 0 and 1 and represents the proportion of correct answers to the item (Gajjar, Sharma, Kumar & Rana, 2014). Bayrakçeken (2007) stated that an item difficulty index of around .50 is appropriate. Sözbilir (2010) classifies an item with a difficulty index (pj) of .00-.19 as very difficult, .20-.34 as difficult, .35-.64 as of medium difficulty, .65-.79 as easy, and .80-1.00 as very easy. Thus, the closer the index is to 0, the more difficult the item, and the closer it is to 1, the easier the item. According to Table 6, the item difficulty indices of the draft form ranged from .12 to .86. The item discrimination index indicates how well an item distinguishes between students in the upper group and those in the lower group.
Whether an item is kept in an achievement test is decided on the basis of its item discrimination index.
According to Tan (2005), the item discrimination index can take values between -1 and +1, and values close to +1 indicate that the item discriminates well.
Özçelik (2011) states that items whose discrimination index (rjx) is negative or zero should not be included in the test, and items with an index below .20 should be discarded or rewritten. Özçelik (2011) further states that items with a discrimination index of .20-.29 should be revised, items with an index of .30-.39 have acceptable discrimination, and items with an index of .40 or higher have very high discrimination.
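The scoring rule, the difficulty index, the 27% upper/lower-group discrimination index, and the two classification schemes described above can be sketched as follows. The function names are hypothetical, responses are assumed to be a list of 0/1 rows (one per student), band boundaries are treated as half-open intervals, and ties at the 27% cut-off are broken by input order; TAP may handle these details differently.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]  # rows = students, columns = items, entries 0/1


def score_item(marked: str, key: str) -> int:
    """1 only when exactly the keyed option is marked; a blank ("") or
    multiple marks (e.g. "AB") score 0, as described in the text."""
    return 1 if marked == key else 0


def item_difficulty(responses: Matrix) -> list[float]:
    """p_j: proportion of students answering item j correctly."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]


def difficulty_label(p: float) -> str:
    """Bands attributed to Sözbilir (2010), read as half-open intervals."""
    if p < 0.20:
        return "very difficult"
    if p < 0.35:
        return "difficult"
    if p < 0.65:
        return "medium"
    if p < 0.80:
        return "easy"
    return "very easy"


def discrimination_index(responses: Matrix, fraction: float = 0.27) -> list[float]:
    """D_j = p_upper - p_lower for the top and bottom `fraction` of students
    ranked by total score; sorted() is stable, so ties keep input order."""
    totals = [sum(row) for row in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i], reverse=True)
    g = max(1, round(fraction * len(responses)))
    upper = [responses[i] for i in order[:g]]
    lower = [responses[i] for i in order[-g:]]
    p_u, p_l = item_difficulty(upper), item_difficulty(lower)
    return [u - l for u, l in zip(p_u, p_l)]


def discrimination_decision(d: float) -> str:
    """Decision bands attributed to Özçelik (2011)."""
    if d <= 0:
        return "remove"
    if d < 0.20:
        return "discard or rewrite"
    if d < 0.30:
        return "revise"
    if d < 0.40:
        return "acceptable"
    return "very high"


# Hypothetical 0/1 matrix: 4 students x 2 items (fraction 0.5 keeps groups non-trivial).
responses = [[1, 1], [1, 0], [0, 1], [0, 0]]
p_values = item_difficulty(responses)
d_values = discrimination_index(responses, fraction=0.5)
for j, (p, d) in enumerate(zip(p_values, d_values), start=1):
    print(j, p, difficulty_label(p), d, discrimination_decision(d))
```

With the 254 pilot students and this rounding choice, round(0.27 × 254) gives upper and lower groups of 69 students each.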

Item Analysis Based on Correlation
In the correlation-based item analysis, the adjusted point-biserial correlation was used to determine item-total correlations. The point-biserial correlation is used to determine the relationship between a continuous variable and a genuinely dichotomous variable (Büyüköztürk, 2010). For each item, the correlation was calculated by scoring wrong or unanswered questions as 0 and correct answers as 1. The item-total correlation expresses the relationship between the total score participants obtain on the achievement test and the score they obtain on each item. Accordingly, a statistically significant relationship was found between the total score on the achievement test and the score on each of its items. Büyüköztürk (2010). The questions excluded from the draft form of the achievement test did not have a negative effect on content validity. Tekin (2010) stated that an item discrimination index of .40 or above indicates “high” discriminating power. Accordingly, the average item discrimination index of .48 means that the discrimination of the achievement test is high. In conclusion, the item difficulty and item discrimination levels are good. In addition to these statistics, the arithmetic mean, standard deviation, variance, and reliability were calculated for both the draft and final forms of the achievement test. Guilford (1956).
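A point-biserial coefficient is numerically the Pearson correlation between a 0/1 item score and the total score; the adjusted version removes the item's own contribution from the total before correlating, so the item is not correlated with itself. The sketch below assumes this is what “adjusted” means here (a corrected item-total correlation); it is an illustration with a hypothetical response matrix, not TAP's implementation.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[int]]  # rows = students, columns = items, entries 0/1


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 x this equals the point-biserial r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def point_biserial(responses: Matrix, j: int) -> float:
    """Correlation of item j with the total score (item included)."""
    item = [row[j] for row in responses]
    totals = [sum(row) for row in responses]
    return pearson(item, totals)


def adjusted_point_biserial(responses: Matrix, j: int) -> float:
    """Correlation of item j with the total of the remaining items."""
    item = [row[j] for row in responses]
    rest = [sum(row) - row[j] for row in responses]
    return pearson(item, rest)


responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # hypothetical
print(point_biserial(responses, 0), adjusted_point_biserial(responses, 0))
```

The adjusted value is typically lower than the unadjusted one, and the gap shrinks as the number of items grows, which matters little for a 42-item form but noticeably for short scales.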

Discussion and Conclusions
Evaluation is an integral part of the educational process, and it is also important for determining how the teaching-learning process unfolds. The most important outcomes of this process concern students. Multiple-choice tests are mostly used to evaluate students' knowledge and skills, and they are among the most preferred instruments in our education system. Some of the reasons for this are objective and easy scoring, high content validity, and the opportunity to ask questions at different levels (Burton, Sudweeks, Merrill & Wood, 1991). Achievement tests that measure academic achievement not only assess students' knowledge of what they have learned but also serve as teaching instruments. Through multiple-choice achievement tests, students can learn terms and topics that they could not learn in class. The items developed also prompt students to think about the questions.
For this reason, the items developed in this study are expected to be useful in practice. When multiple-choice tests on the Earth and the Universe in the literature are examined, most of them are seen to address concepts such as meteors, meteorites, and stars (Slater, Schleigh, & Stork, 2015; Wallace, 2011). However, among the studies given in Table 1, achievement tests developed for the objectives of the “Solar System and Beyond” unit were found to be quite limited. Therefore, the aim of this research was to develop a reliable and valid instrument to evaluate students' achievement in the 7th-grade science “Solar System and Beyond” unit. During the development of the achievement test, a pilot study was conducted first, followed by item analysis.
As a result of the item analysis, and in line with the objectives of the Ministry of National Education science curriculum, 8 items were excluded from the 42-item achievement test, and the final form of the achievement test, consisting of 35 items, was created. In the item analysis of the final form, item difficulty indices ranged from .43 to .83, item discrimination indices from .30 to .72, and item-total correlation coefficients from .27 to .69. The average item difficulty index and average item discrimination index of the final form were .61 and .41, respectively. Tekin (2010) states that an item discrimination index of .40 or above indicates high discrimination. Accordingly, the average difficulty of the test was found to be “medium” and the average item discrimination “very good”. The KR-20 reliability coefficient was .87. Can (2014) reported that the reliability coefficient of an evaluation instrument should be between .60 and .90 for students' scores to be considered reliable. On this basis, the reliability of the students' scores can be considered high. The results show that the Solar System and Beyond Academic Achievement Test is a valid and reliable instrument for evaluating 7th graders' academic achievement in the “Solar System and Beyond” unit.
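The KR-20 coefficient reported above is KR-20 = (k/(k-1))(1 - Σ p_j q_j / σ²_X), where k is the number of items, p_j the difficulty of item j, q_j = 1 - p_j, and σ²_X the variance of total scores. The sketch below uses the population variance (dividing by n); some software divides by n - 1, which changes the result slightly, and the text does not state which convention TAP follows. The response matrix is hypothetical.

```python
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """KR-20 for a 0/1 response matrix (rows = students, columns = items).
    Uses the population variance of total scores (divide by n)."""
    n = len(responses)
    k = len(responses[0]) if n else 0
    if n < 2 or k < 2:
        raise ValueError("need at least 2 students and 2 items")
    # Item difficulties p_j and the sum of item variances p_j * q_j.
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    # Variance of students' total scores.
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n
    if var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum_pq / var)


responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # hypothetical
print(kr20(responses))
```

KR-20 is the special case of Cronbach's alpha for dichotomously scored items, which is why it suits a test scored 1 or 0 per item.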
It is therefore believed that the Solar System and Beyond Academic Achievement Test will be useful in determining 7th graders' readiness and knowledge gaps. The final form of the test can also contribute to the following areas:
-Organizing students' learning activities according to the deficiencies identified by the test.
-Using the developed achievement test for process evaluation.
-Identifying students' misconceptions through the answer options of the developed multiple-choice test.
-Presenting useful knowledge through the developed test to make learning more permanent.
-Using the developed achievement test as a data collection tool in other research in science education.

Acknowledgement
An earlier version of this paper was presented at the International Congress on Gifted and Talented Education at İnönü University, Malatya, Turkey (November 1-3, 2019).
The researchers confirm that the data used in this study were collected before 2020.
3) Artificial satellite c) An unmanned data-collection vehicle sent to the planets or into outer space.