Validation of Volleyball Common Content Knowledge Test

Aim: The aim of this study was to evaluate the validity and reliability of a volleyball common content knowledge (CCK) test for physical education teachers. Methods: Rasch modelling was used to validate the test, and data were collected from 214 physical education teacher education (PETE) students. An expert group followed a four-step test development process and developed 20 test items. Results: Results showed that 18 of the 20 test items demonstrated high internal consistency and reliability for both the test items and the persons who attended this study. The Wright map showed that the items followed the cumulative norm. Conclusion: The developed test is valid and reliable for measuring the volleyball CCK level of PETE students and physical education teachers. The knowledge base acquired from such a CCK test may assist policymakers and university faculty in designing PETE programs as well as professional development programs.


INTRODUCTION
Effective teaching is essential for contemporary school physical education and sport. Recent studies indicated that teachers need deep content knowledge (CK) for effective teaching in physical activity and sport related courses (Ward, 2009). For example, a physical education (PE) teacher needs basketball CK in order to teach it properly. CK is highly related to pedagogical content knowledge (PCK), which is defined by Shulman (1987) as a teacher's planning, enacting and describing of instructional tasks and their representations. Studies in PE concluded that when the CK level of teachers increased, their PCK level also improved (Iserbyt, Ward & Li, 2017; Ward, Kim, Ko & Li, 2015). PE teachers who lack CK and PCK cannot plan, sequence, and teach developmentally appropriate instructional tasks to their students; thus, expected learning outcomes cannot be reached (Siedentop, 2002).
Deep CK requires two knowledge bases: common content knowledge (CCK) and specialized content knowledge (SCK) (Ball, Thames & Phelps, 2008; Ward, 2009). CCK is the knowledge of how to perform a specific sport skill (e.g., knowing how to execute the overhead pass), and SCK is the knowledge of how to teach CCK (e.g., knowing that using a balloon may help a student who cannot move her body fast enough to touch the ball at her forehead) (Ward, 2009). CCK has two sub-domains: a) rules, etiquette and safety, and b) techniques and tactics. SCK also has two sub-domains: a) detecting student errors, and b) representing instructional tasks to correct those errors. SCK is often confused with PCK; the difference can be comprehended with an example. If a PE teacher knows how to teach the forearm pass in volleyball with incremental steps (e.g., (1) shadow pass, (2) self-toss, pass, catch, (3) partner toss, pass, catch, (4) partner toss, pass, catch with movement and so on), he has SCK. On the other hand, if the teacher modifies and represents the instructional tasks to effectively teach them to a particular group of students, such as 6th or 9th graders, based on a number of knowledge bases including knowledge of students, context, and pedagogy, he has volleyball PCK.
Studies about PCK indicated the importance of CK for effective teaching in PE (Kim et al., 2018; Ward & Ayvazo, 2016). Despite recent efforts to create CCK tests for teaching particular sports in school settings, there are only a few studies that measured CCK, most of which focused on the health-related fitness knowledge of students enrolled in physical education teacher education (PETE) programs and inservice PE teachers (Castelli & Williams, 2007; Miller & Housner; Dervent, Devrilmez, Ince & Ward, 2018).
This study, as another effort to create a CCK test in a specific sport, will fill the gap in the literature in two ways. First, one of the important components of preparing knowledge tests is conformity with the purposes, contents, and applications of current primary, secondary and high school PE curriculums, which will be taught by PETE students in the future (Ayvazo, Ward & Stuhr, 2010; Hunuk, Ince & Tannehill, 2013). The Ministry of National Education of Turkey (MoNE) (2018) renewed the primary, secondary, and high school PE curriculums based on the key competences of lifelong learning; thus, all PE curriculums are compatible with each other. The second gap is the importance of identifying CCK for teaching a specific sport in school PE lessons. Although PETE programs mostly focus on and teach CCK, the CCK levels of PETE students were lower than expected (Devrilmez, 2016). If a PE teacher does not know what to teach, it negatively affects his/her subject selections and the quality of the school PE lesson.
Volleyball, a popular sport in Turkey, was chosen for this study. It is broadly taught at schools because it can be performed with limited space and low-cost materials. The knowledge acquired from this study is expected to guide the preparation of valid and reliable knowledge tests for other physical activities/sports to measure the depth of CK of preservice and inservice PE teachers. Moreover, the volleyball CK levels of PETE students who had already taken a volleyball course were measured. This acquired knowledge is believed to contribute to the improvement of instructional tasks presented in PETE programs.
The literature indicates that valid and reliable sport-specific CCK tests are required (He et al., 2018). The purpose of this study was to evaluate the validity and reliability of a volleyball CCK test by using Rasch modeling. We hypothesized that the volleyball CCK test was a valid and reliable measurement tool to assess PETE students' depth of CCK.

METHOD
Approval from the institutional review board was obtained, and individual consent forms were collected from each participant.

Setting: According to the Institution of Higher Education (YÖK) (2006), there were 71 PETE programs in Turkey at the time of the study. PETE students graduate from these programs after a four-year education, during which they learn team sports. Volleyball, one of the team sports taught in PETE programs, was chosen for two reasons: a) it is a widespread sport in Turkey with the highest number of certified athletes and referees as well as the second highest number of certified coaches (General Directorate of Sports of Turkey, 2018), and b) it is convenient to teach in PE lessons even at resource-poor schools because of its modest equipment and space requirements. In Turkey, all the curriculums of teacher education programs, which were recently updated, are centrally designed by YÖK; thus, PETE programs are obliged to use the curriculum provided by YÖK. In the previous versions of the curriculum, volleyball was a compulsory course in the third year. In the current version, volleyball is an elective course (e.g., one of the three elective team sports courses); however, most of the programs advise their students to select volleyball. When PETE students graduate, they are supposed to have the knowledge to teach it in school PE lessons. Hence, assessing their volleyball CK level is valuable and important.

Participants: There were 214 (42% female and 58% male) third- and fourth-year PETE students from one private and two state universities, all of whom voluntarily accepted to participate in this study. They had completed the compulsory volleyball course in their third year prior to data collection. They also answered demographic questions; according to their answers, participants' ages ranged between 21 and 28 years (M = 23.6, SD = 2.12).
Only 8.41% (n = 18) of the participants had volleyball experience either as a player or a coach, while 91.59% (n = 196) had no experience other than the volleyball course taken at their PETE program in the third year and their K-12 education.

Development of the test: At the beginning, we reviewed the literature to find a valid and reliable volleyball knowledge test. A review of major databases (i.e., ERIC, SPORTDiscus, EBSCO) showed that there were no published tests to use in this study. The following process was used to develop the volleyball knowledge test. In the first phase, we aimed to align the volleyball CCK test with the PETE curriculum (YÖK, 2006). To do this, an expert group comprised of an expert curriculum developer in PETE, two professors with deep volleyball knowledge, and two lecturers who teach volleyball at PETE programs discussed the expected outcomes of PETE volleyball courses. After this discussion, 32 questions were produced in total (e.g., 16 questions for techniques, 7 questions for tactics, 9 questions for rules and safety). Then, a Turkish language expert checked the questions' appropriateness to the Turkish language and their clarity for PETE students. Next, four national level volleyball coaches with at least 25 years of experience were asked to review the questions. Then, in order to establish face validity, the questions were provided to six PE teachers who had a minimum of 10 years of experience teaching volleyball at middle or high schools and to 10 preservice PE teachers, five of whom were volleyball players in local leagues while the other five had no volleyball experience. At this point, 12 questions were removed from the test because they were found inappropriate, too easy, or too difficult for the PETE curriculum.
In the last step, the draft of the test was given to 12 other PETE students who had already taken the volleyball course two semesters earlier. These students indicated that the questions were clear enough to understand, so no further revision was made. The final version of the test consisted of 20 multiple-choice questions (e.g., eight technique, five tactic, and seven rule questions). Each question had four answer choices with only one correct answer. Four-choice questions are appropriate for testing the knowledge level of participants (Naseer & Hong, 2015). Sample questions for each domain are presented in Table 1.

Question 1 - Rule
Which of the following cannot pass to the offensive player in the front area?
a) Setter b) Defense c) Spiker d) Libero

Question 2 - Technique
What is the first technique to be described as a defensive movement when the opposing team is attacking?
a) Block b) Spike c) Forearm pass d) Diving-rolling

Question 3 - Tactic
In which technical sequence is the game run after a service is played?
a) Forearm pass-block-spike-overhead pass b) Block-overhead pass-spike-forearm pass c) Forearm pass-overhead pass-spike-block d) Spike-block-forearm pass-overhead pass

Note: Underlined options in the questions are the correct answers.

Procedures
The participants filled out the test during a regular PETE course; completing it took approximately 30 minutes. Collected data were entered into an MS Excel spreadsheet and transferred to Winsteps software Version 3.72.4 (Linacre, 2008), which was used to calibrate the data to the Rasch measurement model (Rasch, 1980). Traditional measurement models require the model to fit the data; in contrast, the Rasch model requires the data to fit the model (Linacre, 2008). Model results do not change even when data are collected from different participants. Four analyses were conducted for the Rasch measurement model: Wright maps, item fit, person fit, and separation and separation-reliability indices.
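The Rasch calibration rests on the dichotomous Rasch model, in which the probability of a correct response depends only on the difference between person ability and item difficulty (both in logits). A minimal sketch of that probability function, as an illustration of the model rather than a reproduction of the Winsteps implementation:

```python
import math

def rasch_prob(theta, delta):
    """Dichotomous Rasch model: probability that a person of ability
    theta answers an item of difficulty delta correctly (logit units)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

# A person whose ability equals the item difficulty has a 50% chance:
print(rasch_prob(0.5, 0.5))  # 0.5
# Ability two logits above the difficulty gives a much higher chance:
print(rasch_prob(2.0, 0.0))
```

Because only the difference theta - delta matters, persons and items can be placed on the same logit scale, which is what makes the Wright maps described below possible.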

Person-item/Wright maps:
Person-item maps, also called Wright maps, show the distribution of item difficulties and the distribution of participants' responses on the same scale (Linacre, 2008). The Wright maps display the items and the locations of the participants on the same continuum, so the actual achievement of an individual is described. This might provide guidance for teacher educators regarding what PETE students know and what they need to learn to teach effectively. The right side of the map shows the item difficulty rank: the most difficult questions occupy the uppermost part of the scale and the easiest questions the lowest part. Participants' scores are shown on the left side of the map, ranked from the highest score at the top to the lowest score at the bottom.

Item fit: Item fit analysis is used to test the appropriateness of items to the model (Bond & Fox, 2007). It includes infit and outfit values, which enable detection of the extent to which the data fit the model. Infit statistics are sensitive to responses close to where the model anticipates them, while outfit statistics are sensitive to unexpected response patterns (He, Ward & Wang, 2018). For example, if PETE students who had volleyball experience were able to answer the difficult questions, this indicated a good fit to the model; if students with no experience answered those questions, this signaled a poor fit. Infit and outfit statistics are determined with the mean square residual (MNSQ) and the standardized mean square residual (ZSTD) values in Rasch modeling (Rasch, 1980). MNSQ values should range from 0.5 to 1.5 (Linacre, 2008), and ZSTD values should range from -2 to +2 (Bond & Fox, 2007).

Person fit: Person-fit analysis is used to measure the item-score pattern, which indicates its appropriateness in the model (Bond & Fox, 2007). Person-fit indices are determined with MNSQ values. If infit and outfit MNSQ values are between 0.5 and 1.5, person-fit indices are good for the model (Linacre, 2008).
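The MNSQ statistics can be illustrated with the standard dichotomous infit/outfit formulas: outfit is the unweighted mean of squared standardized residuals, while infit weights the residuals by the model variance. This is a simplified sketch for one item; Winsteps applies additional corrections, so it is not a reproduction of that program's output.

```python
import math

def mnsq_fit(responses, thetas, delta):
    """Infit and outfit MNSQ for a single item of difficulty delta.

    responses: 0/1 scores of each person on the item
    thetas:    person ability estimates (logits)
    """
    infit_num = infit_den = outfit_sum = 0.0
    for x, theta in zip(responses, thetas):
        p = 1.0 / (1.0 + math.exp(-(theta - delta)))  # Rasch expected score
        var = p * (1.0 - p)                            # model variance
        outfit_sum += (x - p) ** 2 / var               # squared std. residual
        infit_num += (x - p) ** 2                      # variance-weighted form
        infit_den += var
    return infit_num / infit_den, outfit_sum / len(responses)

# When every person's ability equals the item difficulty (p = 0.5),
# any response pattern yields MNSQ values of exactly 1.0:
infit, outfit = mnsq_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0)
print(infit, outfit)  # 1.0 1.0
```

Values between 0.5 and 1.5 would pass the MNSQ criterion described above; values well above 1.5 flag unexpected responses such as inexperienced students answering difficult items correctly.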

Separation index and separation-reliability index:
The construct validity of the model is determined by the item separation index, which sets apart low and high achievers. Bond and Fox (2007) stated that an item or person separation index over 1.5 is an acceptable level of separation, 2.00 indicates a good level, and 3.00 demonstrates an excellent level of separation. There is also a separation-reliability index, which gauges the reliability of either person or item responses; a value close to 1.00 indicates high confidence in the responses (Bond & Fox, 2007).
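The separation index G and the separation-reliability R quoted above are algebraically linked by R = G²/(1 + G²). A small sketch of this conversion, as an illustration of the relationship between the two thresholds rather than the Winsteps computation:

```python
def separation_from_reliability(r):
    """Separation index G implied by separation-reliability r,
    from R = G^2 / (1 + G^2)  =>  G = sqrt(R / (1 - R))."""
    return (r / (1.0 - r)) ** 0.5

def reliability_from_separation(g):
    """Inverse conversion: separation-reliability implied by G."""
    return g * g / (1.0 + g * g)

# A separation-reliability of 0.80 corresponds to the "good"
# separation threshold of 2.00:
print(round(separation_from_reliability(0.80), 3))  # 2.0
# The "acceptable" threshold G = 1.5 implies a reliability of about 0.69:
print(round(reliability_from_separation(1.5), 3))  # 0.692
```

By this relationship, an item separation of 4.52, as reported in the Results, is consistent with a separation-reliability near the 0.95 reported there.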

RESULTS
Infit and outfit values are presented in Table 2. Infit MNSQ results showed that all items were within the expected range. Infit ZSTD values ranged from -1.7 to 2.0 and were acceptable (Bond & Fox, 2007). Outfit results indicated that items 3 and 7 were slightly above the acceptable MNSQ value of 1.5. Outfit ZSTD values of all items were within the expected range, except item 3. Item difficulties ranged from -0.87 to 1.52. Wright maps demonstrate the difficulty separation of the model. Figure 1 clearly showed that the items, represented on the right side of the map, demonstrated the cumulative norm (M = 0.46, SD = 0.64). Person responses on the left side were distributed from the highest to the lowest scores, and their performances were moderate (M = 9.1, SD = 2.4). The Wright map shows a good separation for person responses.

Figure 1. Wright map of persons and items.

Real estimate values were used in this study instead of model estimates because real estimates are more conservative and reliable than model estimates (Boone, Staver & Yale, 2014). The item separation index of the model was 4.52, which shows an excellent level of separation. The reliability of the model is reported in Table 3. The separation-reliability estimate was 0.95, representing a high level of reliability (Boone et al., 2014). According to the results, the test items can separate participants who have volleyball knowledge from those who do not. Moreover, the items are reliable for measuring PE teachers' and PETE students' volleyball knowledge level. Table 4 reports a person separation index of 3.78, which demonstrates a high level of separation. The separation-reliability level was calculated as 0.92. Results showed that the participants' responses were reliable (Boone et al., 2014).

DISCUSSION
The purpose of this study was to develop a valid and reliable volleyball CK test. Consistent with our hypothesis, the volleyball CCK test was a valid and reliable measurement tool to assess PETE students' depth of CCK. Findings indicated that the volleyball CCK test met the necessary item difficulty and item discrimination standards for educational measurement (Boone et al., 2014). For fit statistics, all of the infit MNSQ and ZSTD values were within the acceptable range; only two items fell outside the acceptable outfit MNSQ range (items 3 and 7), and one item had a significant outfitting ZSTD value (item 3). According to Linacre (2008), misfitting infit values are much more problematic for the validity of a test than misfitting outfit values. Since the results show high item and person reliability indices and excellent levels of item and person separation, the test is acceptable for measuring volleyball CCK (Baghaei & Amrahi, 2011; Boone et al., 2014).
Valid and reliable physical activity and sport specific knowledge tests are required in order to measure the CCK level of PETE students and PE teachers. To date, three studies have used Rasch modeling to develop CCK tests. Similar to our study, tests for soccer (He et al., 2018), gymnastics, and soccer CCK in Turkish (Dervent et al., 2018) demonstrated high internal consistency. He and her colleagues (2018) developed and validated a soccer CCK test; in total, 27 of 30 items were within the acceptable range, and the test could be used to measure soccer CCK level. These studies demonstrated that the Rasch method is a useful and effective method to develop and validate sport specific CCK tests.
Although there were studies measuring the CCK of PETE students or PE teachers, we did not find any volleyball CCK test. Hence, this study contributed to the literature in two ways. First, even though some sport specific CCK tests were developed recently, this study extended the CCK database, especially for volleyball. Second, Rasch modeling with item and person discrimination as well as Wright maps allowed us to develop a valid and reliable volleyball CCK test. The model is suitable for creating an instrument, unlike traditional models, which determine validity through person responses and item-total correlations. Rasch modeling requires the data to fit the model, rather than fitting the model to the data as in traditional approaches. Even if some items do not fit the overall model, the developed test can be used to measure the expected subject knowledge, provided the overall internal consistency and reliability of items and persons are high.
The study had some limitations. One of them was the moderate sample size, even though it was sufficient for validating the test; larger samples could be used in future studies. Second, the test was developed specifically for volleyball. Different physical activity- and sport-specific CCK tests for school PE settings should be developed.

CONCLUSION
Although it was not the primary focus of the study, the results also show the mean number of correct answers, and thus the volleyball CCK levels of the students. The participants answered correctly only M = 9.63 (SD = 2.31) of the questions (48.15%), which indicated lower CCK than the expected level (70% or more correct answers) needed to teach the specific sport, volleyball (Castelli & Williams, 2007). Low CCK levels might be the result of the structure of PETE programs. Therefore, determining the CCK levels of students gives us the opportunity to better understand the effectiveness of PETE programs in training future PE teachers.

PRACTICAL APPLICATION
This reliable and valid volleyball CCK test is an effective measurement tool to assess PETE students' depth of CCK. It could also be used to evaluate subject matter knowledge during the recruitment of PE teachers for private schools. The knowledge base acquired from such a CCK test may assist policymakers and university faculty in designing PETE programs as well as professional development programs.