Adaptation of the Self-efficacy Beliefs in STEM Education Scale and Testing Measurement Invariance across Groups

Academic performance in science, technology, engineering, and mathematics (STEM) education is important for the economic development of countries. From the perspective of social cognitive theory, one of the predictors of academic performance is self-efficacy. To measure middle school students' self-efficacy beliefs in STEM education, the STEM Competency Beliefs scale was originally developed in English by Chen, Cannady, Schunn, and Dorph (2017). This study aimed to adapt the English scale into Turkish and to provide evidence regarding its reliability and validity. Throughout the adaptation process, forward and backward translations were completed. In the pilot study (n = 77), the reliability of the data and the clarity of the statements in the Turkish version of the scale were examined. In the main study, the Turkish version was administered to 330 middle school students to investigate the psychometric properties of the scale. The results indicated that the scores obtained with the Turkish version of the scale had good internal consistency. Regarding dimensionality, in contrast to the original version, the adapted scale showed a two-dimensional structure. Measurement invariance findings for gender groups supported configural and metric invariance, whereas scalar invariance was only partially achieved. Measurement invariance findings for career choice groups supported configural, metric, and scalar invariance. Students' scale scores were estimated using multidimensional item response theory. The findings suggest that the scale can be used in STEM-related research to assess students' competency beliefs.


INTRODUCTION
Science, technology, engineering, and mathematics (STEM) education is the integration of these disciplines (Breiner, Harkness, Johnson, & Koehler, 2012) in order to deal with real-world problems (Johnson, Peters-Burton, & Moore, 2016; National Research Council-NRC, 2014). STEM education is important for countries in terms of three interconnected aspects: competitiveness in the global market, the need for innovation, and the jobs of the future (Atkinson & Mayo, 2010; English, 2016; Johnson et al., 2016). One way for countries to stay competitive in global markets is to sustain development in STEM disciplines. Science- and technology-based innovation strengthens countries' positions in the global market by increasing exports (Atkinson & Mayo, 2010). This kind of innovation is possible only with a workforce educated in science, technology, engineering, and mathematics content (Atkinson & Mayo, 2010). It is predicted that in the future one out of three jobs will be STEM-integrated or strongly related to STEM fields. Hence, students, as candidates for the future workforce, need to be educated with an integrated STEM approach (English, 2016). … STEM fields to achieve the goals of 2023. Preliminary actions have been taken, such as changing the national curriculum (Ministry of National Education-MEB, 2018a) and opening STEM institutions and centers to strengthen STEM education (Colakoğlu & Gökben, 2017). Moreover, research on STEM and the number of STEM-related master's and doctoral programs have been increasing (Akgündüz et al., 2015).
Self-efficacy beliefs are regarded as one of the variables that play a key role in academic achievement (Jinks & Lorsbach, 2003; Kanny, Sax, & Riggers-Piehl, 2014; Nelson & Ketelhut, 2008) and career persistence (Green & Sanderson, 2018) in STEM fields. Improving students' self-efficacy and academic achievement in STEM fields is therefore important for filling STEM-related jobs. Even though STEM education research has accelerated both at the international level (Atkinson & Mayo, 2010; Breiner et al., 2012; English, 2016; Johnson et al., 2016) and in Turkey (Han, Capraro, & Capraro, 2016; Hacıoğlu, Yamak, & Kavak, 2016; Yerdelen, Kahraman, & Taş, 2016), to the best of our knowledge, there is no validated scale to assess STEM self-efficacy beliefs in Turkey.
First, the present study aimed to adapt the English version of the STEM Competency Beliefs scale into Turkish and to validate the adapted version. Second, the study compared participants' self-efficacy beliefs in STEM education in terms of their gender, school type, and career choices in a Turkish context. Finding significant differences between school types (private vs. public) and career choices (STEM-related vs. not STEM-related) could be considered additional validity evidence (Sireci & Sukin, 2013), as these groups are expected to differ in their competency scores because of differences in resources and student motivation, respectively.
Having a valid scale to assess STEM self-efficacy beliefs in Turkish is important for researchers and educators who investigate individuals' STEM self-efficacy and its relationships with other crucial variables, such as academic performance in STEM or interest in STEM fields, in Turkey. Moreover, a Turkish STEM Competency Beliefs scale enables researchers, teachers, and policymakers to evaluate STEM programs and to identify learner characteristics in terms of STEM self-efficacy in Turkey. Comparing the STEM competency beliefs of gender groups in Turkey is also expected to extend the literature.

Self-efficacy Beliefs in STEM Education
Self-efficacy is defined as an individual's belief in his or her capability to perform at a given level of proficiency (Bandura, 1999), and it is used interchangeably with perceived self-competence (Zimmerman, 1995). Self-efficacious people are more resilient, solution-oriented, and hardworking (Pajares & Miller, 1997), more active in controlling their time, and better at focusing on tasks (Bouffard-Bouchard, Parent, & Larivee, 1991); they are also more self-regulated and more efficient in using problem-solving strategies and managing their working time (Zimmerman, 2000). Bandura (1999) also explained that self-efficacious people perceive failure differently than less self-efficacious people: they attribute failure to insufficient effort, weak strategies, or unfavorable conditions. These features of self-efficacious people play a key role in their performance (Bandura, 1999; Bouffard-Bouchard et al., 1991).
Beliefs about self-efficacy influence how much students learn (Vincent-Ruz & Schunn, 2017). For instance, Nelson and Ketelhut (2008) investigated 96 middle school students' self-efficacy and their performance in learning science in a virtual environment. The study indicated that students with lower levels of self-efficacy did not perform as well as students with higher levels of self-efficacy. Bandura (1997) emphasized that the relationship between self-efficacy and performance is reciprocal. In other words, if people are self-efficacious, their characteristics help them succeed in related tasks. Accomplishing tasks boosts their self-efficacy, which leads them to work harder and to target more difficult tasks. Working harder helps them accomplish new tasks, which in turn leads to better performance and higher self-efficacy. Moreover, Hidi and Ainley (2008) emphasized a positive relationship between interest and self-efficacy: the more students believe in themselves, the more interested they are in their subjects. Thus, educators need to help learners experience positive feelings and improve their beliefs about themselves. This helps students continue to work on or reengage with activities, ideas, and objects, and to increase their stored knowledge and value (Hidi & Ainley, 2008).
Beliefs about capabilities play an important role in students' choices of science or non-science majors and careers (Hackett & Betz, 1982). Durik, Vida, and Eccles (2006) examined how 10th graders' self-concept of ability in English/reading was related to their career choices. The results showed that the subject-oriented self-concept of ability predicted the future career preferences of 10th graders. … also emphasized that people choose careers in areas in which they believe they perform well.
Studies have found that females have lower self-efficacy towards STEM fields (Tellhed, Backström, & Björklund, 2017). Females do not believe that they can succeed in STEM fields because of a lack of role models and of social or verbal persuasion (Zeldin, Britner, & Pajares, 2008). Self-doubt, lower performance expectations, male-dominated fields, social persuasion and vicarious experiences related to STEM fields, individual backgrounds, family influences and expectations, perceptions of STEM fields, and psychological values, factors, and preferences are related to females' lower interest in STEM fields (Kanny et al., 2014; Tellhed et al., 2017; Zeldin et al., 2008). Females' lower self-efficacy beliefs towards STEM need to be overcome to reduce gender segregation in the field. One way to increase the number of females in the area is to increase their self-efficacy for STEM careers (Tellhed et al., 2017).
Self-efficacy is a personal state that can change, especially in response to positive personal outcomes. As Jenson, Petri, Day, Truman, and Duffy (2011) stated, STEM self-efficacy is an important focus and worthy of observation. Therefore, many scales have been developed over the years to assess STEM self-efficacy (e.g., Dawes, Horan, & Hackett, 2000; Lent, Brown, & Larkin, 1986). Milner, Horan, and Tracey (2014) argued that most of these scales have validity issues, and they developed the STEM Career Self-Efficacy Test. They presented evidence that the test can be accepted as a valid instrument for measuring self-efficacy in engaging in STEM activities (Milner et al., 2014). However, the test is not applicable to middle school students, who are expected to learn STEM fields at school. In 2017, the STEM Competency Beliefs scale was developed for middle school students in the Activation Lab in the USA. The Activation Lab brings together academics from various universities in the USA, with the aim of increasing young people's understanding and appreciation of STEM to prepare them for future challenges. One of the main research areas of the Activation Lab is developing scales to measure significant variables for STEM education, such as the Science Competency Scale (Chung, Cannady, Schunn, Dorph, & Vincent-Ruz, 2016) and the STEM Competency Beliefs scale. The STEM Competency Beliefs scale was developed to assess an individual's STEM competency beliefs. Cannady stated that the scale was also adapted into other languages, such as Spanish and African (M. Cannady, personal communication, November 12, 2018). As the original scale was developed very recently, no publication based on this scale has appeared yet. However, Smith (2019) adapted the original scale to measure technology competency beliefs and used the adapted version to investigate the effect of coding instruction on seventh graders' self-efficacy in technology.

Present Study
In a decade in which STEM has gained popularity and been studied from different perspectives, it is crucial to assess students' self-efficacy in STEM fields. One of the scales for assessing middle school students' self-efficacy in STEM education is the STEM Competency Beliefs scale, developed in English by Chen et al. (2017). The purpose of the present study was twofold. The first aim was to adapt the scale into Turkish and to test its factor structure with a Turkish sample. The second aim was to test whether the factor structure of the scale showed measurement invariance across gender groups and career choice groups in the Turkish sample. The research questions of this study are:
1) …
2) Are the configural, metric, and scalar parameters invariant across girls and boys?
3) Are the configural, metric, and scalar parameters invariant across students who want to follow STEM-related and not STEM-related careers?
4) Is there any significant difference between students' scale scores across gender groups, career groups, and school types?

METHOD
This study primarily aimed to adapt the STEM Competency Beliefs scale into Turkish and to test measurement invariance for its factor structure. Therefore, the adaptation part can be described as a descriptive study and the measurement invariance part as a correlational study. Detailed information about the participants, the data collection instrument, and the data analysis is presented below.

Participants
Two different samples were used for the pilot study and the main study. All students were science center visitors brought by their schools on a school trip to attend workshops; therefore, convenience sampling was used. The workshops were held in a municipal science center in İstanbul. Seventy-seven students (4th to 8th graders) participated in the pilot study: 32 male (42%) and 45 female (58%) students. Seven participants (9%) were from private schools, and 70 (91%) were from public schools.
Participants in the main study were 330 students from different schools visiting the science center. Four of these students (2 females and 2 males; 3 from public schools and 1 from a private school) did not respond to all items. Therefore, after listwise deletion, all analyses were conducted on 326 students. The gender distribution was regarded as balanced, with 157 females (48%) and 169 males (52%). The students came from public schools (n = 302, 93%) and private schools (n = 24, 7%). The majority of the students were 7th graders. Among these students, 161 (49%) stated that they want to have STEM-related careers, whereas 165 (51%) do not want to follow STEM-related careers. Considering the ratios of gender groups, school types, and future career choices, as well as the way these students were brought to the center, the sample was not considered biased.

Data Collection Instrument
The STEM Competency Beliefs scale is a 12-item, 4-point Likert-type scale. It was designed for 10- to 14-year-old respondents to assess an individual's STEM competency beliefs. The reliability of the original scale was good (Cronbach's Alpha = .83; polychoric Alpha = .87) based on data collected from a sample of 205 middle school youth. Two sample items are: "I can do math problems I get in the class." and "I am the technology expert in the house."

Data Analysis
The study included the following stages: scale adaptation, piloting, reliability and validity analyses, and testing measurement invariance for gender groups and career choice groups.

Scale adaptation
Methodology for translating and adapting scales has advanced rapidly over the last 25 years. This rapid development stems from four issues: interest in cross-cultural psychology (van de Vijver & Hambleton, 1996), international comparative studies in education, worldwide exams, and fairness in testing across language preferences (Hambleton, Merenda, & Spielberger, 2012; International Test Commission-ITC, 2017).
Translation and adaptation are two major terms used in the field. Compared with test translation, test adaptation is the more preferred, more reflective, broader, and more commonly used term (Hambleton et al., 2012; ITC, 2017). Test adaptation requires a variety of activities, such as deciding whether the same construct exists in different languages, selecting translators, deciding on accommodations, adapting the tests, and checking for equivalence. Test translation, on the other hand, is only one of the steps in adaptation: rendering the test from one language into another. Test adaptation, however, requires careful consideration of cultural, psychological, and linguistic issues (Hambleton et al., 2012). In short, translation and adaptation have different meanings, and adaptation is the more comprehensive term.
The ITC (2017) guidelines group the steps of the test adaptation process into those before, during, and after adaptation. According to the guidelines, three steps are suggested before adaptation: obtaining permission from the test developers, evaluating the similarities between cultures, and minimizing cultural and linguistic differences. During adaptation, five steps are emphasized: ensuring minimal cultural differences, using appropriate design methods to maximize suitability, providing evidence that the test is the same for the intended populations, providing evidence for the structure of the test, and collecting data to complete the necessary revisions. After adaptation, four steps need to be completed: determining a sufficient sample size, providing statistical evidence of construct equivalence, providing reliability and validity evidence, and using appropriate data analysis procedures. In addition to these steps, the guidelines emphasize scoring and documentation (ITC, 2017).
Two main design methods for the adaptation process appear in the literature: forward and backward translation. Forward translation is a process in which one or more translators adapt the test from the source language to the target language. Backward translation involves three main steps. First, the test is translated from the source language to the target language by designated translators. Then, different translators translate the test from the target language back to the source language. Finally, the source-language version and the back-translated version are compared for equivalence (Hambleton et al., 2012). Backward translation allows the researcher to compare the two forms at a more objective level.
For the adaptation of the STEM Competency Beliefs scale, the preconditions were completed before the study. First, permission was granted to adapt the STEM Competency Beliefs scale into Turkish (M. Cannady, personal communication, November 12, 2018). Then, cultural similarities and differences were evaluated by the research team, which included an associate professor in science education, an assistant professor in assessment and evaluation, and the researcher. Finally, forward translation, backward translation, and final version editing were performed.
Forward translation: The scale was translated from English to Turkish by two translators, an English teacher with 5 years of experience and an English interpreter with 7 years of experience. Each translator worked independently, and the translated forms were collected in an Excel document. The research team compared the translations, discussed STEM-related terms, and formed the Turkish version of the scale. For example, the research team discussed "After school science club" and decided to translate it as "science and technology club", which is a term used in the National Education Social Activities Program Students' Club (MEB, 2009).
Final version editing: As a final step, a linguist expert who is a doctoral student in a Learning Science program and a Turkish language editor compared the back-translated version of the scale with the original one. After some minor changes to the adapted scale, the Turkish version was finalized.

Piloting the adapted version of the scale
A pilot study was conducted to check the clarity of the items from the students' perspective. Two additional questions were placed at the end of the survey to identify problematic statements: "Is there any question that you struggle to understand?" and "If yes, which question(s) were they?" Additionally, Cronbach's Alpha and corrected item-total correlations were estimated to flag problematic items. Revisions were made based on the results of the pilot analysis.

Reliability analysis of final data
The reliability of the scale was tested using Cronbach's Alpha internal consistency coefficient. A Cronbach's Alpha value above .70 is acceptable, above .80 is good, and .90 and above is excellent; values closer to 1 indicate higher internal consistency (George & Mallery, 2001). At the item level, corrected item-total correlations were reported. Items with low correlations (less than .30) are considered problematic (Field, 2013), and such items were investigated to detect the source of the problem.

Validity analysis of the final data
For the validity analysis, confirmatory factor analysis (CFA) was conducted. CFA is a form of factor analysis that tests whether a hypothesized structure fits the collected data well (Urdan, 2010). Fit indices are used to evaluate how well the proposed model fits the data. The Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), and the Root Mean Square Error of Approximation (RMSEA) are widely used fit indices that are less sensitive to sample size. CFI and TLI values above .95 and an RMSEA value below .06 indicate a good fit (Ullman, 2001). The CFA was conducted with Mplus 7.4 (Muthén & Muthén, 2015) using the Weighted Least Squares estimation method. The one-dimensional structure proposed for the English version was tested with the data collected using the adapted Turkish version. The multivariate normality, outlier, and sample size assumptions for CFA were checked (Ullman, 2001).
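To make the cutoffs concrete, the sketch below computes RMSEA, CFI, and TLI from model and baseline chi-square statistics using their standard formulas and checks them against the Ullman (2001) cutoffs quoted above. It is an assumption-laden illustration: Mplus reports robust (mean- and variance-adjusted) chi-square values under weighted least squares estimation, and programs differ on using N or N - 1 in RMSEA, so values computed this way need not match the software output exactly. The function names are hypothetical.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 convention assumed)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: target model (m) against the baseline/independence model (b)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # equals max(chi2_b - df_b, chi2_m - df_m, 0)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index (non-normed fit index); values can exceed 1."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def good_fit(cfi_value: float, tli_value: float, rmsea_value: float) -> bool:
    # Cutoffs quoted in the text (Ullman, 2001): CFI, TLI > .95 and RMSEA < .06
    return cfi_value > 0.95 and tli_value > 0.95 and rmsea_value < 0.06
```

For example, `good_fit(0.890, 0.866, 0.117)` is False for the one-factor values reported in the Results, while the configural gender-group values (CFI = .975, TLI = .971, RMSEA = .058) pass.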
When the data do not fit the hypothesized structure, exploratory factor analysis (EFA) can be used to investigate the common structure underlying the items. EFA using principal axis factoring with direct oblimin rotation was conducted, because the factors could be correlated with each other. An item with a loading of .40 or less on its primary factor is considered problematic. An item that loads on at least two factors at the same time (i.e., the difference between its loadings on the primary factor and another factor is less than .10) is also considered problematic (Field, 2013).
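The two item-level screening rules above can be expressed as a small function over a rotated pattern matrix (rows are items, columns are factors). This is a hypothetical helper that only applies the stated rules to absolute loadings; the loadings themselves would come from the EFA software, and the example matrix is invented for illustration rather than taken from the study's results.

```python
def flag_loadings(loadings: list[list[float]], min_loading: float = 0.40,
                  min_gap: float = 0.10) -> dict[int, str]:
    """Apply the two item-screening rules to a rotated pattern matrix.

    loadings[i][f] is the loading of item i + 1 on factor f. Returns a dict that
    maps 1-based item numbers to the reason they were flagged.
    """
    flags = {}
    for item_no, row in enumerate(loadings, start=1):
        # Rank absolute loadings so negative loadings are judged by magnitude
        ranked = sorted((abs(x) for x in row), reverse=True)
        primary = ranked[0]
        secondary = ranked[1] if len(ranked) > 1 else 0.0
        if primary <= min_loading:
            flags[item_no] = "weak primary loading"
        elif primary - secondary < min_gap:
            flags[item_no] = "cross-loading"
    return flags


if __name__ == "__main__":
    # Invented two-factor pattern matrix for four items (not the study's data)
    example = [[0.72, 0.05], [0.35, 0.12], [0.52, 0.47], [-0.08, 0.81]]
    print(flag_loadings(example))
```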

Item response theory scaling
Item response theory (IRT) scaling was conducted to estimate students' abilities on the latent variables. Conventional IRT models require the data to be unidimensional (Hambleton & Jones, 1993). When unidimensionality is violated, multidimensional IRT estimation is available (Reckase, 2009).
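The study does not specify which multidimensional IRT model was fitted. Purely as an illustration, the sketch below assumes a compensatory multidimensional graded response model for the 4-point items and computes expected a posteriori (EAP) scores on two correlated dimensions (for example, the STE and Math factors reported in the Results) by brute-force integration over a grid. The item parameters, the prior correlation `rho`, and the function names are hypothetical; in practice the item parameters would first be estimated with dedicated software.

```python
from itertools import product
from math import exp


def grm_category_probs(theta, a, d):
    """Category probabilities for one 4-point item under an assumed compensatory
    multidimensional graded response model.

    theta: latent scores, e.g. (STE, Math)
    a:     item slopes on each dimension, e.g. (1.5, 0.0) for an STE-only item
    d:     strictly decreasing intercepts for P(X >= 2), P(X >= 3), P(X >= 4)
    """
    z = sum(ai * ti for ai, ti in zip(a, theta))
    # Cumulative probabilities, padded with P(X >= 1) = 1 and P(X >= 5) = 0
    cum = [1.0] + [1.0 / (1.0 + exp(-(z + dk))) for dk in d] + [0.0]
    # P(X = k) = P(X >= k) - P(X >= k + 1) for categories 1..4
    return [cum[k] - cum[k + 1] for k in range(len(d) + 1)]


def eap_scores(responses, items, rho=0.0, step=0.2, limit=4.0):
    """EAP estimates of two latent dimensions on a square grid.

    responses: categories 1..4, or None for a skipped item
    items:     list of (a, d) tuples, one per item (hypothetical parameters)
    rho:       assumed correlation between the dimensions in the normal prior
    """
    n_points = int(round(2 * limit / step)) + 1
    grid = [-limit + step * i for i in range(n_points)]
    total = sum_first = sum_second = 0.0
    for t1, t2 in product(grid, grid):
        # Unnormalized bivariate standard normal prior with correlation rho
        w = exp(-0.5 * (t1 * t1 - 2 * rho * t1 * t2 + t2 * t2) / (1 - rho * rho))
        # Multiply in the likelihood of each observed response
        for x, (a, d) in zip(responses, items):
            if x is not None:
                w *= grm_category_probs((t1, t2), a, d)[x - 1]
        total += w
        sum_first += w * t1
        sum_second += w * t2
    return sum_first / total, sum_second / total


if __name__ == "__main__":
    # One hypothetical STE-only item and one hypothetical Math-only item
    items = [((1.5, 0.0), (1.0, 0.0, -1.0)), ((0.0, 1.5), (1.0, 0.0, -1.0))]
    print(eap_scores([1, 4], items, rho=0.3))
```

With a positive `rho`, information on one dimension also shifts the estimate on the other, which is one reason a correlated two-dimensional model can be preferred over two separate unidimensional calibrations.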

Measurement invariance of final data
Measurement invariance analyses for gender groups and career choice groups were conducted to test whether the same construct was being measured across groups. Because the number of students from private schools was not sufficient to estimate the parameters, measurement invariance analysis for school type was not performed. Measurement invariance across gender or career choice groups implies that the scale scores of boys and girls, or of students who want STEM-related and not STEM-related careers, are comparable. Measurement invariance is tested by comparing the fit of nested models: configural, metric, and scalar. The configural model tests whether the same factor structure exists across groups; factor loadings and thresholds are free to differ across groups. In the metric model, factor loadings are constrained to be equal across groups, but thresholds can take different values. In the scalar model, both factor loadings and item thresholds are constrained to be equal across groups (Milfont & Fischer, 2010; Vandenberg & Lance, 2000). Measurement invariance is assessed by comparing ΔCFI and ΔRMSEA values with the cutoff criteria (ΔCFI ≤ .01, ΔRMSEA ≤ .015) suggested by Chen (2007) and Cheung and Rensvold (2002).
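The stepwise decision rule can be sketched as follows. The sketch assumes that each Δ is computed as the more constrained model minus the less constrained model, so a negative ΔCFI or a positive ΔRMSEA indicates worse fit; this sign convention is an assumption, and the function names are hypothetical. Configural fit itself is judged separately with absolute fit cutoffs, so the sketch starts from an accepted configural model.

```python
# Cutoffs quoted in the text (Chen, 2007; Cheung & Rensvold, 2002)
MAX_CFI_DROP = 0.01
MAX_RMSEA_RISE = 0.015


def invariance_holds(delta_cfi: float, delta_rmsea: float) -> bool:
    """Deltas are (more constrained model) minus (less constrained model)."""
    return -delta_cfi <= MAX_CFI_DROP and delta_rmsea <= MAX_RMSEA_RISE


def highest_level(steps: list[tuple[str, float, float]]) -> str:
    """steps: ordered (level_name, delta_cfi, delta_rmsea) tuples, e.g. metric vs.
    configural, then scalar vs. metric. Assumes the configural model already fits;
    returns the last level whose comparison satisfies both cutoffs."""
    supported = "configural"
    for name, d_cfi, d_rmsea in steps:
        if not invariance_holds(d_cfi, d_rmsea):
            break
        supported = name
    return supported
```

Passing the gender-group deltas reported in the Results (metric: .001 and -.003; scalar: -.016 and .006) stops at "metric", whereas substituting the model with item 7's thresholds freed (-.010 and .002) as the final step supports partial scalar invariance.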

Pilot Study of the Scale
In the pilot study, the items were administered to 77 students, mainly to test the clarity and fluency of the statements. Two additional questions were included at the end of the survey: "Is there any question that you struggle to understand?" and "If yes, which question(s) were they?" Seventy-two students stated that they could understand the statements clearly, and five students indicated that they had problems understanding some items. These answers were used to determine whether the statements needed any changes or improvements before the Turkish version was finalized. For instance, one student stated that item 2 was difficult because the word "website" was unfamiliar; therefore, "website" was changed to "internet sitesi" for the main study. The Cronbach's Alpha coefficient of the data was .75. Corrected item-total correlations ranged from .28 (item 4) to .60 (item 12), which were considered acceptable for the pilot, although the lowest value fell slightly below the .30 criterion.

Reliability Analysis of the Final Scale
The reliability analysis of the final 12-item form showed a Cronbach's Alpha coefficient of .83, which implies that the data had good internal consistency. As shown in Table 1, the corrected item-total correlation of each item was higher than .30, meaning that no item was problematic in terms of item discrimination.

Confirmatory Factor Analysis
The scale developers showed the original scale to have a one-factor structure. Therefore, in the CFA, the adapted version of the scale was hypothesized to have a one-factor structure. The multivariate normality assumption was examined by drawing a histogram and estimating skewness and kurtosis. As the histogram and the skewness (-.28) and kurtosis (-.30) values indicated, the data were approximately normally distributed. There were no outliers in the data. The ratio of sample size to the number of variables was 27.5, which indicated that the sample size was sufficient, since a ratio of 10 cases per variable is considered adequate (Bentler & Chou, 1987). The fit statistics obtained through CFA were not acceptable for the one-factor model, as shown in Table 2 (CFI = .890 < .950; TLI = .866 < .950; RMSEA = .117 > .060). Hence, exploratory factor analysis (EFA) was conducted to understand the structure of the Turkish version, using principal axis factoring (PAF) with oblimin rotation. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy of .863 indicated that a substantial proportion of the variance in the items might be caused by underlying factors. Bartlett's test of sphericity (p < .05) showed that the correlation matrix was significantly different from an identity matrix. Therefore, the data were appropriate for exploratory factor analysis. As shown in Table 3, the data had a two-factor structure in which items 1, 8, and 9 loaded on a separate factor.
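The two EFA suitability checks mentioned above can be computed directly from the item correlation matrix. The sketch below uses NumPy and Pearson correlations on simulated one-factor data; it is an illustration under those assumptions rather than a reproduction of the study's analysis (which may have used other software and correlation types), and the function names are hypothetical. The p-value of Bartlett's statistic is the upper tail of a chi-square distribution with the returned degrees of freedom.

```python
import numpy as np


def bartlett_sphericity(data: np.ndarray) -> tuple[float, int]:
    """Bartlett's test that the item correlation matrix is an identity matrix.

    data: (n_respondents, p_items) array of complete responses.
    Returns (chi-square statistic, degrees of freedom).
    """
    n, p = data.shape
    r = np.corrcoef(data, rowvar=False)
    _, log_det = np.linalg.slogdet(r)  # log|R|, numerically safer than log(det(R))
    stat = -(n - 1 - (2 * p + 5) / 6) * log_det
    return float(stat), p * (p - 1) // 2


def kmo(data: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(data, rowvar=False)
    r_inv = np.linalg.inv(r)
    # Partial correlation of each item pair, controlling for all other items
    scale = np.sqrt(np.outer(np.diag(r_inv), np.diag(r_inv)))
    partial = -r_inv / scale
    off_diag = ~np.eye(r.shape[0], dtype=bool)
    r2 = np.sum(r[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + p2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated one-factor data: 6 items, 500 respondents (not study data)
    factor = rng.normal(size=(500, 1))
    items = 0.8 * factor + 0.6 * rng.normal(size=(500, 6))
    stat, df = bartlett_sphericity(items)
    print(f"KMO = {kmo(items):.3f}; Bartlett chi2 = {stat:.1f}, df = {df}")
```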
Items 1, 8, and 9 include statements about mathematics, whereas the other nine items focus on science, technology, and engineering. Hence, the primary factor was labeled self-efficacy for science-technology-engineering (STE), and the second factor was labeled self-efficacy for mathematics (Math). The items loading on the second factor are:
Item 1: "I can do math problems I get in class."
Item 8: "I think I am very good at: Explaining my solutions to math problems."
Item 9: "I think I am very good at: Solving problems."
As the PAF results suggested a two-factor structure, a two-factor CFA was then conducted.
The two-factor model improved the fit statistics considerably. These results indicated that the scale had a two-factor structure for the Turkish data, with science-technology-engineering as the first factor and mathematics as the second factor.

Measurement Invariance
Configural, metric, and scalar invariance of the scale across gender groups and career choice groups were evaluated (see Tables 5 and 6). For school type, measurement invariance analysis could not be conducted because one group had a limited number of students (24 students from private schools). Configural invariance results across gender groups indicated good fit (TLI = .971, CFI = .975, RMSEA = .058), meaning that the factor structure of the scale was similar for boys and girls. Metric invariance analysis showed that the change in the fit statistics supported invariance (ΔCFI = .001, ΔRMSEA = -.003). Metric invariance means that, in addition to the factor structure, the factor loadings were equivalent across gender groups. Scalar invariance results showed that the change in CFI exceeded the cutoff, whereas the change in RMSEA was within the acceptable range (ΔCFI = -.016, ΔRMSEA = .006). Modification indices suggested that this problem could be due to item 7, "I think I am very good at: Giving evidence when I tell my opinion." Freeing the thresholds of item 7 for boys and girls resulted in an acceptable change in the fit statistics (ΔCFI = -.010, ΔRMSEA = .002). This finding means that, except for item 7, item thresholds were invariant, and the mean scores of males and females were comparable. Therefore, partial scalar invariance was supported for gender groups.
Configural invariance results across career choice groups indicated that fit indices were good (TLI = .961, CFI = .969, RMSEA = .063). This means that the factor structure of the scale was similar for students who want to follow STEM-related or not STEM-related careers. Metric invariance analysis showed that the change in the fit statistics supported the invariance (ΔCFI = .002, ΔRMSEA = .005).
Having metric invariance means that besides the factor structure, the factor loadings were equivalent across career choice groups. Scalar invariance results showed that the changes in the CFI and RMSEA were also within acceptable ranges (ΔCFI = .000, ΔRMSEA = .009). This finding suggested that the mean scores of career choice groups are comparable.

Comparative Analyses
Comparative analyses were conducted to test mean score differences between the related groups (gender, school type, and career choice). The scores used in these comparisons were estimated using multidimensional IRT scaling. As all subgroup scores were normally distributed, parametric group comparison tests were chosen. In the first comparison, the Science, Technology, and Engineering (STE) and Mathematics (Math) score means of the gender groups were compared, excluding item 7. Table 7 shows the mean scores of boys and girls on the STE and Math factors. Independent-samples t-tests showed that the mean difference in Math self-efficacy between boys and girls was not statistically significant (p > .05; d = 0.12). A similar result was found for the STE mean scores of boys and girls (p > .05; d = 0.21). In the second comparison, STE and Math factor score means were compared for public and private schools. The mean score differences between public and private school students were statistically significant for both STE and Math, as shown in Table 8. Levene's test did not indicate unequal variances (p = .35 for STE and p = .07 for Math). To assess the magnitude of the differences, effect sizes were calculated (d = 0.83 for STE and d = 1.27 for Math); the differences between public and private school groups were significant, with large effect sizes for both STE and Math (Cohen, 1988). In the third comparison, students' mean scores were compared according to their career choices (STEM-related vs. not STEM-related). Table 9 shows statistically significant differences between these groups. Cohen's d was 0.38 for STE and 0.41 for Math, indicating that the group means differ, with small-to-medium effect sizes.
… scale is expected to enable scholars to use the scale in the Turkish context.
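For reference, the group comparisons reported in the comparative analyses rely on two quantities that can be sketched with the Python standard library: Cohen's d with a pooled standard deviation and the Student's t statistic for independent samples (equal variances assumed, which is what Levene's test is used to check). Effect sizes are then labeled with Cohen's (1988) conventional benchmarks (.2 small, .5 medium, .8 large). The function names and example scores are hypothetical, and the p-value step (the t distribution) is omitted.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(x: list[float], y: list[float]) -> float:
    """Pooled standard deviation of two independent groups (sample variances)."""
    nx, ny = len(x), len(y)
    return sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2))


def cohens_d(x: list[float], y: list[float]) -> float:
    """Standardized mean difference: (mean(x) - mean(y)) / pooled SD."""
    return (mean(x) - mean(y)) / pooled_sd(x, y)


def student_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Independent-samples t statistic assuming equal variances, with its df."""
    nx, ny = len(x), len(y)
    se = pooled_sd(x, y) * sqrt(1 / nx + 1 / ny)
    return (mean(x) - mean(y)) / se, nx + ny - 2


def effect_label(d: float) -> str:
    # Conventional benchmarks (Cohen, 1988): .2 small, .5 medium, .8 large
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


if __name__ == "__main__":
    # Hypothetical factor scores for two small groups (not study data)
    group_a, group_b = [0.4, 0.9, -0.1, 1.2, 0.6], [0.1, -0.3, 0.5, 0.0, -0.6]
    d = cohens_d(group_a, group_b)
    print(student_t(group_a, group_b), round(d, 2), effect_label(d))
```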
Providing measurement invariance results before comparing subgroup mean scores is also important as an example of the procedure for comparative studies. In this respect, this study fills a gap by providing an adapted version of the newly emerging STEM Competency Beliefs scale.
An important difference between the original English scale and the adapted Turkish scale emerged in the dimensionality of the scale. While the original scale was reported to have a one-factor structure, the Turkish scale was shown to have a two-factor structure. Items 1, 8, and 9 loaded on a separate factor, which was closely related to Math self-efficacy. The remaining items were related to science, technology, and engineering. Cannady stated that the scale had also been adapted into other languages, such as Spanish and African (M. Cannady, personal communication, November 12, 2018), and that those data also showed a unidimensional structure. It can be argued that Turkish students draw a sharp distinction in their STEM perceptions, placing mathematics in one group and science, technology, and engineering in the other. This distinction does not reflect the interdisciplinary view proposed by STEM education. One possible reason for this distinction is that Turkey does not have a dedicated STEM action plan, whereas many countries have concrete strategies and action plans (MEB, 2016). Hence, students in Turkey may have difficulty perceiving STEM as a whole. Moreover, the latest revisions of the Turkish curriculum contain a statement that emphasizes "science, technology, engineering" on the one hand and mathematics on the other (MEB, 2018a, 2018b). This might be one plausible explanation of why students consider STEM fields as two distinct groups. Studies in Turkey have also supported the idea that STEM is not taught in an integrated way in schools (Baran, Canbazoglu-Bilici, Mesutoglu, & Ocak, 2016; Colakoglu, 2016; Ercan, Altan, Taştan, & Dağ, 2016; Han, Yalvac, Capraro, & Capraro, 2015). All of these issues may prevent students from perceiving STEM as an integrated whole.
As the mean scores of boys and girls are frequently compared using such scales, providing evidence of measurement invariance is important for drawing valid inferences. The measurement invariance findings supported configural and metric invariance across gender groups, whereas scalar invariance was achieved only after freeing item 7 (partial scalar invariance). This means that the factor structure and the factor loadings of the scale were similar for boys and girls. Except for item 7, the threshold values for endorsing the statements were also similar. Therefore, excluding item 7, the mean scores of boys and girls on these factors are comparable. Item 7 is about giving evidence for one's opinions; this finding implies that providing evidence for their opinions could have a different meaning for boys and girls. Similarly, the measurement invariance results for the career choice groups (STEM-related vs. not STEM-related) supported configural, metric, and scalar invariance, suggesting that the mean scores of these groups are comparable.
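The configural, metric, and scalar steps described above compare increasingly constrained multi-group models, each against the previous one. The sketch below illustrates only that decision step, assuming each model has already been fitted in SEM software and assuming the commonly applied criterion that a drop in CFI greater than .01 counts against the added equality constraints. The class `FitResult`, the function `invariance_sequence`, and all CFI values are hypothetical; they are not results from this study, which may have relied on other criteria such as chi-square difference tests.

```python
from dataclasses import dataclass

# Assumed decision rule: reject the added constraints if CFI drops by more than .01.
DELTA_CFI_CUTOFF = 0.01


@dataclass
class FitResult:
    """Hypothetical container for one fitted multi-group model."""
    name: str   # e.g. "configural", "metric", "scalar"
    cfi: float  # comparative fit index reported by the SEM software


def invariance_sequence(models: list[FitResult],
                        cutoff: float = DELTA_CFI_CUTOFF) -> list[tuple[str, float, bool]]:
    """Compare each model with the previous, less constrained model.

    Returns (model name, change in CFI, invariance supported?) for every
    model after the first (configural) one.
    """
    results = []
    for previous, current in zip(models, models[1:]):
        # Rounding avoids spurious failures from floating-point noise at the cutoff.
        delta = round(current.cfi - previous.cfi, 4)
        results.append((current.name, delta, delta >= -cutoff))
    return results


if __name__ == "__main__":
    # Hypothetical CFI values, for illustration only (not results from this study).
    gender_models = [
        FitResult("configural", 0.975),
        FitResult("metric", 0.972),
        FitResult("scalar", 0.955),
    ]
    for name, delta, supported in invariance_sequence(gender_models):
        print(f"{name}: change in CFI = {delta:+.3f}, supported = {supported}")

    # Partial scalar model: thresholds of one item (cf. item 7) freed,
    # compared against the metric model rather than the failed full scalar model.
    partial = invariance_sequence([gender_models[1], FitResult("partial scalar", 0.968)])
    print(partial)
```

In this illustration the full scalar model fails the criterion, while the partial scalar model, with one item's thresholds freed, passes when compared with the metric model. This mirrors the logic of partial invariance: group means remain comparable on the items whose thresholds are held equal.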
The comparative analyses showed that the differences between the mean self-efficacy scores of boys and girls were not statistically significant for either Math or STE. The effect sizes, which were negligible to small (d = 0.12 for Math and d = 0.21 for STE), also supported these findings. Contrary to the literature (Hackett & Betz, 1982; Tellhed et al., 2017; Zeldin et al., 2008), no major differences between boys and girls were observed in Turkey for either the STE or the Math factor. The studies in the literature generally involved high school or older students. Hence, the younger age of the participants in this study might explain the different pattern of findings in Turkey. It can be stated that female students in Turkey are as confident as male students in STEM fields.
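The gender, school type, and career choice comparisons rest on two quantities: an independent-samples t statistic and Cohen's d. The following sketch computes both with a pooled standard deviation, which is appropriate when Levene's test does not reject equal variances, and maps |d| to Cohen's (1988) benchmarks of 0.2, 0.5, and 0.8. The function names and the toy scores are hypothetical, the label "negligible" for |d| below 0.2 is an assumed convention, and p-values are omitted because the Python standard library has no t distribution.

```python
import math
from statistics import mean, variance


def pooled_sd(a: list[float], b: list[float]) -> float:
    """Pooled standard deviation, assuming equal population variances."""
    na, nb = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return math.sqrt(pooled_var)


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Independent-samples t statistic (equal variances) and its degrees of freedom."""
    na, nb = len(a), len(b)
    standard_error = pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb)
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


def cohen_label(d: float) -> str:
    """Map |d| to Cohen's (1988) benchmarks; 'negligible' below 0.2 is an assumption."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical toy scores, for illustration only.
    group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    group_b = [0.0, 1.0, 2.0, 3.0, 4.0]
    t, df = student_t(group_a, group_b)
    d = cohens_d(group_a, group_b)
    print(f"t({df}) = {t:.2f}, d = {d:.2f} ({cohen_label(d)})")  # t(8) = 1.00, d = 0.63 (medium)

    # Effect sizes reported in the text, mapped to the benchmarks.
    for reported in (0.12, 0.21, 0.38, 0.41, 0.83, 1.27):
        print(reported, cohen_label(reported))
```

Under these benchmarks, the gender differences fall in the negligible and small bands, the career choice differences fall between the small and medium benchmarks, and the school type differences are large.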
Secondly, students at private schools were found to have higher self-efficacy towards STEM than students at public schools. This finding might be related to differences between school types in learning opportunities, teachers' professional development, and class size. Many private schools promote STEM education, have STEM laboratories, and invest in robotics and technology competitions at the national and international levels. These activities and opportunities may have a positive influence on private school students. This finding is also consistent with the literature (Chittum, Jones, Akalin, & Schram, 2017; John, Bettye, Ezra, & Robert, 2016; Monterastelli, Bayles, & Ross, 2008). Additionally, teacher-related variables are an important predictor of students' academic performance (Corlu, Capraro, & Capraro, 2014). Teachers working in private schools have more opportunities to take STEM-related in-service professional training, whereas public school students mostly depend on the individual efforts of their teachers. Lastly, class size might explain the differences, because private schools have smaller classes than public schools.
Significant differences in scale scores were also found between students who want a STEM-related career and those who do not: students who want to pursue STEM-related careers had higher self-efficacy beliefs in STEM. Having an interest in STEM fields as a future career might affect these students' self-efficacy in STEM fields. Finding significant differences between the mean scores of private and public school students, and between the mean scores of students who do and do not want a STEM-related career, strengthens the validity of the scale. The scale differentiated between the scores of students who have better STEM education opportunities in private schools and those of students with limited resources in public schools.
Additionally, the scale assigned different scores to students who want to pursue a career in STEM-related fields and to students who do not. These findings provide additional evidence for the validity of the scale (Sireci & Sukin, 2013). Therefore, this reliable and valid scale is expected to contribute to STEM self-efficacy research in the Turkish context.

Limitations
The main limitation of the study was related to the sampling procedure. As convenience sampling was used, the generalizability of the findings may be limited. Testing the scale with another sample would provide additional evidence regarding its factor structure.