A Comparison of IRT Vertical Scaling Methods in Determining the Increase in Science Achievement

This study is based on vertical scaling carried out within the framework of Item Response Theory, and compares the vertical scaling results obtained through different proficiency estimation methods and calibration methods. The vertical scales thus developed were assessed with reference to the criteria of grade-to-grade growth, grade-to-grade variability, and the separation of grade distributions. The data used in the study come from a total of 1500 students from twelve primary schools in the province of Ankara, characterized by different levels of socio-economic and cultural development. The comparison of the findings pertaining to the first and the second sub-problems reveals that the mean differences found through separate calibration were lower than those found through concurrent calibration, and the standard deviations found through separate calibration were likewise lower than those established through concurrent calibration. Furthermore, the effect sizes in the case of separate calibration were again lower than those obtained through concurrent calibration. The results reached for all three criteria using the concurrent calibration method were ranked in the order ML < MAP < EAP, with ML leading to the lowest values and EAP producing the highest ones. In the case of separate calibration, on the other hand, the ranking of results was found to vary with the criterion applied.


INTRODUCTION
Exams applied at schools serve a wide range of objectives. Information derived from exams is used when deciding on the school a student will attend, setting the test score a candidate is expected to have for admission to a university, deciding what to do to enhance the education system, and assessing changes in educational practices (Kolen, & Brennan, 2004).
In order to ascertain the level of change in academic development from one year to the next, developmental scale scores, established by converting the scores of students at different grade levels into a common scale, are used (Kolen, & Brennan, 2004). An awareness of the level of development through the years can provide dependable knowledge about the continuity of success, whereupon improvements at the student and class level can be effected. Large-scale assessments covering the K-12 period have involved numerous studies to assess the academic achievement levels of students. To review and compare academic development through the years, it is necessary to develop a single scale on which all students' performances at all grade levels can be presented, regardless of the year.
The fundamental problem in measuring academic development from one year to the next is that tests differ in difficulty, as well as in content, even if the general topic is the same. In order to overcome this issue, a common set of items is administered to students from consecutive grades, and the scores of students at different proficiency levels are converted into a common scale by using these items.
The process of establishing a link between the scores received in tests applied to different years is called vertical scaling (Kolen, & Brennan, 2004; McBridge, & Wise, 2001). The primary reason for applying scaling to test batteries is to provide test developers with a developmental scale score that enables monitoring the progress in students' achievement levels (Loyd, & Hoover, 1980).
Different data collection designs, scaling methods, calibration methods, proficiency estimation methods, or evaluation criteria can be applied in vertical scaling processes. Researchers are required to make certain decisions about the designs and methods to be used in the scaling process. Such decisions have been observed to have an impact on vertical scaling, and therefore on the patterns indicating the change in the achievement levels of students (Tong, & Kolen, 2007). A brief discussion of the designs and methods chosen for this study follows.

Data Collection Designs
In equating, the data collection design is often called the "scaling design" (von Davier, & Wilson, 2008). The non-equivalent groups anchor test design, the scaling test design, and the equivalent groups design are the most commonly used designs in vertical scaling. As the non-equivalent groups anchor test design is used in the present study, the following paragraph provides a brief description of this design.
The non-equivalent groups anchor test design enables the comparison of the performance of groups with reference to anchor items by building on the overlapping structure of test batteries in elementary education. For each grade, a test compatible with the level of the grade is developed, and each such test is applied only to the relevant grade. The test-takers' level of success with the anchor items is then used to establish the level of growth from one year to the next (Kolen & Brennan, 2004). As the design is applied to two non-equivalent groups, it is called the non-equivalent groups anchor test (or anchor item) design (NEAT) (von Davier, Holland, & Thayer, 2004). Where anchor items are chosen correctly, this design helps reduce the equating error in the scaling (Hambleton, Swaminathan, & Rogers, 1991; Holland, & Dorans, 2006).

Scaling Design
Each equating method is based on a distinct theory and set of assumptions. Equating methods are categorized as methods based on Classical Test Theory (CTT) or on Item Response Theory (IRT), with reference to the underlying theoretical framework.
Equating based on IRT involves the development of a mathematical relationship between the scores on two distinct forms of a test (Dongyang, 2009). Equating methods based on IRT rest on the assumption that a mathematical function defines the relationship between the respondents' proficiency level (θ) and the probability of providing a correct response (Kolen & Brennan, 2004). IRT methods are harder to understand, implement, and explain than CTT methods; yet IRT methods are more flexible (Harris, 2003).
For items scored on a binary scale (1-0), the one-parameter, two-parameter, or three-parameter logistic model may be applied. The present study applies the two-parameter logistic model (2-PLM).
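As an illustration of the model named above, the 2-PLM expresses the probability of a correct response as a logistic function of the distance between proficiency θ and item difficulty b, scaled by the item discrimination a. The following is a minimal Python sketch; the function name `p_correct_2pl` and the parameter values are illustrative assumptions, not values from the study, and the scaling constant D = 1.7 used by some software is omitted.

```python
import math

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model.

    theta: examinee proficiency; a: item discrimination; b: item difficulty.
    Uses the plain logistic metric (no D = 1.7 scaling constant).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# When proficiency equals item difficulty, the probability is 0.5.
print(p_correct_2pl(theta=0.0, a=1.2, b=0.0))  # 0.5
```

A larger a makes the curve steeper around b, which is what distinguishes the 2-PLM from the one-parameter model, where all items share the same discrimination.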

Calibration Methods
When the NEAT design is used in vertical scaling, the anchor items enable the establishment of a shared scale linking the test levels of different grades. With the NEAT design, IRT parameters are either estimated for each test level by running the program separately, or estimated concurrently by running the program only once (Kolen, & Brennan, 2004). These calibration methods are called separate and concurrent calibration, respectively (Meng, 2007).
Concurrent calibration: In concurrent calibration, data pertaining to all grades are calibrated at once to produce a vertical scale. The item parameters of the forms are estimated on the basis of the assumption that anchor items have the same item parameters in consecutive grades (Meng, 2007). In this context, the first step is to set a reference grade, followed by the development of a scale with a mean of 0 and a standard deviation of 1 for the scaled proficiency estimates of consecutive grades (Çetin, 2009). The item parameters for the anchor items included in the target test are estimated once again after adjustment to the values of the reference test. The item parameters pertaining to anchor items are known, while IRT calibrations are used to place the non-anchor items of the target test on the reference test scale (Meng, 2007).
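The data layout behind concurrent calibration can be sketched as follows: the response files of all grades are stacked into one matrix whose columns cover every item, and items that a grade was never given are coded as not presented. The item labels, the tiny form layout, and the `stack_forms` helper are hypothetical and serve only to show the overlapping structure; the forms in the study had 40 items each.

```python
def stack_forms(forms):
    """Stack per-grade response data into one matrix for a single calibration run.

    forms maps grade -> (item_ids, list of response rows).
    Returns (all_item_ids, rows); each row is (grade, responses), with None
    marking items that were not administered to that grade.
    """
    all_items = []
    for item_ids, _ in forms.values():
        for item in item_ids:
            if item not in all_items:
                all_items.append(item)
    rows = []
    for grade, (item_ids, responses) in forms.items():
        position = {item: k for k, item in enumerate(item_ids)}
        for resp in responses:
            rows.append((grade, [resp[position[i]] if i in position else None
                                 for i in all_items]))
    return all_items, rows

# Hypothetical layout: anchor items shared by consecutive grades link the forms.
forms = {
    6: (["s6_1", "anchor67_1"], [[1, 0], [0, 0]]),
    7: (["anchor67_1", "s7_1", "anchor78_1"], [[1, 1, 0]]),
    8: (["anchor78_1", "s8_1"], [[1, 1]]),
}
items, rows = stack_forms(forms)
for grade, responses in rows:
    print(grade, responses)
```

Because each anchor item occupies a single column for both grades that took it, one set of parameters is estimated for it, which is the equality assumption described above.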

Separate calibration: In separate calibration, the item parameters are calculated separately for each grade. As the item and proficiency parameters established separately for two different test forms are on different scales, they are not readily comparable. To enable comparisons, one grade is chosen as the reference level, and the θ scale of that grade is set as the base scale. The item and proficiency parameter estimates of the other grades are then placed on this base scale through a series of linear conversions, with reference to the anchor items in the NEAT design (Kolen, & Brennan, 2004). Numerous linking procedures have been developed in order to place the results obtained through separate calibration on a single shared scale. The studies comparing various equating methods proposed in the literature recommend the use of the Haebara and Stocking-Lord (SL) methods, which utilize item and test characteristic curves, instead of moment methods, which apply item parameters (Hanson, & Béguin, 2002; Kim, & Kolen, 2006; Kolen, & Brennan, 2004). Furthermore, studies note that the SL method generates less error than alternative methods (Hanson, & Béguin, 2002; Karkee, & Wright, 2004; Kim, 2007). Therefore, the present study applied the Stocking-Lord method as the characteristic curve equating method.
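The idea behind the characteristic curve approach can be sketched in Python as follows. The criterion compares the test characteristic curves of the anchor items as estimated on the reference scale and as estimated on the other scale after a linear conversion with slope A and intercept B. This is a sketch under stated assumptions: the function names, the anchor parameters, and the quadrature grid are illustrative, the points are unweighted, and only the criterion is evaluated here, whereas linking software searches for the A and B that minimize it.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def stocking_lord_loss(A, B, anchors_ref, anchors_new, quad_points):
    """Sum of squared TCC differences over the quadrature points.

    anchors_ref: anchor (a, b) pairs estimated on the reference scale.
    anchors_new: the same anchors estimated on the scale to be converted.
    New-scale parameters are moved to the reference scale with
    a* = a / A and b* = A * b + B before the curves are compared.
    """
    transformed = [(a / A, A * b + B) for a, b in anchors_new]
    return sum((tcc(q, anchors_ref) - tcc(q, transformed)) ** 2
               for q in quad_points)

# Illustrative anchors: the "new" estimates are built from the reference
# ones with a known slope and intercept, so the loss should vanish there.
anchors_ref = [(1.0, -0.5), (1.4, 0.2), (0.8, 1.0)]
true_A, true_B = 1.2, 0.5
anchors_new = [(a * true_A, (b - true_B) / true_A) for a, b in anchors_ref]
quad_points = [-4.0 + 0.5 * k for k in range(17)]

loss_at_truth = stocking_lord_loss(true_A, true_B, anchors_ref, anchors_new, quad_points)
loss_at_identity = stocking_lord_loss(1.0, 0.0, anchors_ref, anchors_new, quad_points)
print(loss_at_truth, loss_at_identity)
```

The loss is essentially zero at the generating constants and clearly positive when no conversion is applied, which is the behaviour a minimizer exploits.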
Furthermore, the present study compares the results obtained through scaling via both concurrent and separate calibration.

Proficiency Estimation Methods
Once the item parameters are converted into a common scale using an appropriate calibration method, the method for estimating proficiency levels must be chosen. Total score or pattern scoring can be used when estimating the proficiency level θ with reference to item response theory. The total score method, which offers a more practical and simpler approach, is used more frequently than the pattern scoring method. However, its error is larger than that of pattern scoring, while the amount of information it provides is smaller (Tong, & Kolen, 2010). For binary items coded as 1-0 in IRT, three distinct proficiency estimation methods are commonly used: Maximum Likelihood (ML), Maximum A Posteriori (MAP), and Expected A Posteriori (EAP) estimation. The present study provides a comparison of the results achieved through all three proficiency estimation methods.
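The difference between the three estimators can be illustrated with a simple grid evaluation for a single response pattern: ML takes the θ that maximizes the likelihood, MAP the θ that maximizes the likelihood multiplied by a prior, and EAP the mean of that posterior. The sketch below is not the algorithm used by BILOG-MG; the standard normal prior, the grid, the function names, and the item parameters are all assumptions made for illustration.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_theta(responses, items, grid_step=0.01, bound=4.0):
    """Grid-based ML, MAP, and EAP estimates for one response pattern.

    responses: list of 0/1 item scores; items: list of (a, b) pairs.
    A standard normal prior is assumed for MAP and EAP.
    """
    n = int(round(2 * bound / grid_step))
    grid = [-bound + k * grid_step for k in range(n + 1)]
    log_lik = []
    for t in grid:
        ll = 0.0
        for u, (a, b) in zip(responses, items):
            p = p_2pl(t, a, b)
            ll += math.log(p) if u == 1 else math.log(1.0 - p)
        log_lik.append(ll)
    # Log posterior up to a constant: log-likelihood plus N(0, 1) log density.
    log_post = [ll - 0.5 * t * t for ll, t in zip(log_lik, grid)]
    ml = grid[max(range(len(grid)), key=log_lik.__getitem__)]
    map_ = grid[max(range(len(grid)), key=log_post.__getitem__)]
    weights = [math.exp(lp) for lp in log_post]
    eap = sum(w * t for w, t in zip(weights, grid)) / sum(weights)
    return {"ml": ml, "map": map_, "eap": eap}

# Hypothetical five-item test; the examinee misses only the fourth item.
items = [(0.8, -1.0), (1.0, -0.5), (1.2, 0.0), (1.0, 0.5), (0.9, 1.0)]
est = estimate_theta([1, 1, 1, 0, 1], items)
print(est)
```

For this above-average pattern the two Bayesian estimates are pulled toward the prior mean of zero relative to ML, which is why the choice of estimator can change the spread of a scale.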

Evaluation Criteria
The final stage of the scaling study involves the comparison of the results obtained. The normative characteristics of developmental scale scores constitute the subject matter of numerous studies. The characteristics of the scale scores are compared in order to compare the results of the vertical scaling analysis. These characteristics are grade-to-grade growth, grade-to-grade variability, and separation of grade distributions. Grade-to-grade growth is assessed with reference to the mean difference between consecutive grades, grade-to-grade variability is assessed with reference to the standard deviations of consecutive grades, and separation of grade distributions is interpreted with reference to the effect size index proposed by Yen (1984) (Kim, 2007; Kolen, & Brennan, 2004). The present study provides a comparison of the results through all three evaluation criteria.
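The three criteria can be computed from the proficiency estimates of two adjacent grades as in the following Python sketch. The effect size is written here in the form commonly used in the vertical scaling literature, the mean difference divided by the square root of the average of the two grade variances; the use of sample variances, the function name, and the toy data are assumptions.

```python
import math
import statistics

def grade_summary(theta_lower, theta_upper):
    """Compare proficiency estimates of two adjacent grades.

    Returns the mean difference (growth), both standard deviations
    (variability), and an effect size (separation of distributions).
    """
    mean_diff = statistics.mean(theta_upper) - statistics.mean(theta_lower)
    var_lower = statistics.variance(theta_lower)  # sample variance (assumed)
    var_upper = statistics.variance(theta_upper)
    effect_size = mean_diff / math.sqrt((var_lower + var_upper) / 2.0)
    return {
        "mean_diff": mean_diff,
        "sd_lower": math.sqrt(var_lower),
        "sd_upper": math.sqrt(var_upper),
        "effect_size": effect_size,
    }

# Toy data: the upper grade is shifted up by one unit with equal spread.
summary = grade_summary([-1.0, 0.0, 1.0], [0.0, 1.0, 2.0])
print(summary)
```

Because the effect size standardizes the mean difference by the spread of both grades, two scales with the same growth can still differ in how well they separate the grade distributions.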

Purpose of the Study
The literature has yet to reach a common view about which method provides the best and most accurate depiction of the increase in students' achievement. Nevertheless, vertical scaling is used by numerous test developers, and every test developer determines its own vertical scaling process (Tong, & Kolen, 2007).
Vertical scaling, as a means of revealing the development of students' achievement from one grade to the next, has become an important field, and the number of vertical scaling studies is increasing. The present study can provide a model for monitoring the development of students' achievement levels.
A glance at the literature reveals that studies based on real data are rare, while studies based on simulated data are more common. The present study, on the other hand, is based on the results of science achievement tests administered to 1500 students enrolled in 12 different schools. In this vein, the study is expected to contribute to the literature as a model based on real data.
The purpose of the study is to implement a vertical scaling analysis based on item response theory, and to compare the developmental scale scores established through the application of calibration methods (separate and concurrent calibration) and estimation methods (maximum likelihood, maximum a posteriori, and expected a posteriori estimation), with reference to the mean, standard deviation, and effect size. The study therefore discusses the grade-to-grade growth, grade-to-grade variability, and separation of grade distributions of the developmental scale scores. Means and mean differences were employed to assess grade-to-grade growth, standard deviations for each grade were used to assess grade-to-grade variability, and effect sizes were analyzed to assess the separation of grade distributions.

Research Questions
This study constructs vertical scales over three forms and investigates the question "How do the evaluation criteria vary across calibration methods and proficiency estimation methods in vertical scaling based on item response theory?". Specifically, the research questions investigated in line with this problem statement are as follows:
1. How do (a) grade-to-grade growth, (b) grade-to-grade variability, and (c) separation of grade distributions vary with respect to maximum likelihood, maximum a posteriori, and expected a posteriori estimation using concurrent calibration?
2. How do (a) grade-to-grade growth, (b) grade-to-grade variability, and (c) separation of grade distributions vary with respect to maximum likelihood, maximum a posteriori, and expected a posteriori estimation using separate calibration?

Type of Study
Because existing methods and techniques were tested on real data, and since the aim was to contribute to theoretical studies by identifying the methods with minimum error, the research is a fundamental study (Creswell, 2013).

Participants
The participants of the study are 6th, 7th, and 8th grade students. The data used in the study were gathered from a total of 1500 students from 12 distinct schools, two from each of the Altindag, Cankaya, Golbasi, Kecioren, Sincan, and Mamak districts of Ankara province.
The science achievement test was developed using items selected from the Placement Exam (SBS), High School Entrance Examination (OKS), and Free Boarding and Scholarship Examination (PYBS) administered between 2008 and 2012, after checking the item discrimination and item difficulty indices; the items were then compiled into achievement tests of 40 items for each of the three grades. Ten items were identified as anchor items to enable chain scaling between consecutive grades. While Hambleton, Swaminathan and Rogers (1991) note that 20% of the overall test is a sufficient guideline for the number of anchor items, many studies note that an increase in the number of anchor items helps reduce the standard deviation of the assessment obtained through the test (Boughton, Lorie, & Yao, 2005; Kim, Lee, Kim, & Kelley, 2009). Therefore, the present study employed an anchor item ratio of 25% of the total number of items.

Research Design
In this research, the non-equivalent groups anchor item design was used. Even though the design is one of the most frequently employed ones, it is also one of the most flexible and most complex designs (Sinharay, & Holland, 2007). Even though it is a design preferred on practical grounds, it is also less restrictive compared to other designs (Zhu, 1998).

Data Analysis
Before running the analyses, the data were preprocessed to remove incomplete or missing records from the dataset. Furthermore, the scores received from the science achievement test were checked against the major IRT assumptions of unidimensionality and local independence, and for model-data fit.
When unidimensional Item Response Theory (IRT) is used for equating, it is necessary to test the unidimensionality assumption for the tests (Hambleton, & Swaminathan, 1985). In order to test the unidimensionality assumption, confirmatory factor analysis (CFA) was applied to the science tests given to students at all three grade levels, and the model was tested at a significance level of 0.05. Numerous goodness-of-fit indices are used to evaluate model-data fit. Among these, the most frequently used indices, namely the chi-square to degrees of freedom ratio (χ2/df), Root Mean Square Error of Approximation (RMSEA), Goodness of Fit Index (GFI), Adjusted Goodness of Fit Index (AGFI), Comparative Fit Index (CFI), and Normed Fit Index (NFI), were checked. The results obtained are presented in Table 1. A review of the goodness-of-fit indices obtained through CFA and presented in Table 1 reveals that the model shows a high level of fit for all three grades. Based on the CFA, it can be said that the data meet the unidimensionality assumption; hence the science achievement test assesses a single trait in all grades involved.
Local independence means that the response given to each item is independent of the responses to the others, and the probability of answering an item correctly is not affected by other items. When the proficiency level is held fixed, the correlation between items is expected to approach zero. Where just a single proficiency is required for responding to all items, so that the local independence assumption is met, the items are considered unidimensional (Nandakumar, 1994). Compliance with the unidimensionality assumption can therefore provide evidence regarding the local independence assumption (Hambleton, Swaminathan, & Rogers, 1991; Lord, & Novick, 1968).
Given the fact that the present study meets the requirements of the unidimensionality assumption, it is also deemed to have met the requirements of the local independence assumption.
Once the assumptions were tested in accordance with Item Response Theory, model-data fit was checked in order to identify the model offering the highest level of fit with the data set. The fit statistics calculated through separate calibrations for each grade revealed that the one-parameter logistic model (1PLM) and the 2PLM showed model-data fit, while no model-data fit was observed for the 3PLM. Therefore, the analyses were carried out with the 2PLM.

FINDINGS and INTERPRETATION
The findings of the study and the results obtained with reference to grade levels, calibration methods, and proficiency estimation methods employed were reviewed in light of mean, standard deviation, and effect size criteria.
In order to answer the first sub-problem, data pertaining to all grade levels were compiled in a single file, and all data were calibrated concurrently using the BILOG-MG 3 software.
The concurrent calibration method was applied to estimate the item and proficiency parameters for each grade. The θ proficiency level means, mean differences, standard deviations, and effect size values were established on the basis of the ML, EAP, and MAP proficiency estimation methods. The values thus calculated are presented below, in Table 2.

As both Table 2 and Graph 1 show, the means calculated through concurrent calibration on the basis of the data from the science test suggest that the proficiency level of the students increases as they progress from 6th to 8th grade. The review of mean differences, carried out to assess the criterion of growth between individual grades, suggests that the highest mean differences were observed with EAP, while the lowest ones were obtained with the ML method.
The review of standard deviations, carried out to assess the criterion of variability between individual grades, reveals that the standard deviation fell from 6th grade to 8th, and that the highest standard deviations were observed with EAP, while ML produced the lowest ones. As 7th grade was chosen as the reference grade, the standard deviation for that grade was one (1) under all estimation methods.
The analysis of effect sizes, carried out to evaluate the criterion of separation between grade distributions, reveals that effect size grew from 6th grade to 8th, with the largest effect sizes observed with EAP and the lowest ones obtained with the ML method. An analysis of the figures in Table 2 reveals that the effect size between the 6th and 7th grades can be considered small, while the one between the 7th and 8th grades is medium.
These findings run parallel to the studies by Tong and Kolen (2010) and Kim (2007), which used the concurrent calibration method. Furthermore, the studies by Meng, Kolen and Lohman (2006) and Tong (2005) similarly found that the smallest effect size value was obtained through ML estimation.
In order to answer the second sub-problem, data for each grade level were calibrated separately using the 2PLM. Item and proficiency parameters were calculated using the BILOG-MG 3 software. In order to place the parameter estimates for each grade on the scale of the 7th grade, which was taken as the reference level, the ST software (Hanson, Zeng, & Chien, 2004), which computes IRT scaling constants and is written in the C programming language, was used. The Stocking-Lord method was used as the characteristic curve method to estimate the slope and intercept values.
Quadrature points are used for conversions applying the Stocking-Lord method. The analyses required for the calculation of the quadrature points were carried out using the icl_win software. The quadrature points thus established were added to the code to obtain the SL conversion.
The SL method was applied using the test characteristic curves. The slope and intercept values produced are presented below in Table 3. The conversions are effected using the constants A and B obtained through the SL conversion presented in Table 3. Since the 7th grade is set as the reference level, when converting the 6th grade to the 7th, proficiency estimates are converted through the equation θnew = θold × 1.121 + 0.767. The conversion of the 8th grade to the 7th, on the other hand, is done through the equation θnew = θold × 1.574 + (-0.962). A two-step conversion is required for the transition from the 8th grade to the 6th. The equation θnew = (θold × 1.121 + 0.767) × 1.574 + (-0.962) was used for the conversion of the 8th grade. The intercept values between the 6th and the 7th grades are positive, while those between the 7th and the 8th are negative.
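The single-step conversions reported above are plain linear transformations of θ. The following minimal Python sketch applies them with the constants quoted in the text; the function and constant names are illustrative.

```python
def to_reference_scale(theta_old, A, B):
    """Place a proficiency estimate on the reference scale: A * theta + B."""
    return A * theta_old + B

# Slope (A) and intercept (B) values as quoted in the text for the
# conversions onto the 7th grade scale.
A_6_TO_7, B_6_TO_7 = 1.121, 0.767
A_8_TO_7, B_8_TO_7 = 1.574, -0.962

# A 6th grade estimate of 0 lands at the intercept on the 7th grade scale.
print(to_reference_scale(0.0, A_6_TO_7, B_6_TO_7))  # 0.767
print(to_reference_scale(1.0, A_8_TO_7, B_8_TO_7))
```

A slope above 1 stretches the spread of the converted estimates, while the intercept shifts their location, so both constants affect the means and standard deviations compared later.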
Proficiency estimation under the separate calibration method was also carried out with the BILOG-MG 3 software, using the calculated estimation values. The θ proficiency level means, mean differences, standard deviations, and effect size values were established on the basis of the ML, EAP, and MAP proficiency estimation methods. The results are presented below in Table 4. As both Table 4 and Graph 2 show, the means calculated through separate calibration on the basis of the data from the science test suggest that the proficiency level of the students increases as they progress from 6th to 7th grade, and falls from 7th to 8th grade. Mean differences, which reflect the level of improvement from one grade to another, allow a better understanding of this criterion.
While the mean differences are positive between the 6th and 7th grades, they are negative between the 7th and 8th grades, and tend to fall from 6th grade to 8th. This finding can be interpreted to mean that the 7th grade students are more successful than the 8th grade students, and that the desired and expected growth from one grade level to the next was not achieved. The reason for the 8th grade students being less successful than the 7th grade students may be the TEOG (Basic Education to Secondary Transition) exam. The increase in students' anxiety levels may have adversely affected their success.
In addition, the fact that eighth grade students have entered adolescence may have negatively affected their psychological state and achievement. Parallel to this finding, Briggs, Weeks and Wiley (2009) stated that growth patterns did not increase linearly from one year to the next. There are studies supporting this finding in the literature (Tong, & Kolen, 2008; Cetin, 2009; Wysel, & Reckase, 2011; Altun, 2013). Tong and Kolen's (2010) study found that the mean difference was higher at the lower grade levels and decreased as the grade level increased. Similarly, the studies by Ito, Skykes and Yao (2008) and Tong and Kolen (2007), which compared vertical scaling methods, stated that the increase in the scores of students at the lower grade levels is higher than the increase in the scores of students at the higher grade levels. The IRT analyses show that the scores of the students increase and decrease according to grade level. In other words, the achievement levels of low-achieving students increase more in the transition from 6th to 7th grade than in the transition from 7th to 8th grade. When the estimation methods are compared, the highest mean differences were obtained with ML, while EAP produced the lowest ones.
A glance at the standard deviation figures shows that standard deviations tend to fall from 6th grade to 8th. While the lowest standard deviation is obtained with the ML method, EAP produced the highest standard deviation.
The analysis of effect sizes indicates that in all three methods, effect sizes tend to fall towards 8th grade, with the largest effect sizes observed when ML is applied and the smallest ones obtained through EAP. An analysis of the figures in Table 4 reveals that the effect sizes between the 6th and 7th grades, as well as between the 7th and 8th grades, can be interpreted as weak effects. The review of the literature reveals that these findings run parallel to those of Tong and Kolen (2007).

DISCUSSION and CONCLUSIONS
The objective of this study is to apply vertical scaling based on item response theory, leading to a comparison of calibration methods and proficiency estimation methods, and the developmental vertical scale scores calculated with reference to the mean, standard deviation, and effect size values.
The means calculated through concurrent calibration on the basis of the data from the science test showed that the proficiency level of the students increases as they progress from 6th to 8th grade. The mean differences for all three grades present a picture where the largest differences are produced with the EAP method. A glance at the standard deviation figures shows that the standard deviation tends to fall from 6th grade to 8th, and the lowest standard deviation is obtained with the ML method. The effect sizes increase from 6th grade to 8th, with the largest effect size values produced with the EAP method.
When separate calibration is applied as the other calibration method, the developmental scale scores present an increase in the means from 6th grade to 8th, while mean differences fall from 6th grade towards 8th. The highest mean difference was observed with the EAP method. The mean differences generated through separate calibration were also notably lower than those generated through concurrent calibration. Standard deviations fall as one moves from 6th grade towards 8th. The lowest standard deviation was observed with the ML method. The standard deviation values calculated in separate calibration were generally lower than those produced through concurrent calibration. As for effect sizes, the values decrease from 6th grade to 8th grade. The highest effect size was observed with the ML method. The effect size values calculated in separate calibration were lower than those produced through concurrent calibration.
The comparison of the findings pertaining to the first and the second sub-problems reveals that the mean differences found through separate calibration were lower than those found through concurrent calibration, and the standard deviations found through separate calibration were likewise lower than those established through concurrent calibration. Furthermore, the effect sizes in the case of separate calibration were again lower than those obtained through concurrent calibration. The results reached for all three criteria using the concurrent calibration method were ranked in the order ML < MAP < EAP, with ML leading to the lowest values and EAP producing the highest ones. In the case of separate calibration, on the other hand, the ranking of results was found to vary with the criterion applied.
The conclusions reached through the study reveal that vertical scaling is a complex process, and that there is no single all-applicable method. Since there is no single method supported by a wide-ranging consensus, and taking into account the complexities of the methods applied and the results of the analyses, it is recommended that researchers decide on the method to apply within the context of their specific study. Because the interactions between the issues discussed in this process can have an impact on the results of vertical scaling, and hence on the interpretations about the ongoing development of students' achievement, comparisons employing a range of methods are recommended as a basis for decisions regarding the achievements of students. Hanson and Béguin (2002) also emphasized that no single all-applicable method can be designated, and that comparing results through a combination of various equating methods under different conditions is the way to go.
Such an analysis should be considered an inherent part of the overall vertical scaling process. Test developers and users can be recommended to work on the process of equating the observed and actual scores in the final stage of the vertical scaling process, with a review of the factors affecting observed scores.
Achievement levels of the students were observed to increase as one moves from earlier grades to subsequent ones. However, further studies may be needed to assess whether such increases are at the required levels. In order to ascertain the level of change students experience from one grade to another, vertical scaling practices are crucial. Vertical scaling assessments can be recommended for reviewing students' achievement at the K-12 level.
In the present study, test length (40 items), number of anchor items (10), sample size (1500), and applied model (2PLM) were fixed, and not subjected to analysis as determining factors or independent variables. Other studies can use these as variables in their own right, and investigate their impact on vertical scaling results as well. It is also possible to carry out a longitudinal study to review the achievement levels of individual students over several years, followed by an analysis on the basis of the data from such a longitudinal study. Since there is no single and exact criterion for assessing the applicability of the methods employed in vertical scaling, researchers are recommended to use more than one evaluation criterion (mean, mean differences, standard deviation, effect sizes, vertical distance, root-mean-square error of approximation (RMSEA), and bias values) when comparing scaling results.
Graph 1. Graphs of Values Obtained Through the Concurrent Calibration Method: (a) Mean Differences, (b) Standard Deviations, (c) Effect Sizes.
Graph 2. Graphs of Values Obtained Through the Separate Calibration Method: (a) Mean Differences, (b) Standard Deviations, (c) Effect Sizes.
Albayrak Sari, A., Kelecioglu H. / A Comparison of IRT Vertical Scaling Methods in Determining the Increase in Science Achievement

Table 1. Goodness-of-Fit Indices Calculated Through Confirmatory Factor Analyses for Science Test

Table 2. Results of ML, EAP, and MAP Proficiency Estimation Obtained for Science Test through Concurrent Calibration Method

Table 2 presents the evaluation criteria values for each grade. The graphs pertaining to these values are shown below, in Graph 1.

Table 3. Constants A and B Calculated for Stocking-Lord Conversion

Table 4. Results of ML, EAP, and MAP Proficiency Estimation Obtained for Science Test through Separate Calibration Method

Table 4 presents the evaluation criteria values for each grade. To present a clearer picture of these figures, the graphs pertaining to these values are shown below, in Graph 2.