This study compares the G and Phi coefficients estimated by D studies for a measurement tool with the G and Phi coefficients obtained from real cases in which items of differing difficulty levels were added, and it identifies the conditions under which the D studies yielded reliability estimates closer to the real values. The study group consisted of 80 seventh-grade students from various public and private secondary schools in the provinces of Ankara, Istanbul, and Adana in Turkey. Four raters who served as Turkish teachers in various public secondary schools in Ankara were included in the study. A data collection tool consisting of 12 tasks was prepared to measure the participating seventh-grade students’ written expression skills in Turkish. The G and Phi coefficients estimated in the D study equaled those obtained from the real cases only when six tasks with item difficulty indexes close to the mean difficulty of the test were added in such a way that the mean difficulty of the test did not change. In the other cases, where adding easy or difficult tasks changed the mean difficulty of the test, the reliability coefficients estimated in the D study and those obtained from the real cases were similar but not identical.
Reliability, Generalizability theory, Decision study, Item difficulty index