Research Article

Automatic story and item generation for reading comprehension assessments with transformers

Year 2022, Volume: 9, Issue: Special Issue, 72-87, 29.11.2022
https://doi.org/10.21449/ijate.1124382

Abstract

Reading comprehension is one of the essential skills for students as they transition from learning to read to reading to learn. Over the last decade, the increased use of digital learning materials for promoting literacy skills (e.g., oral fluency and reading comprehension) in K-12 classrooms has been a boon for teachers. However, instant access to reading materials, as well as to relevant assessment tools for evaluating students’ comprehension skills, remains a problem. Teachers must spend many hours looking for suitable materials for their students because high-quality reading materials and assessments are primarily available through commercial literacy programs and websites. This study proposes a promising solution to this problem by employing an artificial intelligence (AI) approach. We demonstrate how to use advanced language models (e.g., OpenAI’s GPT-2 and Google’s T5) to automatically generate reading passages and items. Our preliminary findings suggest that, with additional training and fine-tuning, open-source language models could be used to support the instruction and assessment of reading comprehension skills in the classroom. For both automatic story and item generation, the language models performed reasonably well; however, their outputs still require human evaluation and further adjustment before being shared with students. Practical implications of the findings and future research directions are discussed.
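
To make the two generation steps named in the abstract concrete, below is a minimal sketch using the open-source Hugging Face transformers library. The checkpoint names (the base gpt2 model and valhalla/t5-base-qg-hl, a community T5 checkpoint fine-tuned for question generation on SQuAD), the prompt, and the decoding settings are illustrative assumptions, not the configuration reported in the study.

```python
# Minimal sketch: story generation with GPT-2, item (question) generation
# with a fine-tuned T5 checkpoint. Checkpoints and settings are assumptions.
from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

# --- Automatic story generation with GPT-2 ---
story_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
story_model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Once upon a time, a curious fox found a hidden garden."
inputs = story_tokenizer(prompt, return_tensors="pt")

# Nucleus (top-p) sampling tends to yield more coherent open-ended text
# than greedy decoding (cf. Holtzman et al., 2019, in the references).
story_ids = story_model.generate(
    **inputs,
    max_length=200,
    do_sample=True,
    top_p=0.95,
    top_k=50,
    temperature=0.9,
    pad_token_id=story_tokenizer.eos_token_id,
)
print(story_tokenizer.decode(story_ids[0], skip_special_tokens=True))

# --- Automatic item generation with T5 ---
# valhalla/t5-base-qg-hl is one publicly available T5 checkpoint fine-tuned
# for question generation; it expects the answer span to be wrapped in <hl>
# tokens. Treat the checkpoint name as a stand-in, not the authors' model.
qg_tokenizer = T5Tokenizer.from_pretrained("valhalla/t5-base-qg-hl")
qg_model = T5ForConditionalGeneration.from_pretrained("valhalla/t5-base-qg-hl")

qg_input = (
    "generate question: The fox visited the garden <hl> every morning <hl> "
    "to watch the flowers open."
)
qg_ids = qg_model.generate(
    **qg_tokenizer(qg_input, return_tensors="pt"), max_length=64
)
print(qg_tokenizer.decode(qg_ids[0], skip_special_tokens=True))
```

In line with the abstract’s caveat, output from either step would still need a human review for age-appropriateness, factuality, and readability before classroom use.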

References

  • Agosto, D.E. (2016). Why storytelling matters: Unveiling the literacy benefits of storytelling. Children and Libraries, 14(2), 21-26. https://doi.org/10.5860/cal.14n2.21
  • Allington, R.L., McGill-Franzen, A., Camilli, G., Williams, L., Graff, J., Zeig, J., Zmach, C., & Nowak, R. (2010). Addressing summer reading setback among economically disadvantaged elementary students. Reading Psychology, 31(5), 411-427. https://doi.org/10.1080/02702711.2010.505165
  • Basu, S., Ramachandran, G.S., Keskar, N.S., & Varshney, L.R. (2020). Mirostat: A neural text decoding algorithm that directly controls perplexity. arXiv preprint. https://doi.org/10.48550/arXiv.2007.14966
  • Begeny, J.C., & Greene, D.J. (2014). Can readability formulas be used to successfully gauge difficulty of reading materials? Psychology in the Schools, 51(2), 198-215. https://doi.org/10.1002/pits.21740
  • Bigozzi, L., Tarchi, C., Vagnoli, L., Valente, E., & Pinto, G. (2017). Reading fluency as a predictor of school outcomes across grades 4-9. Frontiers in Psychology, 8(200), 1-9. https://doi.org/10.3389/fpsyg.2017.00200
  • Bulut, H.C., Bulut, O., & Arikan, S. (2022). Evaluating group differences in online reading comprehension: The impact of item properties. International Journal of Testing. Advance online publication. https://doi.org/10.1080/15305058.2022.2044821
  • Das, B., Majumder, M., Phadikar, S., & Sekh, A.A. (2021). Automatic question generation and answer assessment: A survey. Research and Practice in Technology Enhanced Learning, 16(1), 1-15. https://doi.org/10.1186/s41039-021-00151-1
  • Denkowski, M., & Lavie, A. (2014, June). Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the ninth workshop on statistical machine translation (pp. 376-380).
  • Devlin, J., Chang, M.W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint. https://doi.org/10.48550/arXiv.1810.04805
  • Dong, X., Hong, Y., Chen, X., Li, W., Zhang, M., & Zhu, Q. (2018, August). Neural question generation with semantics of question type. In CCF International Conference on Natural Language Processing and Chinese Computing (pp. 213-223). Springer, Cham.
  • Du, X., & Cardie, C. (2017, September). Identifying where to focus in reading comprehension for neural question generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2067-2073). https://doi.org/10.18653/v1/D17-1219
  • Du, X., Shao, J., & Cardie, C. (2017). Learning to ask: Neural question generation for reading comprehension. arXiv preprint. https://doi.org/10.48550/arXiv.1705.00106
  • Duan, N., Tang, D., Chen, P., & Zhou, M. (2017, September). Question generation for question answering. In Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 866-874). https://doi.org/10.18653/v1/D17-1090
  • Duke, N.K., & Pearson, P.D. (2009). Effective practices for developing reading comprehension. Journal of Education, 189(1/2), 107-122. https://doi.org/10.1177/0022057409189001-208
  • Duke, N.K., Pearson, P.D., Strachan, S.L., & Billman, A.K. (2011). Essential elements of fostering and teaching reading comprehension. What research has to say about reading instruction, 4, 286-314.
  • Fan, A., Lewis, M., & Dauphin, Y. (2018). Hierarchical neural story generation. arXiv preprint. https://doi.org/10.48550/arXiv.1805.04833
  • Guthrie, J.T. (2004). Teaching for literacy engagement. Journal of Literacy Research, 36(1), 1-30. https://doi.org/10.1207/s15548430jlr3601_2
  • Heilman, M., & Smith, N.A. (2010, June). Good question! Statistical ranking for question generation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 609-617).
  • Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2019). The curious case of neural text degeneration. arXiv preprint. https://doi.org/10.48550/arXiv.1904.09751
  • Holtzman, A., Buys, J., Forbes, M., Bosselut, A., Golub, D., & Choi, Y. (2018) Learning to write with cooperative discriminators. arXiv preprint. https://doi.org/10.48550/arXiv.1805.06087
  • Kim, J.S., & White, T.G. (2008). Scaffolding voluntary summer reading for children in grades 3 to 5: An experimental study. Scientific Studies of Reading, 12(1), 1-23. https://doi.org/10.1080/10888430701746849
  • Kulikov, I., Miller, A.H., Cho, K., & Weston, J. (2018). Importance of search and evaluation strategies in neural dialogue modelling. arXiv preprint. https://doi.org/10.48550/arXiv.1811.00907
  • Lin, C.Y. (2004, July). ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out (pp. 74-81).
  • Liu, B. (2020, April). Neural question generation based on Seq2Seq. In Proceedings of the 2020 5th International Conference on Mathematics and Artificial Intelligence (pp. 119-123).
  • Miller, S., & Pennycuff, L. (2008). The power of story: Using storytelling to improve literacy learning. Journal of Cross-Disciplinary Perspectives in Education, 1(1), 36-43.
  • Pan, L., Lei, W., Chua, T.S., & Kan, M.Y. (2019). Recent advances in neural question generation. arXiv preprint. https://doi.org/10.48550/arXiv.1905.08949
  • Papineni, K., Roukos, S., Ward, T., & Zhu, W.J. (2002, July). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318).
  • Peck, J. (1989). Using storytelling to promote language and literacy development. The Reading Teacher, 43(2), 138-141. https://www.jstor.org/stable/20200308
  • Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI tech report.
  • Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI tech report.
  • Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P.J. (2019). Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint. https://doi.org/10.48550/arXiv.1910.10683
  • Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2383-2392).
  • Rasinski, T.V. (2012). Why reading fluency should be hot! The Reading Teacher, 65(8), 516-522. https://doi.org/10.1002/TRTR.01077
  • Rus, V., Wyse, B., Piwek, P., Lintean, M., Stoyanchev, S., & Moldovan, C. (2012). A detailed account of the first question generation shared task evaluation challenge. Dialogue and Discourse, 3(2), 177-204. https://doi.org/10.5087/dad
  • Sáenz, L.M., & Fuchs, L.S. (2002). Examining the reading difficulty of secondary students with learning disabilities: Expository versus narrative text. Remedial and Special Education, 23(1), 31-41.
  • See, A., Pappu, A., Saxena, R., Yerukola, A., & Manning, C.D. (2019). Do massively pretrained language models make better storytellers? arXiv preprint. https://doi.org/10.48550/arXiv.1909.10705
  • Sun, X., Liu, J., Lyu, Y., He, W., Ma, Y., & Wang, S. (2018). Answer-focused and position-aware neural question generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 3930-3939).
  • Tang, D., Duan, N., Qin, T., Yan, Z., & Zhou, M. (2017). Question answering and question generation as dual tasks. arXiv preprint. https://doi.org/10.48550/arXiv.1706.02027
  • Taylor, B.M., Pearson, P.D., Clark, K., & Walpole, S. (2000). Effective schools and accomplished teachers: Lessons about primary-grade reading instruction in low-income schools. The Elementary School Journal, 101(2), 121-165. https://doi.org/10.1086/499662
  • Taylor, B.M., Pearson, P.D., Peterson, D.S., & Rodriguez, M.C. (2003). Reading growth in high-poverty classrooms: The influence of teacher practices that encourage cognitive engagement in literacy learning. The Elementary School Journal, 104(1), 3-28. https://doi.org/10.1086/499740
  • Tivnan, T., & Hemphill, L. (2005). Comparing four literacy reform models in high-poverty schools: Patterns of first-grade achievement. The Elementary School Journal, 105(5), 419-441. https://doi.org/10.1086/431885
  • Wang, B., Wang, X., Tao, T., Zhang, Q., & Xu, J. (2020, April). Neural question generation with answer pivot. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 9138-9145).
  • Zhou, Q., Yang, N., Wei, F., Tan, C., Bao, H., & Zhou, M. (2017, November). Neural question generation from text: A preliminary study. In National CCF Conference on Natural Language Processing and Chinese Computing (pp. 662-671). Springer, Cham.


Details

Primary Language: English
Subjects: Other Fields of Education
Journal Section: Special Issue
Authors

Okan Bulut (ORCID: 0000-0001-5853-1267)

Seyma Nur Yildirim-Erbasli (ORCID: 0000-0002-8010-9414)

Early Pub Date: November 17, 2022
Publication Date: November 29, 2022
Submission Date: June 1, 2022
Published in Issue: Year 2022, Volume: 9, Issue: Special Issue

Cite

APA: Bulut, O., & Yildirim-Erbasli, S. N. (2022). Automatic story and item generation for reading comprehension assessments with transformers. International Journal of Assessment Tools in Education, 9(Special Issue), 72-87. https://doi.org/10.21449/ijate.1124382
