A review of automatic item generation techniques leveraging large language models

Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark Gierl

doi:10.21449/ijate.1602294

Abstract

This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly used LLMs in the current AIG literature, their specific applications in the AIG process, and the characteristics of the generated items. We found that LLMs are flexible and effective in generating various types of items across different languages and subject domains. However, many studies have overlooked the quality of the generated items, indicating the lack of a solid educational foundation. We therefore offer two suggestions for strengthening the educational foundation of LLM-based AIG and advocate interdisciplinary collaboration to realize the utility and potential of LLMs.

Turkish title: Büyük Dil Modelleri Kullanan Otomatik Madde Üretimi Yöntemlerinin İncelenmesi (A review of automatic item generation methods using large language models)

Abstract (Turkish version)

This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We conducted a comprehensive literature search across seven research databases, selected studies according to predefined criteria, and summarized 60 relevant studies that used LLMs in the AIG process. We identified the LLMs most frequently used in the current AIG literature, their specific applications in the AIG process, and the characteristics of the generated items. We found that LLMs are flexible and effective in generating various item types across different languages and subject areas. However, many studies overlooked the quality of the generated items, which indicates the lack of a solid educational foundation. We therefore offer two suggestions for strengthening the educational foundations for making use of LLMs in AIG, and we advocate interdisciplinary collaboration to harness the utility and potential of LLMs.

Details

Primary Language

English

Subjects

Measurement Theories and Applications in Education and Psychology, Measurement and Evaluation in Education (Other)

Journal Section

Research Article

Authors

Bin Tan*
0000-0001-6717-5620
Canada

Nour Armoush
0009-0008-2310-5098
Canada

Elisabetta Mazzullo
0009-0008-4847-9934
Canada

Okan Bulut
0000-0001-5853-1267
Canada

Mark Gierl
0000-0002-2653-1761
Canada

Early Pub Date

May 1, 2025

Publication Date

June 1, 2025

Submission Date

December 16, 2024

Acceptance Date

April 28, 2025

Published in Issue

Year 2025, Volume 12, Number 2

DOI

https://doi.org/10.21449/ijate.1602294

IZ

https://izlik.org/JA93DW28YD

Cite

APA

Tan, B., Armoush, N., Mazzullo, E., Bulut, O., & Gierl, M. (2025). A review of automatic item generation techniques leveraging large language models. International Journal of Assessment Tools in Education, 12(2), 317-340. https://doi.org/10.21449/ijate.1602294

Cited By

Pushing the boundaries of generative AI: multiple-choice question generation and assessment performance within medical education

Journal of Health Sciences and Medicine

https://doi.org/10.32322/jhsm.1842373

Research challenges and future perspectives for e-assessment technologies in higher education

i-com

https://doi.org/10.1515/icom-2026-0008

Artificial Intelligence in Language Assessment Research: A Scoping Review

Research Synthesis in Applied Linguistics

https://doi.org/10.1080/29984475.2026.2633389

Harnessing Generative AI for Assessment Item Development: Comparing AI‐Generated and Human‐Authored Items

Editorial: Special Issue on Artificial Intelligence and Machine Learning in Educational Measurement (Part 3)

Expanding the Team: Integrating Generative Artificial Intelligence into the Assessment Development Process

From human artefact to machine output: automating the “art” of psychological measurement

Mini-review: considering impacts of artificial intelligence on the development of measurement scales