Research Article

Integrating Metadiscourse Analysis with Transformer-Based Models for Enhancing Construct Representation and Discourse Competence Assessment in L2 Writing: A Systemic Multidisciplinary Approach

Year 2024, Volume: 15 Issue: Special Issue, 318 - 347, 30.12.2024
https://doi.org/10.21031/epod.1531269

Abstract

In recent years, large-scale language test providers have developed or adapted automated essay scoring systems (AESS) to score L2 writing essays. While the benefits of AESS are clear, these systems have limitations, such as over-reliance on frequency counts of vocabulary and grammar variables. Discourse competence is an important aspect of L2 writing that has yet to be fully explored in automated essay evaluation (AEE) applications. Evidence of discourse competence can be seen in the use of metadiscourse markers (MDMs) to produce reader-friendly texts. This article presents a multidisciplinary study exploring the feasibility of expanding the construct representation of automated scoring models to assess discourse competence in L2 writing. Combining machine learning, automated textual analysis and corpus-linguistic methods to examine 2,000 scripts across two tasks and five proficiency levels, the study investigates (1) whether the accuracy of MDM use, in addition to frequency and range, is worth pursuing as a predictive feature in L2 writing, and (2) how the identification and classification of MDM use might feed into the development of an automated scoring model using machine learning techniques. The contributions of this study are threefold. First, it offers insights within the context of Explainable AI: by integrating MDM usage and accuracy into the scoring framework, it moves beyond frequency-based evaluation. Second, it contributes to the current understanding of L2 writing development by showing that even lower-proficiency learners exhibit evidence of discourse competence through their accurate use of MDMs and their choice of MDMs in response to genre. Third, from the perspective of expanding construct representation in automated scoring systems, it provides a critical examination of the limitations of many AEE models, which have relied heavily on vocabulary and grammar features. By exploring the feasibility of incorporating MDMs as predictive features, this research demonstrates the potential for construct expansion in L2 AEE. The results could support test providers in developing competence tests in various contexts and domains, including manufacturing and medicine.
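
As a rough illustration of the frequency and range features mentioned in the abstract, the following minimal Python sketch counts marker occurrences per 100 words and the number of distinct markers in a script. The marker list, the function name `mdm_features` and the sample text are hypothetical and chosen only for illustration; they are not the taxonomy, tooling or data used in the study. Accuracy of use is not covered here, since judging whether a marker is used appropriately requires annotation or a trained classifier rather than string matching.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny marker list for illustration only;
# it is NOT the taxonomy or lexicon used in the study.
MARKERS = ["however", "in addition", "for example", "therefore", "in conclusion"]


def mdm_features(text: str) -> dict:
    """Return MDM frequency (per 100 words), range (distinct markers) and raw counts."""
    lowered = text.lower()
    # Simple word tokenisation: runs of letters and apostrophes.
    n_words = len(re.findall(r"[a-z']+", lowered))
    counts = Counter()
    for marker in MARKERS:
        # Whole-word or whole-phrase match, so a marker is not found inside a longer word.
        hits = re.findall(r"\b" + re.escape(marker) + r"\b", lowered)
        if hits:
            counts[marker] = len(hits)
    total = sum(counts.values())
    return {
        "frequency_per_100_words": 100 * total / n_words if n_words else 0.0,
        "range": len(counts),
        "counts": dict(counts),
    }


# Invented sample script, not taken from the study's corpus.
sample = (
    "Cities are crowded. However, they offer jobs. "
    "For example, factories hire many people. However, rents are high."
)
features = mdm_features(sample)
print(features)
```

Features of this kind could then be passed, alongside vocabulary and grammar measures, to whichever scoring model is being trained; how the study itself combined them is described in the full article rather than in this sketch.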

There are 35 citations in total.

Details

Primary Language English
Subjects Testing, Assessment and Psychometrics (Other)
Journal Section Articles
Authors

Sathena Chan 0000-0002-7852-6737

Manoranjan Sathyamurthy 0000-0001-8928-2689

Chihiro Inoue 0000-0003-1927-6923

Michael Bax 0000-0002-2753-1990

Johnathan Jones 0000-0003-4158-7971

John Oyekan 0000-0001-6578-9928

Publication Date December 30, 2024
Submission Date August 13, 2024
Acceptance Date December 16, 2024
Published in Issue Year 2024 Volume: 15 Issue: Special Issue

Cite

APA Chan, S., Sathyamurthy, M., Inoue, C., Bax, M., Jones, J., & Oyekan, J. (2024). Integrating Metadiscourse Analysis with Transformer-Based Models for Enhancing Construct Representation and Discourse Competence Assessment in L2 Writing: A Systemic Multidisciplinary Approach. Journal of Measurement and Evaluation in Education and Psychology, 15(Special Issue), 318-347. https://doi.org/10.21031/epod.1531269