A Systematic Literature Review on Custom NER with Evidence-Based Software Engineering
Year 2023, Volume: 1 Issue: 2, 75 - 81, 31.12.2023
Hakan Kekül, Abdulkadir Şeker
Abstract
As the significance of data continues to grow, so does the importance of data analysis methods. Various models are currently in use, and new ones are continually being proposed. In this study, we conducted a detailed review of Named-Entity Recognition (NER), a data analysis model. As the analysis method, we applied Evidence-Based Software Engineering, which has been used successfully for many years. Of the 114 research articles identified with this method, 38 were selected and analyzed, and the analyzed data are presented in detail. The study aimed to identify the most effective among the methods that use NER. The analysis indicates that BERT is the most successful method in NER studies, and the "News" domain was found to contain the highest number of NER datasets. The study also provides detailed information on the other methods and domains identified. As an original and comprehensive guide, this study serves as an excellent resource for those interested in the field.