TÜBİTAK
5190074
Named Entity Recognition (NER) is the process of automatically recognizing named entities such as persons, organizations, and dates in a document. In this study, we focus on bank documents written in Turkish and propose a Conditional Random Fields (CRF) model to extract named entities. The contribution of this study is twofold: (i) we propose domain-specific features to extract entity types such as law, regulation, and reference, which frequently appear in bank documents; and (ii) we contribute to NER research on Turkish documents, which is less mature than research on languages such as English and German. Experimental results based on 10-fold cross-validation conducted on 551 real-life, anonymized bank documents show that the proposed CRF-NER model achieves a micro-averaged F1 score of 0.962. More specifically, the F1 score is 0.979 for law names, 0.850 for regulation names, and 0.850 for article numbers.
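As an illustration of the reported metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed from per-entity-type counts. It is not the study's evaluation code: the function names, the entity labels, and all counts below are hypothetical and chosen only to show the arithmetic. Micro-averaging pools true positives, false positives, and false negatives across all entity types before computing F1, so frequent types weigh more than rare ones.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one entity type
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_type: Dict[str, Counts]) -> float:
    """Pool the counts over all entity types, then compute a single F1."""
    tp = sum(c[0] for c in per_type.values())
    fp = sum(c[1] for c in per_type.values())
    fn = sum(c[2] for c in per_type.values())
    return f1(tp, fp, fn)


# Hypothetical counts for illustration only; these are NOT the study's data.
counts = {
    "LAW": (90, 5, 5),
    "REGULATION": (40, 10, 10),
    "ARTICLE_NO": (20, 5, 5),
}

per_type_f1 = {name: f1(*c) for name, c in counts.items()}
overall = micro_f1(counts)
```

With these made-up counts, the pooled score lies between the highest and lowest per-type scores and is pulled toward the most frequent type, which is why a micro-averaged score can exceed the F1 of several individual entity types.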
Bank Document, Conditional Random Fields, Named Entity Recognition, Natural Language Processing, Turkish Documents
Primary Language | English |
---|---|
Subjects | Computer Software |
Journal Section | Articles |
Authors | |
Project Number | 5190074 |
Publication Date | November 30, 2021 |
Acceptance Date | April 13, 2021 |
Published in Issue | Year 2021 |