Research Article

Identification of OOV words in Turkish texts

Volume: 8, Issue: 2, 31 October 2019

Abstract

In this study, we present a semantic graph network model capable of detecting out-of-vocabulary (OOV) words in Turkish texts. In natural language processing (NLP), morphological analyzers can encounter unknown words (UW) during word processing. This mostly occurs when such tools depend on a dictionary to find the probable lemmas needed for further parsing. Sometimes an analyzer cannot find any candidates because the lemma is missing from the dictionary, which degrades the parsing output. The proposed OOV detection model identifies OOV words that are suitable for inclusion in dictionaries. In addition, co-occurrence relations of the lemmas in texts are modelled as a semantic sub-graph, which is used to discover collocations that can be proposed as new lemma candidates.
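The two ideas named in the abstract, dictionary lookup failure and co-occurrence edges between lemmas, can be illustrated with a minimal Python sketch. It is not the authors' model: the toy lexicon, the adjacent-pair window, the `min_count` threshold, and all function names are assumptions made for illustration, and the tokens are treated as already lemmatized.

```python
from collections import Counter


def find_oov(tokens, lexicon):
    """Return tokens for which the (toy) lexicon offers no lemma entry."""
    return [t for t in tokens if t not in lexicon]


def cooccurrence_graph(sentences):
    """Count adjacent token pairs; each pair acts as a weighted directed edge."""
    edges = Counter()
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            edges[(left, right)] += 1
    return edges


def collocation_candidates(edges, min_count=2):
    """Keep edges seen at least min_count times as collocation candidates."""
    return sorted(pair for pair, n in edges.items() if n >= min_count)


# Hypothetical lexicon and corpus, assumed to be lemmatized already.
lexicon = {"kitap", "oku", "ev", "git"}
sentences = [
    ["ali", "kitap", "oku"],
    ["ayşe", "kitap", "oku"],
    ["ali", "ev", "git"],
]

oov = find_oov([t for s in sentences for t in s], lexicon)
edges = cooccurrence_graph(sentences)
candidates = collocation_candidates(edges)
```

In this toy run, the names missing from the lexicon are flagged as OOV, and the only pair that recurs becomes a collocation candidate. A realistic system would replace the set lookup with a morphological analyzer and the raw count with an association measure.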

Details

Primary Language

English

Subjects

Engineering

Section

Research Article

Authors

Umut Orhan
Türkiye

Publication Date

31 October 2019

Submission Date

30 December 2018

Acceptance Date

26 September 2019

Published in Issue

Year 2019 Volume: 8 Issue: 2

Cite

APA
Arslan, E., & Orhan, U. (2019). Identification of OOV words in Turkish texts. Gaziosmanpaşa Bilimsel Araştırma Dergisi, 8(2), 35-48. https://izlik.org/JA76LY73PM
AMA
1. Arslan E, Orhan U. Identification of OOV words in Turkish texts. GBAD. 2019;8(2):35-48. https://izlik.org/JA76LY73PM
Chicago
Arslan, Enis, and Umut Orhan. 2019. “Identification of OOV words in Turkish texts”. Gaziosmanpaşa Bilimsel Araştırma Dergisi 8 (2): 35-48. https://izlik.org/JA76LY73PM.
EndNote
Arslan E, Orhan U (01 October 2019) Identification of OOV words in Turkish texts. Gaziosmanpaşa Bilimsel Araştırma Dergisi 8 2 35–48.
IEEE
[1] E. Arslan and U. Orhan, “Identification of OOV words in Turkish texts”, GBAD, vol. 8, no. 2, pp. 35–48, Oct. 2019, [Online]. Available: https://izlik.org/JA76LY73PM
ISNAD
Arslan, Enis - Orhan, Umut. “Identification of OOV words in Turkish texts”. Gaziosmanpaşa Bilimsel Araştırma Dergisi 8/2 (01 October 2019): 35-48. https://izlik.org/JA76LY73PM.
JAMA
1. Arslan E, Orhan U. Identification of OOV words in Turkish texts. GBAD. 2019;8:35–48.
MLA
Arslan, Enis, and Umut Orhan. “Identification of OOV words in Turkish texts”. Gaziosmanpaşa Bilimsel Araştırma Dergisi, vol. 8, no. 2, October 2019, pp. 35-48, https://izlik.org/JA76LY73PM.
Vancouver
1. Enis Arslan, Umut Orhan. Identification of OOV words in Turkish texts. GBAD [Internet]. 01 October 2019;8(2):35-48. Available from: https://izlik.org/JA76LY73PM