Research Article

Statistical Machine Translation Customization between Turkish and 11 Languages

Volume: 3 Issue: 1, 30 June 2020
  • Gökhan Doğru

Abstract

Statistical Machine Translation (SMT) has been the dominant corpus-based machine translation (MT) approach over the last twenty years. While SMT has been studied in detail for European languages, it has not been studied sufficiently for language pairs that include Turkish as the source or target language, and existing work has been limited mostly to the English ↔ Turkish language pair. This study aims to broaden the perspective of Turkish corpus-based MT studies by training MT engines between Turkish and a wide variety of languages with different features. It surveys customized SMT between Turkish and 11 different languages. Twenty-two SMT engines were trained in KantanMT with open parallel corpora, using Turkish as both source and target language. Three automatic evaluation metrics, F-Measure, BLEU, and TER, were used to evaluate MT quality. Owing to variations in corpus quality and size, the results vary widely. The Turkish ↔ Catalan engines achieved the highest automatic evaluation scores, while the Turkish ↔ Arabic engines achieved the lowest. Although quality varies widely across languages, the study yields baseline scores for a wide variety of languages paired with Turkish. These results may provide a reference point for evaluating future MT systems that include Turkish.
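The abstract names F-Measure as one of the three automatic metrics without defining it. As a rough illustration only, the sketch below computes a unigram F-measure (the harmonic mean of word-level precision and recall) between one MT output and one reference translation. The exact computation used by KantanMT is not given here; the function name `unigram_f_measure`, the whitespace tokenization, and the lowercasing are all assumptions made for this example.

```python
from collections import Counter


def unigram_f_measure(hypothesis: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall (illustrative only).

    Assumptions: lowercased, whitespace-tokenized text and a single
    reference. This is not necessarily how KantanMT computes its score.
    """
    hyp_tokens = hypothesis.lower().split()
    ref_tokens = reference.lower().split()
    if not hyp_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each reference word can be matched at most as many
    # times as it occurs in the reference.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(hyp_tokens)  # share of MT words that are correct
    recall = overlap / len(ref_tokens)     # share of reference words recovered
    return 2 * precision * recall / (precision + recall)
```

A score near 1 means the output and the reference share almost all of their words. Unlike BLEU and TER, this simple variant ignores word order.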

Details

Primary Language

English

Subjects

Language Studies

Section

Research Article

Authors

Gökhan Doğru

Publication Date

30 June 2020

Submission Date

23 April 2020

Acceptance Date

10 June 2020

Published in Issue

Year 2020 Volume: 3 Issue: 1

Cite

APA
Doğru, G. (2020). Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal, 3(1), 98-121. https://doi.org/10.29228/transLogos.23
AMA
1. Doğru G. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. 2020;3(1):98-121. doi:10.29228/transLogos.23
Chicago
Doğru, Gökhan. 2020. “Statistical Machine Translation Customization between Turkish and 11 Languages”. transLogos Translation Studies Journal 3 (1): 98-121. https://doi.org/10.29228/transLogos.23.
EndNote
Doğru G (01 June 2020) Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal 3 1 98–121.
IEEE
[1] G. Doğru, “Statistical Machine Translation Customization between Turkish and 11 Languages”, transLogos Translation Studies Journal, vol. 3, no. 1, pp. 98–121, Jun. 2020, doi: 10.29228/transLogos.23.
ISNAD
Doğru, Gökhan. “Statistical Machine Translation Customization between Turkish and 11 Languages”. transLogos Translation Studies Journal 3/1 (01 June 2020): 98-121. https://doi.org/10.29228/transLogos.23.
JAMA
1. Doğru G. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. 2020;3:98–121.
MLA
Doğru, Gökhan. “Statistical Machine Translation Customization between Turkish and 11 Languages”. transLogos Translation Studies Journal, vol. 3, no. 1, June 2020, pp. 98-121, doi:10.29228/transLogos.23.
Vancouver
1. Gökhan Doğru. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. 01 June 2020;3(1):98-121. doi:10.29228/transLogos.23