Research Article

Statistical Machine Translation Customization between Turkish and 11 Languages

Volume: 3 Number: 1 June 30, 2020
  • Gökhan Doğru

Abstract

Statistical Machine Translation (SMT) has been the dominant corpus-based machine translation (MT) approach for the last twenty years. While SMT has been studied in detail for European languages, it has not been studied sufficiently for language pairs that include Turkish as the source or target language, and the existing work is limited mostly to the English ↔ Turkish pair. This study aims to broaden the perspective of Turkish corpus-based MT research by training MT engines between Turkish and a wide variety of languages with different features. It surveys customized SMT between Turkish and 11 different languages. Twenty-two SMT engines were trained in KantanMT on open parallel corpora, with Turkish as both source and target language. Three automatic evaluation metrics (F-Measure, BLEU, and TER) were used to evaluate MT quality. Because corpus quality and size vary, the results vary widely. The Turkish ↔ Catalan engines achieved the highest automatic evaluation scores and the Turkish ↔ Arabic engines the lowest. Although quality varies widely across languages, the study yields baseline scores for a wide variety of languages paired with Turkish. These results may provide a reference point for evaluating future MT systems that include Turkish.
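To give a rough sense of what two of the automatic metrics named above measure, the following is a minimal sketch in Python. It is not KantanMT's implementation, which the abstract does not describe: the function names `f_measure` and `word_edit_rate` are hypothetical, tokenization is plain whitespace splitting, and the edit rate is a simplified stand-in for TER that omits the block shifts the full metric allows.

```python
from collections import Counter


def f_measure(hypothesis: str, reference: str) -> float:
    """Bag-of-words F1 score between an MT output and a reference (assumed variant)."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp or not ref:
        return 0.0
    # Clipped word overlap: each word counts at most as often as it occurs in both.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by reference length.

    Simplified TER: insertions, deletions, and substitutions only, no shifts.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # Standard dynamic-programming Levenshtein distance over words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)
```

Higher F-Measure (and BLEU) values indicate closer agreement with the reference, whereas for an edit rate such as TER lower values are better, so the two kinds of score should be read in opposite directions when engines are compared.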

Details

Primary Language

English

Subjects

Language Studies

Journal Section

Research Article

Publication Date

June 30, 2020

Submission Date

April 23, 2020

Acceptance Date

June 10, 2020

Published in Issue

Year 2020 Volume: 3 Number: 1

APA
Doğru, G. (2020). Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal, 3(1), 98-121. https://doi.org/10.29228/transLogos.23