Statistical Machine Translation Customization between Turkish and 11 Languages

Gökhan Doğru

doi:10.29228/transLogos.23

Abstract

Statistical Machine Translation (SMT) has been the dominant corpus-based machine translation (MT) approach for the last twenty years. While SMT has been studied in detail for European languages, it has not been studied sufficiently for language pairs that include Turkish as the source or target language, and existing work has been limited mostly to the English ↔ Turkish pair. This study aims to broaden the perspective of Turkish corpus-based MT research by training MT engines between Turkish and a wide variety of languages with different features. It surveys customized SMT between Turkish and 11 different languages. Twenty-two SMT engines were trained in KantanMT on open parallel corpora, with Turkish as both source and target language. Three automatic evaluation metrics (F-Measure, BLEU, and TER) were used to evaluate MT quality. Owing to variation in corpus quality and size, the results vary widely. The Turkish ↔ Catalan engines achieved the highest automatic evaluation scores, and the Turkish ↔ Arabic engines the lowest. Although quality varies considerably across languages, the study provides baseline scores for a wide variety of languages paired with Turkish. These results may serve as a reference point for evaluating future MT systems that include Turkish.
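As a rough illustration of the three metric families named in the abstract, the following Python sketch scores a single hypothesis against a single reference. It is not KantanMT's implementation, and the exact metric variants used in the study are not specified on this page. All function names and the example sentences are hypothetical. The F-measure here is computed over unigrams, the BLEU function is an unsmoothed sentence-level simplification of the corpus-level metric, and the edit rate omits the block shifts that full TER allows, so it is closer to word error rate.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def unigram_f_measure(hypothesis, reference):
    """Harmonic mean of unigram precision and recall (assumed variant)."""
    hyp, ref = hypothesis.split(), reference.split()
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU: clipped n-gram precisions
    combined by geometric mean, times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        matched = sum((hyp_counts & ref_counts).values())  # clipped matches
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matched / total))
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


def edit_rate_without_shifts(hypothesis, reference):
    """Word-level edit distance divided by reference length.
    Full TER also counts block shifts as single edits; this sketch does not."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(previous[j] + 1,          # delete hypothesis word
                               current[j - 1] + 1,       # insert reference word
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on a mat"
    print(unigram_f_measure(hypothesis, reference))
    print(sentence_bleu(hypothesis, reference))
    print(edit_rate_without_shifts(hypothesis, reference))
```

Higher F-measure and BLEU values indicate closer agreement with the reference, whereas a lower edit rate is better, which is why the three scores are usually read together rather than in isolation.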

Details

Primary Language

English

Subjects

Language Studies

Section

Research Article

Authors

Gökhan Doğru
0000-0001-7141-2350
Spain

Publication Date

June 30, 2020

Submission Date

April 23, 2020

Acceptance Date

June 10, 2020

Published in Issue

Year 2020, Volume 3, Issue 1

DOI

https://doi.org/10.29228/transLogos.23

IZ

https://izlik.org/JA87AS48YK

Cite

APA

Doğru, G. (2020). Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal, 3(1), 98-121. https://doi.org/10.29228/transLogos.23

AMA

1. Doğru G. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. 2020;3(1):98-121. doi:10.29228/transLogos.23

Chicago

Doğru, Gökhan. 2020. “Statistical Machine Translation Customization between Turkish and 11 Languages”. transLogos Translation Studies Journal 3 (1): 98-121. https://doi.org/10.29228/transLogos.23.

EndNote

Doğru G (June 1, 2020) Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal 3 1 98–121.

IEEE

[1] G. Doğru, “Statistical Machine Translation Customization between Turkish and 11 Languages”, transLogos Translation Studies Journal, vol. 3, no. 1, pp. 98–121, Jun. 2020, doi: 10.29228/transLogos.23.

ISNAD

Doğru, Gökhan. “Statistical Machine Translation Customization between Turkish and 11 Languages”. transLogos Translation Studies Journal 3/1 (June 1, 2020): 98-121. https://doi.org/10.29228/transLogos.23.

JAMA

1. Doğru G. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. 2020;3:98–121.

MLA

Doğru, Gökhan. “Statistical Machine Translation Customization between Turkish and 11 Languages”. transLogos Translation Studies Journal, vol. 3, no. 1, June 2020, pp. 98-121, doi:10.29228/transLogos.23.

Vancouver

1. Gökhan Doğru. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. June 1, 2020;3(1):98-121. doi:10.29228/transLogos.23