Statistical Machine Translation Customization between Turkish and 11 Languages

Gökhan Doğru

doi:10.29228/transLogos.23

Research Article

Year 2020, Volume: 3 Issue: 1, 98 - 121, 30.06.2020

Abstract

Statistical Machine Translation (SMT) has been the dominant corpus-based machine translation (MT) approach for the last twenty years. While SMT has been studied in detail for European languages, it has not been studied sufficiently for language pairs with Turkish as the source or target language, and existing work has been limited mostly to the English ↔ Turkish pair. This study aims to broaden the perspective of Turkish corpus-based MT research by training MT engines between Turkish and a wide variety of languages with different features. It surveys customized SMT between Turkish and 11 different languages. Twenty-two SMT engines were trained in KantanMT on open parallel corpora, using Turkish as both source and target language. Three automatic evaluation metrics (F-Measure, BLEU, and TER) were used to evaluate MT quality. Because corpus quality and size vary, the results vary widely. The Turkish ↔ Catalan engines achieved the highest automatic evaluation scores, and the Turkish ↔ Arabic engines the lowest. Although quality varies considerably across languages, the study yields baseline scores for a wide variety of languages paired with Turkish. These results may provide a reference point for evaluating future MT systems that include Turkish.

Keywords

statistical machine translation customization, Turkish, automatic evaluation metrics, translation quality evaluation, parallel corpus

Details

Primary Language	English
Subjects	Language Studies
Journal Section	Research Articles
Authors	Gökhan Doğru 0000-0001-7141-2350
Publication Date	June 30, 2020
Published in Issue	Year 2020 Volume: 3 Issue: 1

Cite

APA	Doğru, G. (2020). Statistical Machine Translation Customization between Turkish and 11 Languages. TransLogos Translation Studies Journal, 3(1), 98-121. https://doi.org/10.29228/transLogos.23
AMA	Doğru G. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. June 2020;3(1):98-121. doi:10.29228/transLogos.23
Chicago	Doğru, Gökhan. “Statistical Machine Translation Customization Between Turkish and 11 Languages”. TransLogos Translation Studies Journal 3, no. 1 (June 2020): 98-121. https://doi.org/10.29228/transLogos.23.
EndNote	Doğru G (June 1, 2020) Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal 3 1 98–121.
IEEE	G. Doğru, “Statistical Machine Translation Customization between Turkish and 11 Languages”, transLogos Translation Studies Journal, vol. 3, no. 1, pp. 98–121, 2020, doi: 10.29228/transLogos.23.
ISNAD	Doğru, Gökhan. “Statistical Machine Translation Customization Between Turkish and 11 Languages”. transLogos Translation Studies Journal 3/1 (June 2020), 98-121. https://doi.org/10.29228/transLogos.23.
JAMA	Doğru G. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. 2020;3:98–121.
MLA	Doğru, Gökhan. “Statistical Machine Translation Customization Between Turkish and 11 Languages”. TransLogos Translation Studies Journal, vol. 3, no. 1, 2020, pp. 98-121, doi:10.29228/transLogos.23.
Vancouver	Doğru G. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. 2020;3(1):98-121.
