Research Article

Year 2020, Volume 3, Issue 1, 98 - 121, 30.06.2020
https://doi.org/10.29228/transLogos.23

Statistical Machine Translation Customization between Turkish and 11 Languages

Abstract

Statistical Machine Translation (SMT) has been the dominant corpus-based machine translation (MT) approach over the last twenty years. While SMT has been studied in detail for European languages, it has not been studied sufficiently for language pairs that include Turkish as the source or target language, and existing work has been limited mostly to the English ↔ Turkish pair. This study aims to broaden the perspective on Turkish corpus-based MT by training MT engines between Turkish and a wide variety of languages with different features. It surveys customized SMT between Turkish and 11 different languages. Twenty-two SMT engines were trained in KantanMT on open parallel corpora, using Turkish as both source and target language. Three automatic evaluation metrics (F-Measure, BLEU, and TER) were used to evaluate MT quality. Because corpus quality and size vary, the results vary widely. The Turkish ↔ Catalan engines achieved the highest automatic evaluation scores, and the Turkish ↔ Arabic engines the lowest. Although quality varies considerably across languages, we obtain baseline scores for a wide variety of languages paired with Turkish. These results may provide a reference point for evaluating future MT systems that include Turkish.
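
As a rough illustration of the first of the three metrics, the sketch below computes a word-level F-measure between one MT output and one reference translation. It is a minimal sketch under stated assumptions: the abstract does not describe how KantanMT computes its F-Measure, so the whitespace tokenization, the clipped unigram matching, and the function name `f_measure` are all hypothetical choices made here for illustration only.

```python
from collections import Counter


def f_measure(hypothesis: str, reference: str) -> float:
    """Unigram F1 between an MT hypothesis and a single reference.

    Assumption: whitespace tokenization and clipped unigram counts.
    This is an illustrative definition, not KantanMT's implementation.
    """
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each reference token can be matched at most once.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)  # share of output words that are correct
    recall = overlap / len(ref_tokens)     # share of reference words that were produced
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy sentences, invented for the example; not data from the study.
    print(f_measure("the cat", "the cat sat on"))
```

In this toy case the output contains only correct words (precision 1.0) but covers half of the reference (recall 0.5), so the harmonic mean penalizes the missing words. BLEU and TER capture different aspects (n-gram precision with a brevity penalty, and edit distance to the reference, respectively), which is one reason several metrics are typically reported together.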

Details

Primary Language English
Subjects Language and Linguistics
Journal Section Articles
Authors

Gökhan DOĞRU
Universitat Autònoma de Barcelona
0000-0001-7141-2350
Spain

Publication Date June 30, 2020
Published in Issue Year 2020, Volume 3, Issue 1

Cite

BibTeX
@article{translogos761548,
  journal = {transLogos Translation Studies Journal},
  issn = {},
  eissn = {2667-4629},
  address = {},
  publisher = {Diye Yayınları},
  year = {2020},
  volume = {3},
  pages = {98 - 121},
  doi = {10.29228/transLogos.23},
  title = {Statistical Machine Translation Customization between Turkish and 11 Languages},
  key = {cite},
  author = {Doğru, Gökhan}
}
APA Doğru, G. (2020). Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal, 3(1), 98-121. DOI: 10.29228/transLogos.23
MLA Doğru, G. "Statistical Machine Translation Customization between Turkish and 11 Languages". transLogos Translation Studies Journal 3 (2020): 98-121 <https://dergipark.org.tr/en/pub/translogos/issue/55550/761548>
Chicago Doğru, G. "Statistical Machine Translation Customization between Turkish and 11 Languages". transLogos Translation Studies Journal 3 (2020): 98-121
ISNAD Doğru, Gökhan. "Statistical Machine Translation Customization between Turkish and 11 Languages". transLogos Translation Studies Journal 3/1 (June 2020): 98-121. https://doi.org/10.29228/transLogos.23
AMA Doğru G. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. 2020; 3(1): 98-121.
Vancouver Doğru G. Statistical Machine Translation Customization between Turkish and 11 Languages. transLogos Translation Studies Journal. 2020; 3(1): 98-121.
IEEE G. Doğru, "Statistical Machine Translation Customization between Turkish and 11 Languages", transLogos Translation Studies Journal, vol. 3, no. 1, pp. 98-121, Jun. 2020, doi:10.29228/transLogos.23