Research Article

CHATGPT, DEEPL AND GOOGLE TRANSLATE: TRANSLATION QUALITY ASSESSMENT ACROSS DIFFERENT TEXT TYPES

Year 2024, Volume: 9 Issue: 15, 120 - 173, 29.06.2024

Abstract

Linguistic diversity can be regarded as a source of richness, since it represents a wide variety of cultures and experiences. However, it is undeniable that this diversity can at times become an obstacle, especially when communicating with people who speak a different language. Machine translation (MT) can lessen the impact of such language barriers: with MT, information can be understood quickly, ideas can be communicated effectively, and connections can be formed with people from other cultures. Google MT and DeepL are among the most popular translation tools in use today, although many others exist. In recent months, ChatGPT has also emerged as a prominent translation tool. ChatGPT is a modern artificial intelligence (AI) application whose use is spreading rapidly, and since OpenAI released it in November 2022, concern has grown that AI will take over the jobs of many workers. "Is ChatGPT a good translator?" has accordingly become a frequently asked question in the field of translation. It is claimed that ChatGPT, like other machine learning models, produces considerably more accurate translations by drawing on context. Given that, according to the existing literature, text translation is one of the tasks ChatGPT performs impressively, how it performs against Google MT and DeepL across different text types is a question worth investigating. To compare these tools, this study draws on Katharina Reiss's text type model, which highlights common translation problems. According to Reiss, texts fall into three types based on their communicative function: informative, expressive, and operative.
Accordingly, the aim of this study is to compare human, Google MT, DeepL, and ChatGPT translations of texts from the fields of education, health, and law, and to draw conclusions from these comparisons. The study is qualitative and based on document analysis: the translations produced by ChatGPT, DeepL, Google MT, and a human translator were evaluated using the Multidimensional Quality Metrics (MQM) model. The findings are expected to be useful both to researchers interested in machine translation and to users of these technologies.


A COMPARATIVE ANALYSIS OF THE PERFORMANCES OF CHATGPT, DEEPL, GOOGLE TRANSLATE AND A HUMAN TRANSLATOR IN COMMUNITY BASED SETTINGS


Abstract

The diversity of languages is a remarkable aspect of human civilization, reflecting a wide range of cultures and life experiences. However, this diversity can sometimes pose challenges, especially in interactions between speakers of different languages. Machine translation (MT) offers a way to reduce the impact of these linguistic barriers: it enables the rapid understanding of information, the effective exchange of ideas, and the building of relationships across cultural backgrounds. Prominent translation tools include Google Translate, DeepL, Microsoft Bing Translator, and Amazon Translate. In addition, ChatGPT, an AI system released by OpenAI in November 2022, has been making strides in this domain, sparking debate across industries about its potential to replace human roles. A pertinent question in Translation Studies (TS) is therefore how effective ChatGPT is as a translator. It is posited that ChatGPT, like other machine learning models, delivers contextually richer translations. Informed by past literature, this study compares ChatGPT's translation capabilities with those of Google MT and DeepL across different text types. To structure the comparison, we selected text types that are traditionally challenging to translate, guided by Katharina Reiss's Text Type Model, which categorizes texts by their communicative purpose as informative, expressive, or operative. The study assesses translations of source texts on education, healthcare, and law produced by ChatGPT, DeepL, Google MT, and a human translator, and draws conclusions in light of these categories. The research adopts a qualitative approach, evaluating the translations with the Multidimensional Quality Metrics (MQM) framework for translation quality assessment.
The insights from this study are expected to benefit translation and interpreting (T&I) researchers interested in machine translation, as well as users of these technologies.
Keywords: ChatGPT, DeepL, Google Translate, Artificial Intelligence, Machine Translation, Translation Quality
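The abstract states that the translations were evaluated with MQM but does not describe how annotated errors were weighted or aggregated, and a qualitative MQM analysis need not compute a score at all. As a rough illustration of how MQM-style error annotations are often rolled up, the Python sketch below weights each annotated error by severity and normalizes by source word count. The severity weights (1/5/10), the per-100-word normalization, and the names `ErrorAnnotation`, `mqm_score`, and `penalties_by_dimension` are hypothetical assumptions, not taken from the article; MQM itself permits different scoring models.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical severity weights, chosen for illustration only.
# The article does not report which weights (if any) it applied.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass(frozen=True)
class ErrorAnnotation:
    """One annotated error: an MQM-style dimension and a severity level."""
    dimension: str  # e.g. "accuracy", "fluency", "terminology"
    severity: str   # must be a key of SEVERITY_WEIGHTS


def penalties_by_dimension(errors: list[ErrorAnnotation]) -> dict[str, int]:
    """Sum the weighted penalty points for each error dimension."""
    totals: dict[str, int] = {}
    for err in errors:
        totals[err.dimension] = totals.get(err.dimension, 0) + SEVERITY_WEIGHTS[err.severity]
    return totals


def mqm_score(errors: list[ErrorAnnotation], word_count: int) -> float:
    """Return 100 minus the weighted penalty points per 100 source words.

    A score of 100 means no annotated errors; lower is worse, and the
    value can fall below zero when errors are dense.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    penalty = sum(SEVERITY_WEIGHTS[err.severity] for err in errors)
    return 100.0 - penalty * 100.0 / word_count


if __name__ == "__main__":
    # Invented annotations for three anonymous systems on one hypothetical
    # 200-word source text; these are NOT results from the article.
    annotations = {
        "System A": [ErrorAnnotation("fluency", "minor")],
        "System B": [ErrorAnnotation("accuracy", "major"),
                     ErrorAnnotation("terminology", "minor")],
        "System C": [ErrorAnnotation("accuracy", "critical")],
    }
    for system, errs in annotations.items():
        # Prints "System A: 99.5", "System B: 97.0", "System C: 95.0"
        print(f"{system}: {mqm_score(errs, word_count=200):.1f}")
```

Reporting `penalties_by_dimension` alongside the overall score keeps the error typology visible, which matters in a qualitative comparison where the kind of error (for instance, an accuracy error in a legal text) may say more than the total.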

References

  • Agung, I. G. A. M., Budiartha, P. G., Suryani, N. W. (2024, January). Translation performance of Google Translate and DeepL in translating Indonesian short stories into English. In Proceedings: Linguistics, Literature, Culture and Arts International Seminar (LITERATES) (pp. 178-185).
  • Ali, G., Ali, N., Syed, K. (2023). Understanding shifting paradigms of translation studies in 21st century.
  • Almahasees, Z. (2021). Analyzing English-Arabic machine translation: Google Translate, Microsoft Translator and Sakhr. Routledge.
  • Bahdanau, D., Cho, K., Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • Banerjee, S., Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization (pp. 65-72).
  • Bansal, R., Samanta, B., Dalmia, S., Gupta, N., Vashishth, S., Ganapathy, S., ..., Talukdar, P. (2024). LLM Augmented LLMs: Expanding Capabilities through Composition. arXiv preprint arXiv:2401.02412.
  • Blain, F., Senellart, J., Schwenk, H., Plitt, M., Roturier, J. (2011). Qualitative analysis of post-editing for high quality machine translation. In Proceedings of Machine Translation Summit XIII: Papers.
  • Bloomberg, L. D., Volpe, M. (2008). Completing your qualitative dissertation: A roadmap from beginning to end. London: SAGE.
  • Bowker, L. (2023). De-mystifying translation: Introducing translation to non-translators (p. 217). Taylor & Francis.
  • Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Lafferty, J. D., ..., Mercer, R. L. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), p.79-85.
  • Callison-Burch, C., Osborne, M., Koehn, P. (2006). Re-evaluating the role of BLEU in machine translation research. In Proceedings of the workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization at the 2006 conference of the North American chapter of the association for computational linguistics (pp. 29-36).
  • Comelles, E., Arranz, V., Castellón, I. (2017). Guiding automatic MT evaluation by means of linguistic features. Digital Scholarship in the Humanities, 32(4), p.761-778.
  • Costa, A., Ling, W., Luis, T., Correia, R., Coheur, L. (2015). A Linguistically Motivated Taxonomy for Machine Translation Error Analysis. Machine Translation, 29(2), p.127–161.
  • Girletti, S., Lefer, M. A. (2024). Introducing MTPE pricing in translator training: a concrete proposal for MT instructors. The Interpreter and Translator Trainer, p.1-18.
  • Huang, X., Zhang, Z., Geng, X., Du, Y., Chen, J., Huang, S. (2024). Lost in the source language: How large language models evaluate the quality of machine translation. arXiv preprint arXiv:2401.06568.
  • Hutchins, W. J. (2003). Machine translation: Past, present, future. Research Studies Press Ltd.
  • Karabayeva, I., Kalizhanova, A. (2024). Evaluating machine translation of literature through rhetorical analysis. Journal of Translation and Language Studies, 5(1), p.1-9.
  • Khanna, R. R., Karliner, L. S., Eck, M., Vittinghoff, E., Koenig, C. J., Fang, M. C. (2011). Performance of an online translation tool when applied to patient educational material. Journal of Hospital Medicine, 6(9), p.519-525.
  • Khoong, E. C., Steinbrook, E., Brown, C., Fernandez, A. (2019). Assessing the use of Google Translate for Spanish and Chinese translations of emergency department discharge instructions. JAMA Internal Medicine, 179(4), p.580-582.
  • Khoshafah, F. (2023). ChatGPT for Arabic-English translation: Evaluating the accuracy. Research Square.
  • Koehn, P. (2010). Statistical machine translation. Cambridge University Press.
  • Kunchukuttan, A., Bhattacharyya, P. (2021). Machine translation and transliteration involving related, low-resource languages. CRC Press.
  • Lauscher, S. (2000). Assessing the quality of translations: a practical guide for users. Manchester: St. Jerome Publishing.
  • Li, B., Weng, Y., Xia, F., Deng, H. (2024). Towards better Chinese-centric neural machine translation for low-resource languages. Computer Speech & Language, 84, 101566.
  • Li, J., Dada, A., Puladi, B., Kleesiek, J., Egger, J. (2024). ChatGPT in healthcare: a taxonomy and systematic review. Computer Methods and Programs in Biomedicine, 108013.
  • Lommel, A. (2018). Metrics for translation quality assessment: a case for standardizing error typologies. In Translation quality assessment: From principles to practice (pp. 109-127). Springer, Cham.
  • Lommel, A. R., Uszkoreit, H., Burchardt, A. (2014). Multidimensional Quality Metrics (MQM): A Framework for Declaring and Describing Translation Quality Metrics. Revista Tradumàtica: Traducció i Tecnologies de la Informació i la Comunicació, 12, p.455–463.
  • Mellinger, C. D., Hanson, T. A. (2016). Quantitative research methods in translation and interpreting studies. Taylor & Francis.
  • Papineni, K., Roukos, S., Ward, T., Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 311-318).
  • Popović, M. (2018). Error classification and analysis for machine translation quality assessment. In Translation quality assessment (pp. 129-158). Springer, Cham.
  • Qian, M. (2023). Performance evaluation on human-machine teaming augmented machine translation enabled by GPT-4. In Proceedings of the First Workshop on NLP Tools and Resources for Translation and Interpreting Applications (pp. 20-31).
  • Ray, P. P. (2023). ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems.
  • Reiss, K. (1971). Möglichkeiten und Grenzen der Übersetzungskritik: Kategorien und Kriterien für eine sachgerechte Beurteilung von Übersetzungen. O. Schwarz.
  • Rusadi, A. M., Setiajid, H. H. (2023). Evaluating the accuracy of Google Translate and ChatGPT in translating Windows 11 Education installation GUI texts to Indonesian: an application of Koponen’s error category. In English Language and Literature International Conference (ELLiC) Proceedings (pp. 698-713).
  • Sahari, Y., Al-Kadi, A. M. T., Ali, J. K. M. (2023). A cross-sectional study of ChatGPT in translation: magnitude of use, attitudes, and uncertainties. Journal of Psycholinguistic Research, p.1-18.
  • Seidman, I. (2006). Interviewing as qualitative research: A guide for researchers in education and the social sciences. New York, NY: Teachers College.
  • Siu, S. C. (2023). ChatGPT and GPT-4 for professional translators: Exploring the potential of large language models in translation. Available at SSRN 4448091.
  • Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas (AMTA) (pp. 223-231).
  • Snow, T. A. (2015). Establishing the viability of the multidimensional quality metrics framework. Brigham Young University.
  • Son, J., Kim, B. (2023). Translation performance from the user’s perspective of large language models and neural machine translation systems. Information, 14(10), p.574.
  • Stymne, S., Ahrenberg, L. (2012, May). On the practice of error analysis for machine translation evaluation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (pp. 1785-1790).
  • Sutskever, I., Vinyals, O., Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems (pp. 3104-3112).
  • Vilar, D., Xu, J., d’Haro, L. F., Ney, H. (2006). Error analysis of statistical machine translation output. In Proceedings of the fifth international conference on language resources and evaluation (LREC’06).
  • Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., ..., Dean, J. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
There are 43 citations in total.

Details

Primary Language: English
Subjects: Modern Turkic Languages and Literatures (Other)
Journal Section: Articles
Authors:

Özge Çetin (ORCID: 0000-0002-7249-8755)

Ali Duran (ORCID: 0000-0001-6132-4066)

Early Pub Date: June 29, 2024
Publication Date: June 29, 2024
Submission Date: January 29, 2024
Acceptance Date: February 8, 2024
Published in Issue: Year 2024, Volume: 9, Issue: 15

Cite

APA Çetin, Ö., & Duran, A. (2024). A COMPARATIVE ANALYSIS OF THE PERFORMANCES OF CHATGPT, DEEPL, GOOGLE TRANSLATE AND A HUMAN TRANSLATOR IN COMMUNITY BASED SETTINGS. Amasya Üniversitesi Sosyal Bilimler Dergisi, 9(15), 120-173.

ISSN: (online) 2602-2567