
Measuring the Proficiency of Artificial Intelligence Models in Turkish Language: An Evaluation with Higher Education Institutions Field Proficiency Examination (AYT) Questions

Year 2025, Volume: 8 Issue: 2, 24 - 37, 24.12.2025

Abstract

In this study, a method was developed and applied to measure the proficiency of artificial intelligence models in the Turkish language. Twenty-four Turkish Language and Literature questions from the Higher Education Institutions Field Proficiency Examination (AYT) were posed to selected artificial intelligence models, and the models' Turkish context comprehension, grammar, and text interpretation abilities were evaluated. In total, 26 current models from the developers OpenAI, Google AI, Anthropic, xAI, Mistral AI, Microsoft, DeepSeek AI, Moonshot AI, MiniMax, and Alibaba Cloud were analyzed. These models are among the largest and most advanced language models in the field of natural language processing and are known for their high performance on multilingual tasks.
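
A minimal sketch of such an evaluation loop, in Python, is given below. The model list, the prompt wording, and the ask_model() helper are hypothetical stand-ins for illustration only, not the authors' actual harness; each vendor's real API differs.

    # Sketch of the evaluation loop described above. MODELS, PROMPT and
    # ask_model() are hypothetical stand-ins, not the authors' setup.
    def ask_model(model_name: str, prompt: str) -> str:
        """Placeholder for a vendor-specific chat-completion call
        (OpenAI, Google AI, Anthropic, xAI, ...)."""
        raise NotImplementedError

    MODELS = ["model-a", "model-b"]  # the study covers 26 models

    PROMPT = (
        "Aşağıdaki çoktan seçmeli Türkçe sorusunu yanıtla. "
        "Yalnızca doğru seçeneğin harfini (A-E) yaz.\n\n{question}"
    )

    def collect_answers(questions: list[str]) -> dict[str, list[str]]:
        """Pose every AYT question to every model; keep only the answer letter."""
        answers: dict[str, list[str]] = {}
        for model in MODELS:
            answers[model] = [
                ask_model(model, PROMPT.format(question=q)).strip().upper()[:1]
                for q in questions
            ]
        return answers
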
In the study, the Turkish-language proficiency of this wide range of artificial intelligence models was analyzed using a high-standard measurement instrument, the Field Proficiency Examination (AYT). The AYT measures students' Turkish language skills in detail, which makes it an ideal instrument for evaluating the performance of AI models against real-world standards.
In the study, each model's performance on the AYT questions was computed as its correct-answer rate, and its proficiency in and suitability for the Turkish language were assessed on that basis. Models with more correct answers were found to be better suited to Turkish and more capable in context understanding, grammar, and text interpretation. The breadth of a model's training data, its customization potential, its scale and token capacity, and the share of training data in local languages such as Turkish were all observed to affect answer accuracy; the highest correct-answer rates were achieved by the DeepSeek AI models.
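
As a worked illustration of this scoring, the correct-answer rate is simply the number of correct answers divided by the number of questions (24 in the study). The answer key and model answers below are invented for illustration, not the study's data:

    # Correct-answer rate per model: correct answers / question count.
    # ANSWER_KEY and model_answers are invented for illustration; the
    # study uses the 24 official AYT answers.
    ANSWER_KEY = ["B", "D", "A"]

    model_answers = {
        "model-a": ["B", "D", "C"],  # 2 of 3 correct
        "model-b": ["B", "A", "A"],  # 2 of 3 correct
    }

    for model, answers in model_answers.items():
        correct = sum(a == k for a, k in zip(answers, ANSWER_KEY))
        print(f"{model}: {correct}/{len(ANSWER_KEY)} correct "
              f"({correct / len(ANSWER_KEY):.1%})")
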
Given the pronounced performance differences among AI models in Turkish found in the study, it is recommended to include more Turkish content in training data sets, to develop models tailored to Turkish's agglutinative morphology and free word order, and to build Turkish-focused data sets in order to increase context capacity.

References

  • Referans1 Anthropic. “Claude 3.5 Sonnet model overview”. https://www.anthropic.com/news/claude-3-5-sonnet, 2024.
  • Referans2 Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I. ve Amodei, D. “Language models are few-shot learners”. Advances in Neural Information Processing Systems, 33, 1877-1901. https://doi.org/10.48550/arXiv.2005.14165, 2020.
  • Referans3 Grok AI. “Grok 2 model overview”. https://x.ai/grok, 2024.
  • Referans4 Mistral AI. “Mistral model card”. https://www.mistral.ai/models, 2024.
  • Referans5 OpenAI. “GPT-4 technical report”. https://openai.com/index/gpt-4-research/, 2023.
  • Referans6 Techtarget. “OpenAI o1 explained: Everything you need to know”. https://www.techtarget.com/whatis/feature/OpenAI-o1-explained-Everything-you-need-to-know, 2024.
  • Referans7 Seo AI. “ChatGPT Language List: How Many Languages Does ChatGPT Support?”. https://seo.ai/blog/how-many-languages-does-chatgpt-support, 2023.
  • Referans8 OpenAI. “OpenAI Fine-Tuning Guide: Explore developer resources, tutorials, API docs”. https://platform.openai.com/docs/guides/fine-tuning, 2023.
  • Referans9 OpenAI. “OpenAI Training Data: Better language models and their implications”. https://openai.com/research/better-language-models, 2019.
  • Referans10 Google AI. “Gemini Models: Learn about Google's most advanced AI models”. https://ai.google.dev/gemini-api/docs/models, 2024.
  • Referans11 Google AI. “Gemini Fine-Tuning: Tune Gemini models by using supervised fine-tuning”. https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-use-supervised-tuning, 2024.
  • Referans12 Google AI. “Gemini 2.0 Benchmarks: Gemini model updates, February 2025”. https://blog.google/technology/google-deepmind/gemini-model-updates-february-2025/, 2025.
  • Referans13 Google AI. “Gemini 2.0 Pro: Generative AI on Google Cloud”. https://cloud.google.com/vertex-ai/generative-ai/docs/models, 2025.
  • Referans14 Google AI. “Gemini 2.0 Flash Thinking: Generative AI on Google Cloud”. https://cloud.google.com/vertex-ai/generative-ai/docs/thinking, 2025.
  • Referans15 Anthropic. “Claude 3.7 Sonnet: Today we’re announcing Claude 3.7 Sonnet”. https://www.anthropic.com/news/claude-3-7-sonnet, 2025.
  • Referans16 Anthropic. “Claude Language Support: Anthropic is an AI safety and research company”. https://www.anthropic.com/claude/sonnet, 2024.
  • Referans17 Amazon AWS. “Anthropic API: Anthropic’s Claude 3.7 Sonnet hybrid reasoning model”. https://aws.amazon.com/blogs/aws/anthropics-claude-3-7-sonnet-the-first-hybrid-reasoning-model-is-now-available-in-amazon-bedrock/, 2025.
  • Referans18 xAI. “Grok 3 Overview: We are thrilled to unveil an early preview of Grok 3”. https://x.ai/news/grok-3, 2025.
  • Referans19 xAI. “Grok API: Website & API are live now. Try DeepThink at chat.deepseek.com today”. https://api-docs.deepseek.com/news/news250120, 2025.
  • Referans20 Mistral AI. “Large Enough”. https://mistral.ai/en/news/mistral-large-2407, 2024.
  • Referans21 Mistral AI. “Models Overview”. https://docs.mistral.ai/getting-started/models/models_overview/, 2024.
  • Referans22 Mistral AI. “Changelog”. https://docs.mistral.ai/getting-started/changelog/, 2024.
  • Referans23 Wikipedia. “Microsoft Copilot”. https://en.wikipedia.org/wiki/Microsoft_Copilot, 2023.
  • Referans24 Fireworks AI. “DeepSeek R1: All you need to know”. https://fireworks.ai/blog/deepseek-r1-deepdive, 2025.
  • Referans25 DeepSeek AI. “DeepSeek-R1 Release”. https://api-docs.deepseek.com/news/news250120, 2025.
  • Referans26 Testingcatalog. “Moonshot AI launches Kimi k1.5 with free real-time search and file analysis”. https://www.testingcatalog.com/moonshot-ai-launches-kimi-k1-5-with-free-real-time-search-and-file-analysis/, 2025.
  • Referans27 GitHub. “Kimi k1.5: Scaling Reinforcement Learning with LLMs”. https://github.com/MoonshotAI/Kimi-k1.5, 2025.
  • Referans28 Indianexpress. “After DeepSeek-R1, Kimi k1.5 model by Chinese startup Moonshot AI outshines OpenAI-o1”. https://indianexpress.com/article/technology/artificial-intelligence/deepseek-r1-kimi-k1-5-model-by-chinese-openai-o1-9804116/, 2025.
  • Referans29 Analyticsvidhya. “Kimi k1.5 vs DeepSeek R1: Battle of the Best Chinese LLMs”. https://www.analyticsvidhya.com/blog/2025/01/kimi-k1-5-vs-deepseek-r1/, 2025.
  • Referans30 Huggingface. “MiniMax-Text-01”. https://huggingface.co/MiniMaxAI/MiniMax-Text-01, 2024.
  • Referans31 Artificialanalysis. “MiniMax-Text-01: Intelligence, Performance & Price Analysis”. https://artificialanalysis.ai/models/minimax-text-01, 2024.
  • Referans32 GitHub. “MiniMax-01”. https://github.com/MiniMax-AI/MiniMax-01, 2024.
  • Referans33 Qwenlm. “Qwen2 Technical Report”. https://huggingface.co/papers/2407.10671, 2024.
  • Referans34 Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L. ve Stoyanov, V. “Unsupervised Cross-lingual Representation Learning at Scale”. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 8440–8451. https://arxiv.org/abs/1911.02116, 2020.
  • Referans35 Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W. ve Liu, P. J. “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”. Journal of Machine Learning Research, 21(140), 1–67. https://arxiv.org/abs/1910.10683, 2020.
  • Referans36 Devlin, J., Chang, M.-W., Lee, K. ve Toutanova, K. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. https://arxiv.org/abs/1810.04805, 2018.
  • Referans37 Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł. ve Polosukhin, I. “Attention is All You Need”. Advances in Neural Information Processing Systems, 30, 5998–6008. https://arxiv.org/abs/1706.03762, 2017.
  • Referans38 Howard, J. ve Ruder, S. “Universal Language Model Fine-tuning for Text Classification”. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 328–339. https://arxiv.org/abs/1801.06146, 2018.
There are 38 citations in total.

Details

Primary Language Turkish
Subjects Knowledge Representation and Reasoning, Natural Language Processing
Journal Section Systematic Reviews and Meta Analysis
Authors

Berker Kılıç 0000-0001-8751-8192

Esra Kılıç 0000-0003-3009-9462

Umur Kaan Kaçan 0009-0002-6746-8305

Submission Date March 22, 2025
Acceptance Date September 28, 2025
Publication Date December 24, 2025
Published in Issue Year 2025 Volume: 8 Issue: 2

Cite

APA Kılıç, B., Kılıç, E., & Kaçan, U. K. (2025). Yapay Zekâ Modellerinin Türkçe Dilindeki Yeterliliklerinin Ölçülmesi: Yükseköğretim Kurumları Alan Yeterlilik Sınavı (AYT) Soruları ile Bir Değerlendirme. Veri Bilimi, 8(2), 24-37.