Quantization Compensation in Turkish LLMs: An Evaluation of Accuracy, Speed, and Memory Usage

Cengizhan Bayram; Cevdet Ahmet Turan; Ferhat Kürkçüoğlu; Volkan Altıntaş

Research Article

Year 2025, Volume: 8 Issue: 2, 70 - 80, 24.12.2025

Abstract

This study examines how weight-only quantization (INT8 and INT4) applied to a Turkish large language model (Trendyol/Trendyol-LLM-7b-chat-dpo-v1.0) affects model accuracy, generation quality, and system-level efficiency. Taking the unquantized BFLOAT16 version as the baseline, the INT8 and INT4 quantized counterparts of the same model were evaluated on GLUE-like classification tasks adapted to Turkish and on generation-oriented summarization tasks.
The findings show that quantization, although its effect depends on the task type, causes only limited performance loss in most cases, while providing marked memory savings and easier deployment. The results draw attention to the unavoidable trade-offs among accuracy, speed, and memory usage. In particular, INT4 quantization offers a balanced option between memory efficiency and acceptable accuracy in resource-constrained environments, whereas INT8 quantization can preserve high accuracy on some classification tasks but may be at a speed disadvantage under certain software or hardware configurations.
Overall, the findings indicate that quantization is a feasible and effective optimization strategy for deploying Turkish large language models under field conditions. By offering concrete principles for developing and applying task-aware quantization strategies in the Turkish LLM ecosystem, the study contributes to future work on model efficiency.
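To make the memory side of the trade-off concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a nominal 7 billion parameters (inferred only from the "7b" in the model name), counts weights alone (no quantization scales, activations, or KV cache), and uses plain per-tensor symmetric round-to-nearest quantization as a stand-in, since the abstract does not state which quantization algorithm was used. All function names are hypothetical.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GiB (ignores scales, KV cache)."""
    return num_params * bits_per_weight / 8 / 2**30


def quantize_symmetric(weights, bits):
    """Round-to-nearest symmetric quantization with a single scale per tensor.

    This is an illustrative simplification, not the method used in the study.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    n_params = 7_000_000_000  # assumption: nominal size taken from the model name
    for name, bits in [("BF16", 16), ("INT8", 8), ("INT4", 4)]:
        print(f"{name}: ~{weight_memory_gib(n_params, bits):.2f} GiB of weights")

    toy_weights = [0.42, -1.30, 0.07, 0.95, -0.61]  # made-up values for illustration
    for bits in (8, 4):
        codes, scale = quantize_symmetric(toy_weights, bits)
        restored = dequantize(codes, scale)
        max_err = max(abs(a - b) for a, b in zip(toy_weights, restored))
        print(f"INT{bits}: codes={codes}, max abs error={max_err:.4f}")
```

Under these assumptions, halving the bit width halves the weight footprint, while the rounding error grows as the grid gets coarser. Whether that error translates into a task-level accuracy loss is exactly what the evaluation in the study measures.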

Keywords

Quantization, Large Language Models, Model Efficiency, Turkish Natural Language Processing, Low-Precision Inference

Details

Primary Language	Turkish
Subjects	Natural Language Processing
Section	Research Article
Authors	Cengizhan Bayram 0009-0002-3915-6881; Cevdet Ahmet Turan 0009-0000-4154-4504; Ferhat Kürkçüoğlu 0009-0005-1870-2160; Volkan Altıntaş 0000-0002-1560-9017
Submission Date	6 December 2025
Acceptance Date	21 December 2025
Publication Date	24 December 2025
Published in Issue	Year 2025 Volume: 8 Issue: 2

Cite

APA	Bayram, C., Turan, C. A., Kürkçüoğlu, F., & Altıntaş, V. (2025). Türkçe LLM’lerde Nicemleme Dengelemesi: Doğruluk, Hız ve Bellek Kullanımı Üzerine Bir Değerlendirme. Veri Bilimi, 8(2), 70-80.