TY - JOUR
T1 - Fine-tuning Large Language Models for Turkish Flutter Code Generation
AU - Uluırmak, Bugra
AU - Kurban, Rifat
PY - 2025
DA - December
Y2 - 2025
DO - 10.35377/saucis...1722643
JF - Sakarya University Journal of Computer and Information Sciences
JO - SAUCIS
PB - Sakarya University
WT - DergiPark
SN - 2636-8129
SP - 637
EP - 650
VL - 8
IS - 4
LA - en
AB - The rapid advancement of large language models (LLMs) for code generation has largely centered on English programming queries. This paper addresses a low-resource language scenario, specifically Turkish, in the context of Flutter mobile app development. In this study, two representative LLMs (a 4B-parameter multilingual model and a 3B-parameter code-specialized model) are fine-tuned on a new Turkish question-and-answer dataset for Flutter/Dart. Fine-tuning with parameter-efficient techniques yields dramatic improvements in code generation quality: Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Bidirectional Encoder Representations from Transformers Score (BERTScore), and CodeBLEU scores all increase significantly. The rate of correct solutions rises from ~30–70% for the base models to 80–90% after fine-tuning. An analysis of the performance trade-offs between the models reveals that the multilingual model slightly outperforms the code-focused model in accuracy after fine-tuning, while the code-focused model offers faster inference. These results demonstrate that, even with very limited non-English training data, customizing LLMs can bridge the code generation gap and enable high-quality assistance for Turkish developers comparable to that available for English. The dataset has been released on GitHub to facilitate further research in multilingual code generation.
KW - Code generation
KW - Large language models
KW - Fine-tuning
KW - Low-resource languages
KW - Flutter
CR - J. He, C. Zhou, X. Ma, T. Berg-Kirkpatrick, & G. Neubig, "Towards a unified view of parameter-efficient transfer learning", 2021. doi: 10.48550/arxiv.2110.04366
CR - N. Houlsby, A. Giurgiu, S. Jastrzębski, B. Morrone, Q. Laroussilhe, A. Gesmundo et al., "Parameter-efficient transfer learning for NLP", 2019. doi: 10.48550/arxiv.1902.00751
CR - X. Liu, P. He, W. Chen, & J. Gao, "Multi-task deep neural networks for natural language understanding", 2019. doi: 10.18653/v1/p19-1441
CR - M. Anschütz, D. Lozano, & G. Groh, "This is not correct! Negation-aware evaluation of language generation systems", 2023. doi: 10.18653/v1/2023.inlg-main.12
CR - A. Lodha, G. Belapurkar, S. Chalkapurkar, Y. Tao, R. Ghosh, S. Basu et al., "On surgical fine-tuning for language encoders", 2023. doi: 10.18653/v1/2023.findings-emnlp.204
CR - E. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang et al., "LoRA: low-rank adaptation of large language models", 2021. doi: 10.48550/arxiv.2106.09685
CR - Y. Hu, Y. Xie, T. Wang, M. Chen, & Z. Pan, "Structure-aware low-rank adaptation for parameter-efficient fine-tuning", Mathematics, vol. 11, no. 20, p. 4317, 2023. doi: 10.3390/math11204317
CR - N. Dhinagar, S. Ozarkar, K. Buwa, S. Thomopoulos, C. Owens-Walton, E. Laltoo et al., "Parameter efficient fine-tuning of transformer-based masked autoencoder enhances resource constrained neuroimage analysis", 2025. doi: 10.1101/2025.02.15.638442
Wu, "Large language models capsule: a research analysis of in-context learning (icl) and parameter-efficient fine-tuning (peft) methods", Applied and Computational Engineering, vol. 43, no. 1, pp. 327-331, 2024. doi: 10.54254/2755-2721/43/20230858 CR - N. Sulaiman and F. Hamzah, "Optimizing llama 7b for medical question answering: a study on fine-tuning strategies and performance on the multimedqa dataset", 2024. doi: 10.31219/osf.io/g5aes CR - J. Bogaert, E. Jean, C. Bodt, & F. Standaert, "Fine-tuning is not (always) overfitting artifacts", 2023. doi: 10.14428/esann/2023.es2023-152 CR - G. Wiedemann, S. Yimam, & C. Biemann, "Uhh-lt at semeval-2020 task 12: fine-tuning of pre-trained transformer networks for offensive language detection", pp. 1638-1644, 2020. doi: 10.18653/v1/2020.semeval-1.213 CR - Aghajanyan, S. Gupta, & L. Zettlemoyer, "Intrinsic dimensionality explains the effectiveness of language model fine-tuning", 2021. doi: 10.18653/v1/2021.acl-long.568 CR - L. Feng, Y. Yang, M. Tan, T. Zeng, Z. Li, H. Tanget al., "Adaptive multi-source domain collaborative fine-tuning for transfer learning", 2023. doi: 10.20944/preprints202311.0124.v1 CR - F. Ullah, U. Azam, A. Faheem, F. Kamiran, & A. Karim, "Comparing prompt-based and standard fine-tuning for urdu text classification", pp. 6747-6754, 2023. doi: 10.18653/v1/2023.findings-emnlp.449 CR - M. Mosbach, M. Andriushchenko, & D. Klakow, "On the stability of fine-tuning bert: misconceptions, explanations, and strong baselines", 2020. doi: 10.48550/arxiv.2006.04884 CR - X. Li and P. Liang, "Prefix-tuning: optimizing continuous prompts for generation", 2021. doi: 10.18653/v1/2021.acl-long.353 CR - X. Ma, C. Santos, & A. Arnold, "Contrastive fine-tuning improves robustness for neural rankers", 2021. doi: 10.18653/v1/2021.findings-acl.51 CR - L. Pan, C. Hang, A. Sil, & S. Potdar, "Improved text classification via contrastive adversarial training", 2021. doi: 10.48550/arxiv.2107.10137 CR - Chen M., Tworek J., Jun H., Kaplan J., Yuan Q. and Zarinelli E., “Evaluating Large Language Models Trained on Code”, arXiv preprint arXiv:2107.03374, (2021). doi: 10.48550/arXiv.2107.03374 CR - Xu X., Sharma P., Kinne J. F., O’Neill M., Mazaitis K. and Bhatia S., “A Systematic Evaluation of Large Language Models of Code”, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 662-678, (2022). doi: 10.48550/arXiv.2202.13169 CR - Wang Z., Cuenca G., Zhou S., Chen T., Lin B. and Matsuo Y., “MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages”, Findings of the Association for Computational Linguistics: EACL 2023, 265-273, (2023). doi: 10.48550/arXiv.2203.08388 CR - Cassano F., Gouwar J., Nguyen D., Bartolo M., Serrano S. and Sabour A., “MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation”, arXiv preprint arXiv:2208.08227, (2022). doi: 10.48550/arXiv.2208.08227 UR - https://doi.org/10.35377/saucis...1722643 L1 - https://dergipark.org.tr/en/download/article-file/4972081 ER -