Research Article

Fine-Tuning LLaMA2, LLaMA3, and Phi3 Models for Code Generation with Turkish Instructions

Year 2025, Volume: 17 Issue: 2, 519 - 526, 30.12.2025
https://doi.org/10.47000/tjmcs.1699086

Abstract

With the advancement of artificial intelligence and large language models, such models are now used at every stage of the software development process, changing coding habits. Rather than writing entire pieces of code themselves, developers delegate certain patterns to language models or use these models to query problems they encounter. Likewise, code translation from one programming language to another is carried out with the help of language models. While such studies exist for English, a model designed specifically to generate code in response to queries posed in Turkish does not yet exist. In this study, the “python code instructions 18k alpaca” dataset, which contains examples of code translation, code generation from scratch, and error querying, was translated into Turkish, and several language models were fine-tuned on it. The performance of the fine-tuned models was evaluated using the ROUGE, BLEU, and METEOR metrics, and models capable of generating code from Turkish instructions are presented.
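The overlap metrics named above compare a model's generated code against a reference solution via n-gram matching. As a minimal illustration (not the authors' evaluation pipeline, which likely uses standard library implementations such as NLTK or rouge-score), the core of BLEU-style clipped n-gram precision and ROUGE-1 recall can be sketched in pure Python:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(cand, ref, n):
    """BLEU-style clipped n-gram precision: candidate n-gram counts
    are capped at their count in the reference."""
    c, r = Counter(ngrams(cand, n)), Counter(ngrams(ref, n))
    overlap = sum(min(cnt, r[g]) for g, cnt in c.items())
    return overlap / max(sum(c.values()), 1)

def bleu(cand, ref, max_n=2):
    """Geometric mean of n-gram precisions with a brevity penalty
    (single-reference, simplified)."""
    precs = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precs) == 0:
        return 0.0
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precs) / max_n)

def rouge_1_recall(cand, ref):
    """ROUGE-1 recall: fraction of reference unigrams recovered."""
    c, r = Counter(cand), Counter(ref)
    overlap = sum(min(cnt, c[t]) for t, cnt in r.items())
    return overlap / max(sum(r.values()), 1)

# Hypothetical Turkish-instruction example: reference vs. generated code.
ref = "def topla(a, b): return a + b".split()
cand = "def topla(a, b): return a + b".split()
print(bleu(cand, ref))           # identical output scores 1.0
print(rouge_1_recall(cand, ref)) # 1.0
```

METEOR additionally aligns stems and synonyms before scoring, so it cannot be reduced to a few lines; in practice all three metrics would be computed with established implementations rather than hand-rolled code like the sketch above.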

References

  • Abdin, M., Aneja, J., Behl, H., Bubeck, S., Eldan, R. et al., Phi-4 technical report, arXiv preprint arXiv:2412.08905, (2024).
  • Allamanis, M., Tarlow, D., Gordon, A., Wei, Y., Bimodal modelling of source code and natural language, International Conference on Machine Learning, PMLR, (2015), 2123–2132.
  • Allamanis, M., Barr, E.T., Devanbu, P., Sutton, C., A survey of machine learning for big code and naturalness, ACM Computing Surveys, 51(2018), 1–37.
  • Banerjee, S., Lavie, A., METEOR: An automatic metric for MT evaluation with improved correlation with human judgments, Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, (2005), 65–72.
  • Campbell, B.A., Treude, C., NLP2Code: Code snippet content assist via natural language tasks, Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), (2017), 628–632.
  • Dehaerne, E., Dey, B., Halder, S., De Gendt, S., Meert, W., Code generation using machine learning: A systematic review, IEEE Access, 10(2022), 82434–82455.
  • Desai, A., Gulwani, S., Hingorani, V., Jain, N., Karkare, A. et al., Program synthesis using natural language, Proceedings of the 38th International Conference on Software Engineering, (2016), 345–356.
  • Falcini, F., Lami, G., Costanza, A. M., Deep learning in automotive software, IEEE Software, 34(2017), 56–63.
  • Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X. et al., CodeBERT: A pre-trained model for programming and natural languages, arXiv preprint arXiv:2002.08155, (2020).
  • Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A. et al., The LLaMA 3 herd of models, arXiv preprint arXiv:2407.21783, (2024).
  • Guo, Q., Xie, X., Li, Y., Zhang, X., Liu, Y. et al., Audee: Automated testing for deep learning frameworks, Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, (2020), 486–498.
  • Jiang, D., Liu, Y., Liu, S., Zhao, J., Zhang, H. et al., From CLIP to DINO: Visual encoders shout in multi-modal large language models, arXiv preprint arXiv:2310.08825, (2023).
  • Li, K., Zhu, A., Zhao, P., Song, J., Liu, J., Utilizing deep learning to optimize software development processes, arXiv preprint arXiv:2404.13630, (2024).
  • Lin, C.-Y., ROUGE: A package for automatic evaluation of summaries, Text Summarization Branches Out, (2004), 74–81.
  • Narayanan, A., Wang, J., Shi, L., Wei, M., Wang, S., Automatic unit test generation for deep learning frameworks based on API knowledge, arXiv preprint arXiv:2307.00404, (2023).
  • Papineni, K., Roukos, S., Ward, T., Zhu, W.-J., BLEU: A method for automatic evaluation of machine translation, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, (2002), 311–318.
  • Royce, W.W., Managing the development of large software systems: concepts and techniques, Proceedings of the 9th International Conference on Software Engineering, (1987), 328–336.
  • Shin, J., Nam, J., A survey of automatic code generation from natural language, Journal of Information Processing Systems, 17(2021), 537–555.
  • Sreekanth, N., Rama Devi, J., Shukla, K.A., Mohanty, D.K., Srinivas, A. et al., Evaluation of estimation in software development using deep learning-modified neural network, Applied Nanoscience, 13(2023), 2405–2417.
  • Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A. et al., LLaMA 2: Open foundation and fine-tuned chat models, arXiv preprint arXiv:2307.09288, (2023).
There are 20 citations in total.

Details

Primary Language English
Subjects Deep Learning
Journal Section Research Article
Authors

Emir Öztürk 0000-0002-3734-5171

Aydın Carus 0000-0003-3370-5974

Submission Date May 14, 2025
Acceptance Date June 30, 2025
Publication Date December 30, 2025
Published in Issue Year 2025 Volume: 17 Issue: 2

Cite

APA Öztürk, E., & Carus, A. (2025). Fine-Tuning LLaMA2, LLaMA3, and Phi3 Models for Code Generation with Turkish Instructions. Turkish Journal of Mathematics and Computer Science, 17(2), 519-526. https://doi.org/10.47000/tjmcs.1699086
AMA Öztürk E, Carus A. Fine-Tuning LLaMA2, LLaMA3, and Phi3 Models for Code Generation with Turkish Instructions. TJMCS. December 2025;17(2):519-526. doi:10.47000/tjmcs.1699086
Chicago Öztürk, Emir, and Aydın Carus. “Fine-Tuning LLaMA2, LLaMA3, and Phi3 Models for Code Generation With Turkish Instructions”. Turkish Journal of Mathematics and Computer Science 17, no. 2 (December 2025): 519-26. https://doi.org/10.47000/tjmcs.1699086.
EndNote Öztürk E, Carus A (December 1, 2025) Fine-Tuning LLaMA2, LLaMA3, and Phi3 Models for Code Generation with Turkish Instructions. Turkish Journal of Mathematics and Computer Science 17 2 519–526.
IEEE E. Öztürk and A. Carus, “Fine-Tuning LLaMA2, LLaMA3, and Phi3 Models for Code Generation with Turkish Instructions”, TJMCS, vol. 17, no. 2, pp. 519–526, 2025, doi: 10.47000/tjmcs.1699086.
ISNAD Öztürk, Emir - Carus, Aydın. “Fine-Tuning LLaMA2, LLaMA3, and Phi3 Models for Code Generation With Turkish Instructions”. Turkish Journal of Mathematics and Computer Science 17/2 (December 2025), 519-526. https://doi.org/10.47000/tjmcs.1699086.
JAMA Öztürk E, Carus A. Fine-Tuning LLaMA2, LLaMA3, and Phi3 Models for Code Generation with Turkish Instructions. TJMCS. 2025;17:519–526.
MLA Öztürk, Emir and Aydın Carus. “Fine-Tuning LLaMA2, LLaMA3, and Phi3 Models for Code Generation With Turkish Instructions”. Turkish Journal of Mathematics and Computer Science, vol. 17, no. 2, 2025, pp. 519-26, doi:10.47000/tjmcs.1699086.
Vancouver Öztürk E, Carus A. Fine-Tuning LLaMA2, LLaMA3, and Phi3 Models for Code Generation with Turkish Instructions. TJMCS. 2025;17(2):519-26.