Research Article

Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset

Volume: 14 Number: 1 January 21, 2026
TR EN

Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset

Abstract

Language models can be trained and customized by anyone using effective fine-tuning methods, making them versatile tools across various domains. While metrics like loss and accuracy are commonly used to assess the performance of language models, the choice of prompt formats also plays a crucial role. In this study, a large language model and a small language model are trained on two different datasets using various prompt formats, and their performance results are compared. Additionally, a custom prompt format tailored for selected Turkish datasets is introduced. The results indicate that changes in prompt format significantly impact the performance of large language models. In addition, the customized prompt format achieved the best loss values for both models and the best metric results for the large language model.

Keywords

Llama2, LLM, Phi3, Prompt Format

Supporting Institution

This research received no external funding.

Ethical Statement

This study does not involve human or animal participants. All procedures followed scientific and ethical principles, and all referenced studies are appropriately cited.

Thanks

The author/authors do not wish to acknowledge any individual or institution.

References

  1. Abdin, M., Aneja, J., Behl, H., Bubeck, S., Eldan, R., Gunasekar, S., Harrison, M., Hewett, R. J., Javaheripi, M., Kauffmann, P., Lee, J. R., Lee, Y. T., Li, Y., Liu, W., Mendes, C. C. T., Nguyen, A., Price, E., de Rosa, G., Saarikivi, O., … Zhang, Y. (2024). Phi-4 technical report. arXiv. https://arxiv.org/abs/2412.08905
  2. Ateş, M., & Başarslan, M. S. (2025). Performance comparison of traditional and contextual representations for cryptocurrency sentiment analysis on Twitter. Düzce University Journal of Science and Technology, 13(3), 1431–1444.
  3. Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., Le, Q., & Sutton, C. (2021). Program synthesis with large language models. arXiv. https://arxiv.org/abs/2108.07732
  4. Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv. https://arxiv.org/abs/1412.3555
  5. Dang, N. C., Moreno-García, M. N., & De la Prieta, F. (2020). Sentiment analysis based on deep learning: A comparative study. Electronics, 9(3), Article 483. https://doi.org/10.3390/electronics9030483
  6. Erdi, B., Şahin, E. A., Toydemir, M. S., & Dökeroğlu, T. (2021). Makine öğrenmesi algoritmaları ile trol hesapların tespiti. Düzce Üniversitesi Bilim ve Teknoloji Dergisi, 9(1), 430–442.
  7. Esmer, S., Uçar, M. K., Çil, İ., & Bozkurt, M. R. (2020). Parkinson hastalığı teşhisi için makine öğrenmesi tabanlı yeni bir yöntem. Duzce University Journal of Science and Technology, 8(3), 1877–1893. https://doi.org/10.29130/dubited.688223
  8. Goyal, M., Tatwawadi, K., Chandak, S., & Ochoa, I. (2021). DZip: Improved general-purpose loss less compression based on novel neural network modeling. In 2021 Data Compression Conference (DCC) (pp. 153–162). IEEE. https://doi.org/10.1109/DCC50243.2021.00023
  9. He, J., Rungta, M., Koleczek, D., Sekhon, A., Wang, F. X., & Hasan, S. (2024). Does prompt formatting have any impact on LLM performance? arXiv. https://arxiv.org/abs/2411.10541
  10. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
APA
Öztürk, E. (2026). Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset. Duzce University Journal of Science and Technology, 14(1), 49-59. https://doi.org/10.29130/dubited.1533514
AMA
1.Öztürk E. Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset. DUBİTED. 2026;14(1):49-59. doi:10.29130/dubited.1533514
Chicago
Öztürk, Emir. 2026. “Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset”. Duzce University Journal of Science and Technology 14 (1): 49-59. https://doi.org/10.29130/dubited.1533514.
EndNote
Öztürk E (January 1, 2026) Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset. Duzce University Journal of Science and Technology 14 1 49–59.
IEEE
[1]E. Öztürk, “Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset”, DUBİTED, vol. 14, no. 1, pp. 49–59, Jan. 2026, doi: 10.29130/dubited.1533514.
ISNAD
Öztürk, Emir. “Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset”. Duzce University Journal of Science and Technology 14/1 (January 1, 2026): 49-59. https://doi.org/10.29130/dubited.1533514.
JAMA
1.Öztürk E. Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset. DUBİTED. 2026;14:49–59.
MLA
Öztürk, Emir. “Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset”. Duzce University Journal of Science and Technology, vol. 14, no. 1, Jan. 2026, pp. 49-59, doi:10.29130/dubited.1533514.
Vancouver
1.Emir Öztürk. Evaluating the Impact of Prompt Formats on Llama2 and Phi3 Using Turkish Language Instruction Dataset. DUBİTED. 2026 Jan. 1;14(1):49-5. doi:10.29130/dubited.1533514