Research Article

An Integrative Approach to LLM Literature with the Combination of QLoRa, SFT and Agentic RAG

Year 2025, Volume: 9, Issue: 2, 249 - 261, 30.11.2025

Abstract

This study presents a solution for document-based question-answering systems targeting both mobile and web-based applications. The solution combines fine-tuning of a transformer architecture selected for the task on an appropriate dataset with an agentic Retrieval-Augmented Generation (RAG) methodology. This allows the system to handle not only document-based questions but also non-document questions through a web-search agent. A separate agent was also incorporated to enable communication with the model in multiple languages. In the first phase, the Llama 3.1–8B Instruct model was quantized with the Quantized Low-Rank Adaptation (QLoRA) method and trained with Supervised Fine-Tuning (SFT) on a dataset structured as context-question-answer triples. To overcome common problems of the classical RAG architecture, such as hallucination, inaccurate document analysis, and missing answers due to insufficient context, the proposed agentic RAG architecture incorporates agents for web search and language translation together with techniques such as document ranking and hallucination checking. The resulting system provides a dynamic structure in which user questions and answers are systematically orchestrated. The model's performance was evaluated with metrics such as Exact Match, ROUGE-L, BLEU, and F1, and improvements were observed. The test results demonstrate that the modular system obtained through agent integration significantly improves contextual accuracy.
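To make the fine-tuning phase described above more concrete, the following is a minimal sketch of how 4-bit QLoRA quantization and SFT of Llama 3.1-8B Instruct could be set up with the Hugging Face transformers, bitsandbytes, peft, and trl libraries. The dataset file, field names, LoRA rank, and training hyperparameters are illustrative assumptions, not the authors' configuration, and exact trl argument names vary between library versions.

```python
# Illustrative QLoRA + SFT setup (assumptions, not the authors' code).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"

# 4-bit NF4 quantization with double quantization and bfloat16 compute (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token  # common workaround; has known side effects

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)

# Low-rank adapters on the attention projections; rank and targets are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hypothetical context-question-answer records flattened into a single training text.
def to_text(example):
    return {
        "text": f"Context: {example['context']}\n"
                f"Question: {example['question']}\n"
                f"Answer: {example['answer']}{tokenizer.eos_token}"
    }

train_ds = load_dataset("json", data_files="cqa_train.jsonl", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_ds,
    peft_config=lora_config,
    args=SFTConfig(
        output_dir="llama31-qlora-sft",
        dataset_text_field="text",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        optim="paged_adamw_8bit",  # memory-efficient paged 8-bit AdamW
    ),
)
trainer.train()
```

Similarly, the agentic RAG flow (language agent, retrieval with document ranking, hallucination check, and a web-search fallback for questions the documents cannot answer) can be read as a short orchestration loop. The sketch below is a hypothetical outline only; every agent is passed in as a placeholder callable rather than taken from the authors' implementation.

```python
# Hypothetical orchestration of the agentic RAG flow described in the abstract.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Answer:
    text: str
    source: str  # "documents" or "web"

def answer_question(
    question: str,
    detect_lang: Callable[[str], str],
    translate: Callable[[str, str], str],       # (text, target_lang) -> translated text
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str]], str],  # fine-tuned Llama 3.1-8B behind this call
    is_grounded: Callable[[str, List[str]], bool],
    web_search: Callable[[str], List[str]],
    top_k: int = 4,
) -> Answer:
    user_lang = detect_lang(question)
    query = question if user_lang == "en" else translate(question, "en")  # language agent

    docs = rerank(query, retrieve(query))[:top_k]  # retrieval + document ranking
    if docs:
        draft = generate(query, docs)
        if is_grounded(draft, docs):               # hallucination check against the context
            return Answer(translate(draft, user_lang), "documents")

    web_docs = web_search(query)                   # web-search agent for non-document questions
    draft = generate(query, web_docs)
    return Answer(translate(draft, user_lang), "web")
```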

Project Number

-



Details

Primary Language: English
Subjects: Natural Language Processing
Section: Research Article
Authors

Aslı Güngör 0009-0009-2916-2488

Büşra Nur Emir 0009-0009-8019-9999

Sedanur Yılmaz 0009-0008-2218-9031

Melike Akdağ 0000-0002-7779-4756

Ali Berkol 0000-0002-3056-1226

Project Number: -
Submission Date: October 15, 2025
Acceptance Date: November 26, 2025
Early View Date: November 26, 2025
Publication Date: November 30, 2025
Published in Issue: Year 2025, Volume: 9, Issue: 2

How to Cite

IEEE: A. Güngör, B. N. Emir, S. Yılmaz, M. Akdağ, and A. Berkol, “An Integrative Approach to LLM Literature with the Combination of QLoRa, SFT and Agentic RAG”, IJMSIT, vol. 9, no. 2, pp. 249–261, 2025.