Research Article

Evaluating the Impact of Temperature and Instruction Strategies on Hallucination in Large Language Models

Year 2026, Issue: Advanced Online Publication, 13 - 32, 16.01.2026
https://doi.org/10.54287/gujsa.1819131
https://izlik.org/JA33TG34LW

Abstract

Large Language Models (LLMs) have demonstrated impressive generative and reasoning abilities, yet their tendency to produce factually incorrect or fabricated information, so-called hallucinations, remains a key limitation. This study systematically examines how temperature and system instruction strategies affect hallucination behavior in open-source LLMs executed through the Ollama framework. Three representative models (Gemma 2B, Mistral 7B Instruct, and Phi-3 Mini) were evaluated on the TruthfulQA benchmark using zero-shot, few-shot, and “say-I-don’t-know” prompting paradigms. Performance was measured through exact match, token-level F1, semantic similarity, and embedding-based similarity metrics. Two-way ANOVA and Tukey post-hoc analyses revealed that system instruction significantly influenced factual accuracy across all models, while temperature effects were comparatively minor. Few-shot prompting achieved the highest mean F1 score (0.1889), indicating that example conditioning effectively constrained hallucinations. Conversely, “say-I-don’t-know” prompts increased semantic alignment but reduced precision, suggesting a conservative refusal bias. Embedding-based similarity analyses confirmed higher semantic consistency for zero-shot responses. The results highlight that prompt design exerts a stronger and more interpretable influence on hallucination than sampling stochasticity, offering practical guidance for improving the factual reliability of open-source LLMs.
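
To make the two lexical metrics named in the abstract concrete, the following is a minimal sketch of exact match and token-level F1 between a model answer and a reference answer. The abstract does not state how answers were normalized or tokenized, so the SQuAD-style normalization used here (lowercasing, removing ASCII punctuation and English articles, whitespace tokenization) is an assumption, and the function names are illustrative rather than taken from the article.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase, drop ASCII punctuation and articles, split."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall over the bag-of-tokens overlap."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Moon.", "moon"))
    print(token_f1("The cat sat", "a cat sat down"))
```

Under this definition, a long free-form answer that contains the reference tokens among many others keeps high recall but loses precision, which is one reason token-level F1 on open-ended question answering can stay low even when an answer is semantically acceptable.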

Details

Primary Language English
Subjects Knowledge Representation and Reasoning, Natural Language Processing
Journal Section Research Article
Authors

Abdullah Talha Kabakuş 0000-0003-2181-4292

Submission Date November 6, 2025
Acceptance Date January 1, 2026
Early Pub Date January 16, 2026
Publication Date January 16, 2026
DOI https://doi.org/10.54287/gujsa.1819131
IZ https://izlik.org/JA33TG34LW
Published in Issue Year 2026 Issue: Advanced Online Publication

Cite

APA Kabakuş, A. T. (2026). Evaluating the Impact of Temperature and Instruction Strategies on Hallucination in Large Language Models. Gazi University Journal of Science Part A: Engineering and Innovation, Advanced Online Publication, 13-32. https://doi.org/10.54287/gujsa.1819131
