The effectiveness of generative artificial intelligence models in software development is determined not only by their ability to generate correct solutions but also by their adherence to quality metrics and their resilience to exceptional scenarios. In this context, four models (ChatGPT, Gemini, Claude, and Copilot) were comparatively evaluated on 10 fundamental algorithm problems and 10 object-oriented programming problems in the C# programming language. The generated solutions were assessed in terms of time complexity, memory usage, lines of code, number of variables and methods, and execution time. In addition, meaningful edge-case scenarios were used to measure error tolerance and exception-handling performance. The findings indicate that all models produced functionally valid solutions yet exhibited limitations in advanced software engineering practices such as modularity, comprehensive error management, performance measurement, and unit testing. The analysis revealed that ChatGPT and Gemini stood out for structural quality and consistency, Claude demonstrated greater reliability in error handling, and Copilot offered advantages in code simplicity. Overall, the results highlight the importance of evaluating generative AI models not only under ideal conditions but also in atypical scenarios to ensure software quality and reliability.
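To illustrate the kind of edge-case handling and execution-time checks the abstract describes, the following is a minimal C# sketch; the `FindMax` method, its null/empty-input guard, and the `Stopwatch` timing are hypothetical illustrations under assumed problem definitions, not the actual benchmark code or generated solutions from the study.

```csharp
using System;
using System.Diagnostics;

// Hypothetical example of the style of solution and edge-case checks
// discussed in the study; not the actual evaluated code.
static class MaxFinder
{
    // Returns the largest element, guarding the edge cases of a
    // null or empty input with an explicit exception.
    public static int FindMax(int[] values)
    {
        if (values is null || values.Length == 0)
            throw new ArgumentException("Input must contain at least one element.", nameof(values));

        int max = values[0];
        foreach (int v in values)
            if (v > max) max = v;
        return max;
    }
}

class Program
{
    static void Main()
    {
        // Typical case, timed with Stopwatch as a simple execution-time measure.
        var sw = Stopwatch.StartNew();
        int result = MaxFinder.FindMax(new[] { 3, 7, 1, 9, 4 });
        sw.Stop();
        Console.WriteLine($"Max = {result}, elapsed = {sw.Elapsed.TotalMilliseconds} ms");

        // Edge case: an empty input should raise a handled exception
        // rather than fail silently or crash.
        try
        {
            MaxFinder.FindMax(Array.Empty<int>());
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine($"Edge case handled: {ex.Message}");
        }
    }
}
```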
| Field | Value |
|---|---|
| Primary Language | English |
| Subjects | Natural Language Processing, Automated Software Engineering, Programming Languages |
| Journal Section | Research Article |
| Authors | |
| Submission Date | September 16, 2025 |
| Acceptance Date | November 25, 2025 |
| Publication Date | December 29, 2025 |
| Published in Issue | Year 2025, Volume: 13, Issue: 2 |
Manas Journal of Engineering