The Performance and Reliability of Generative AI Models in Software Development: A C#-Based Analysis
Abstract
The effectiveness of generative artificial intelligence models in software development depends not only on their ability to generate correct solutions but also on their adherence to quality metrics and their resilience to exceptional scenarios. In this context, four models (ChatGPT, Gemini, Claude, and Copilot) were compared on 10 fundamental algorithm problems and 10 object-oriented programming problems in the C# programming language. The generated solutions were assessed in terms of time complexity, memory usage, lines of code, number of variables and methods, and execution time. In addition, meaningful edge-case scenarios were used to measure error tolerance and exception-handling performance. The findings indicate that all models produced functionally valid solutions but showed limitations in advanced software engineering practices such as modularity, comprehensive error management, performance measurement, and unit testing. ChatGPT and Gemini stood out in structure and consistency, Claude handled errors more reliably, and Copilot offered advantages in code simplicity. Overall, the results highlight the importance of evaluating generative AI models not only under ideal conditions but also in atypical scenarios to ensure software quality and reliability.
Keywords
GenAI, LLM, Programming, AI-assisted coding