TY - JOUR
T1 - Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, and Gemini through Static and Dynamic Analysis
AU - Ceran, Onur
PY - 2025
DA - August
Y2 - 2025
JF - Gazi Journal of Engineering Sciences
JO - GJES
PB - Parantez Teknoloji
WT - DergiPark
SN - 2149-9373
SP - 304
EP - 320
VL - 11
IS - 2
LA - en
AB - This study examines the security performance of the generative artificial intelligence (AI) tools ChatGPT, Copilot, and Gemini within software development workflows. Through static and dynamic code analysis, security vulnerabilities in web application login code generated by these tools were systematically evaluated. Results indicate that while AI models offer efficiency in code generation, they also introduce varying levels of security risk. Copilot exhibited the highest cumulative risk with multiple high-level vulnerabilities, while ChatGPT demonstrated a lower risk profile. Gemini produced relatively optimized code but contained critical security flaws that require manual review. The most common vulnerabilities across all models were insecure design and security logging and monitoring failures, indicating a systemic issue in AI-generated code. The findings emphasize that generic prompts focusing on security are insufficient and that developers must use specific, security-oriented prompts, such as applying secure-by-design principles and implementing OWASP Top Ten protections. This study contributes to the growing body of literature addressing the security implications of integrating AI into software development, highlighting the importance of human oversight and carefully crafted prompts to mitigate potential risks.
KW - Generative AI
KW - ChatGPT
KW - Copilot
KW - Gemini
KW - Software Security
KW - Static Code Analysis
KW - Dynamic Code Analysis
CR - [1] S. Feuerriegel, J. Hartmann, C. Janiesch, and P. Zschech, “Generative AI,” Bus Inf Syst Eng, vol. 66, no. 1, pp. 111–126, Feb. 2024. doi: 10.1007/s12599-023-00834-7
CR - [2] L. Banh and G. Strobel, “Generative artificial intelligence,” Electron Markets, vol. 33, no. 1, p. 63, Dec. 2023. doi: 10.1007/s12525-023-00680-1
CR - [3] P. Kokol, “The Use of AI in Software Engineering: A Synthetic Knowledge Synthesis of the Recent Research Literature,” Information, vol. 15, no. 6, p. 354, Jun. 2024. doi: 10.3390/info15060354
CR - [4] Y. Almeida et al., “AICodeReview: Advancing code quality with AI-enhanced reviews,” SoftwareX, vol. 26, p. 101677, May 2024. doi: 10.1016/j.softx.2024.101677
CR - [5] P. Vaithilingam, T. Zhang, and E. L. Glassman, “Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models,” in CHI Conference on Human Factors in Computing Systems Extended Abstracts, New Orleans, LA, USA: ACM, Apr. 2022, pp. 1–7. doi: 10.1145/3491101.3519665
CR - [6] R. Wang, R. Cheng, D. Ford, and T. Zimmermann, “Investigating and Designing for Trust in AI-powered Code Generation Tools,” in The 2024 ACM Conference on Fairness, Accountability, and Transparency, Rio de Janeiro, Brazil: ACM, Jun. 2024, pp. 1475–1493. doi: 10.1145/3630106.3658984
CR - [7] D. Hanson, “Future of Code with Generative AI: Transparency and Safety in the Era of AI Generated Software,” 2025, arXiv. doi: 10.48550/ARXIV.2505.20303
CR - [8] M. Taeb, H. Chi, and S. Bernadin, “Assessing the Effectiveness and Security Implications of AI Code Generators,” CISSE, vol. 11, no. 1, p. 6, Feb. 2024. doi: 10.53735/cisse.v11i1.180
CR - [9] S. Panichella, “Vulnerabilities Introduced by LLMs Through Code Suggestions,” in Large Language Models in Cybersecurity, A. Kucharavy, O. Plancherel, V. Mulder, A. Mermoud, and V. Lenders, Eds., Cham: Springer Nature Switzerland, 2024, pp. 87–97. doi: 10.1007/978-3-031-54827-7_9
CR - [10] K. Cho, Y. Park, J. Kim, B. Kim, and D. Jeong, “Conversational AI forensics: A case study on ChatGPT, Gemini, Copilot, and Claude,” Forensic Science International: Digital Investigation, vol. 52, p. 301855, Mar. 2025. doi: 10.1016/j.fsidi.2024.301855
CR - [11] H. Pearce, B. Ahmad, B. Tan, B. Dolan-Gavitt, and R. Karri, “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions,” Commun. ACM, vol. 68, no. 2, pp. 96–105, Feb. 2025. doi: 10.1145/3610721
CR - [12] M. Mehta, “A comparative study of AI code bots: Efficiency, features, and use cases,” Int. J. Sci. Res. Arch., vol. 13, no. 1, pp. 595–602, Sep. 2024. doi: 10.30574/ijsra.2024.13.1.1718
CR - [13] F. Fui-Hoon Nah, R. Zheng, J. Cai, K. Siau, and L. Chen, “Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration,” Journal of Information Technology Case and Application Research, vol. 25, no. 3, pp. 277–304, Jul. 2023. doi: 10.1080/15228053.2023.2233814
CR - [14] A. Sobo, A. Mubarak, A. Baimagambetov, and N. Polatidis, “Evaluating LLMs for Code Generation in HRI: A Comparative Study of ChatGPT, Gemini, and Claude,” Applied Artificial Intelligence, vol. 39, no. 1, p. 2439610, Dec. 2025. doi: 10.1080/08839514.2024.2439610
CR - [15] H. Pearce, B. Tan, B. Ahmad, R. Karri, and B. Dolan-Gavitt, “Examining Zero-Shot Vulnerability Repair with Large Language Models,” in 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA: IEEE, May 2023, pp. 2339–2356. doi: 10.1109/SP46215.2023.10179324
CR - [16] A. Sarkar, A. D. Gordon, C. Negreanu, C. Poelitz, S. S. Ragavan, and B. Zorn, “What is it like to program with artificial intelligence?,” Oct. 17, 2022. arXiv: arXiv:2208.06213. doi: 10.48550/arXiv.2208.06213
CR - [17] M. Arsal et al., “Emerging Cybersecurity and Privacy Threats of ChatGPT, Gemini, and Copilot: Current Trends, Challenges, and Future Directions,” Oct. 24, 2024. doi: 10.20944/preprints202410.1909.v1
CR - [18] C. K. Lo, “What Is the Impact of ChatGPT on Education? A Rapid Review of the Literature,” Education Sciences, vol. 13, no. 4, p. 410, Apr. 2023. doi: 10.3390/educsci13040410
CR - [19] Gemini Team et al., “Gemini: A Family of Highly Capable Multimodal Models,” 2023, arXiv. doi: 10.48550/ARXIV.2312.11805
CR - [20] A. J. Adetayo, M. O. Aborisade, and B. A. Sanni, “Microsoft Copilot and Anthropic Claude AI in education and library service,” LHTN, Jan. 2024. doi: 10.1108/LHTN-01-2024-0002
CR - [21] N. Perry, M. Srivastava, D. Kumar, and D. Boneh, “Do Users Write More Insecure Code with AI Assistants?,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, Copenhagen, Denmark: ACM, Nov. 2023, pp. 2785–2799. doi: 10.1145/3576915.3623157
CR - [22] Y. V. Kharchenko and O. M. Babenko, “Advantages and limitations of large language models in chemistry education: A comparative analysis of ChatGPT, Gemini and Copilot,” in Proceedings of the Free Open-Access Proceedings for Computer Science Workshops, Lviv, Ukraine, 2024, pp. 42–59. Accessed: Jun. 12, 2025
CR - [23] N. Tihanyi, T. Bisztray, M. A. Ferrag, R. Jain, and L. C. Cordeiro, “How secure is AI-generated code: a large-scale comparison of large language models,” Empir Software Eng, vol. 30, no. 2, p. 47, Mar. 2025. doi: 10.1007/s10664-024-10590-1
CR - [24] Trend Micro, “Security Vulnerabilities of ChatGPT-Generated Code,” Trend Micro. Accessed: Jun. 16, 2025. [Online]. Available: https://www.trendmicro.com/en_us/research/23/e/chatgpt-security-vulnerabilities.html
CR - [25] M. Kharma, S. Choi, M. AlKhanafseh, and D. Mohaisen, “Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis,” 2025, arXiv. doi: 10.48550/ARXIV.2502.01853
CR - [26] D. Tosi, “Studying the Quality of Source Code Generated by Different AI Generative Engines: An Empirical Evaluation,” Future Internet, vol. 16, no. 6, p. 188, May 2024. doi: 10.3390/fi16060188
CR - [27] P. Smutny and M. Bojko, “Comparative Analysis of Chatbots Using Large Language Models for Web Development Tasks,” Applied Sciences, vol. 14, no. 21, p. 10048, Nov. 2024. doi: 10.3390/app142110048
CR - [28] Y. Yigit, W. J. Buchanan, M. G. Tehrani, and L. Maglaras, “Review of generative AI methods in cybersecurity,” arXiv preprint arXiv:2403.08701, 2024. Accessed: Jun. 12, 2025. [Online]. Available: https://storage.prod.researchhub.com/uploads/papers/2024/04/24/2403.08701.pdf
CR - [29] A. H. Mohsin, I. M. Rahi, and R. A. Hussain, A Study of 2.5 D Face Recognition for Forensic Analysis. IJCSMC, 2020.
CR - [30] G. M. Kapitsaki, “Generative AI for Code Generation: Software Reuse Implications,” in Reuse and Software Quality, A. Achilleos, L. Fuentes, and G. A. Papadopoulos, Eds., in Lecture Notes in Computer Science, vol. 14614, Cham: Springer Nature Switzerland, 2024, pp. 37–47. doi: 10.1007/978-3-031-66459-5_3
CR - [31] D. Palla and A. Slaby, “Evaluation of Generative AI Models in Python Code Generation: A Comparative Study,” IEEE Access, vol. 13, pp. 65334–65347, 2025. doi: 10.1109/ACCESS.2025.3560244
CR - [32] C. Chahar, V. S. Chauhan, and M. L. Das, “Code Analysis for Software and System Security Using Open Source Tools,” Information Security Journal: A Global Perspective, vol. 21, no. 6, pp. 346–352, Jan. 2012. doi: 10.1080/19393555.2012.727132
CR - [33] A. Aggarwal and P. Jalote, “Integrating Static and Dynamic Analysis for Detecting Vulnerabilities,” in 30th Annual International Computer Software and Applications Conference (COMPSAC’06), Chicago, IL: IEEE, 2006, pp. 343–350. doi: 10.1109/COMPSAC.2006.55
CR - [34] R. K. McLean, “Comparing Static Security Analysis Tools Using Open Source Software,” in 2012 IEEE Sixth International Conference on Software Security and Reliability Companion, Gaithersburg, MD, USA: IEEE, Jun. 2012, pp. 68–74. doi: 10.1109/SERE-C.2012.16
CR - [35] G. Díaz and J. R. Bermejo, “Static analysis of source code security: Assessment of tools against SAMATE tests,” Information and Software Technology, vol. 55, no. 8, pp. 1462–1476, Aug. 2013. doi: 10.1016/j.infsof.2013.02.005
CR - [36] P. Louridas, “Static code analysis,” IEEE Softw., vol. 23, no. 4, pp. 58–61, Jul. 2006. doi: 10.1109/MS.2006.114
CR - [37] Z. Zhioua, S. Short, and Y. Roudier, “Static Code Analysis for Software Security Verification: Problems and Approaches,” in 2014 IEEE 38th International Computer Software and Applications Conference Workshops, Vasteras, Sweden: IEEE, Jul. 2014, pp. 102–109. doi: 10.1109/COMPSACW.2014.22
CR - [38] B. Chess and G. McGraw, “Static analysis for security,” IEEE Secur. Privacy Mag., vol. 2, no. 6, pp. 76–79, Nov. 2004. doi: 10.1109/MSP.2004.111
CR - [39] A. Fasano et al., “SoK: Enabling Security Analyses of Embedded Systems via Rehosting,” in Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, Virtual Event, Hong Kong: ACM, May 2021, pp. 687–701. doi: 10.1145/3433210.3453093
CR - [40] U. Urooj, B. A. S. Al-rimy, A. Zainal, F. A. Ghaleb, and M. A. Rassam, “Ransomware Detection Using the Dynamic Analysis and Machine Learning: A Survey and Research Directions,” Applied Sciences, vol. 12, no. 1, p. 172, Dec. 2021. doi: 10.3390/app12010172
CR - [41] T. Sutter, T. Kehrer, M. Rennhard, B. Tellenbach, and J. Klein, “Dynamic Security Analysis on Android: A Systematic Literature Review,” IEEE Access, vol. 12, pp. 57261–57287, 2024. doi: 10.1109/ACCESS.2024.3390612
CR - [42] M. Gegick, P. Rotella, and L. Williams, “Toward Non-security Failures as a Predictor of Security Faults and Failures,” in Engineering Secure Software and Systems, F. Massacci, S. T. Redwine, and N. Zannone, Eds., in Lecture Notes in Computer Science, vol. 5429, Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 135–149. doi: 10.1007/978-3-642-00199-4_12
CR - [43] J. Xu, Y. Wu, Z. Lu, and T. Wang, “Dockerfile TF Smell Detection Based on Dynamic and Static Analysis Methods,” in 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, USA: IEEE, Jul. 2019, pp. 185–190. doi: 10.1109/COMPSAC.2019.00033
CR - [44] “OWASP Top 10:2021,” Open Web Application Security Project. Accessed: Jun. 16, 2025. [Online]. Available: https://owasp.org/Top10/
CR - [45] T. Petranović and N. Žarić, “Effectiveness of Using OWASP TOP 10 as AppSec Standard,” in 2023 27th International Conference on Information Technology (IT), Zabljak, Montenegro: IEEE, Feb. 2023, pp. 1–4. doi: 10.1109/IT57431.2023.10078626
CR - [46] R. Khoury, A. R. Avila, J. Brunelle, and B. M. Camara, “How Secure is Code Generated by ChatGPT?,” in 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, Oahu, HI, USA: IEEE, Oct. 2023, pp. 2445–2451. doi: 10.1109/SMC53992.2023.10394237
CR - [47] M. Nair, R. Sadhukhan, and D. Mukhopadhyay, “Generating Secure Hardware using ChatGPT Resistant to CWEs,” 2023, 2023/212. Accessed: Jun. 16, 2025. [Online]. Available: https://eprint.iacr.org/2023/212
CR - [48] D. Baca, K. Petersen, B. Carlsson, and L. Lundberg, “Static Code Analysis to Detect Software Security Vulnerabilities - Does Experience Matter?,” in 2009 International Conference on Availability, Reliability and Security, Fukuoka, Japan: IEEE, 2009, pp. 804–810. doi: 10.1109/ARES.2009.163
CR - [49] Y. Sun et al., “GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal: ACM, Apr. 2024, pp. 1–13. doi: 10.1145/3597503.3639117
CR - [50] Z. Li, S. Dutta, and M. Naik, “IRIS: LLM-assisted static analysis for detecting security vulnerabilities,” in The Thirteenth International Conference on Learning Representations, 2025. Accessed: Jun. 16, 2025. [Online]. Available: https://openreview.net/forum?id=9LdJDU7E91
CR - [51] A. Kavian, M. M. Pourhashem Kallehbasti, S. Kazemi, E. Firouzi, and M. Ghafari, “LLM Security Guard for Code,” in Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, Salerno, Italy: ACM, Jun. 2024, pp. 600–603. doi: 10.1145/3661167.3661263
CR - [52] Y. Zhang, W. Song, Z. Ji, D. Yao, and N. Meng, “How well does LLM generate security tests?,” 2023, arXiv. doi: 10.48550/ARXIV.2310.00710
UR - https://dergipark.org.tr/en/pub/gmbd/issue//1720932
L1 - https://dergipark.org.tr/en/download/article-file/4964707
ER -