Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, And Gemini through Static and Dynamic Analysis

Onur Ceran

Abstract

This study examines the security of code produced by three generative artificial intelligence (AI) tools, ChatGPT, Copilot, and Gemini, within software development workflows. Web application login code generated by each tool was systematically evaluated for security vulnerabilities through static and dynamic code analysis. The results indicate that while these AI models make code generation more efficient, they also introduce varying levels of security risk. Copilot exhibited the highest cumulative risk, with multiple high-level vulnerabilities, while ChatGPT demonstrated a lower risk profile. Gemini produced relatively optimized code, but that code contained critical security flaws that require manual review. The most common vulnerabilities across all models were insecure design and security logging and monitoring failures, which points to a systemic issue in AI-generated code. The findings emphasize that generic security-focused prompts are insufficient and that developers must use specific, security-oriented prompts, such as instructing the model to apply secure-by-design principles and implement OWASP Top Ten protections. This study contributes to the growing body of literature on the security implications of integrating AI into software development and highlights the importance of human oversight and carefully crafted prompts in mitigating potential risks.

Details

Primary Language

English

Subjects

Computer Software

Journal Section

Research Article

Authors

Onur Ceran*
0000-0003-2147-0506
Türkiye

Publication Date

August 31, 2025

Submission Date

June 16, 2025

Acceptance Date

August 9, 2025

Published in Issue

Year 2025 Volume: 11 Number: 2

IZ

https://izlik.org/JA39ZF75PA

Cite

APA

Ceran, O. (2025). Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, And Gemini through Static and Dynamic Analysis. Gazi Journal of Engineering Sciences, 11(2), 304-320. https://izlik.org/JA39ZF75PA

AMA

1. Ceran O. Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, And Gemini through Static and Dynamic Analysis. GJES. 2025;11(2):304-320. https://izlik.org/JA39ZF75PA

Chicago

Ceran, Onur. 2025. “Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, And Gemini through Static and Dynamic Analysis”. Gazi Journal of Engineering Sciences 11 (2): 304-20. https://izlik.org/JA39ZF75PA.

EndNote

Ceran O (August 1, 2025) Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, And Gemini through Static and Dynamic Analysis. Gazi Journal of Engineering Sciences 11 2 304–320.

IEEE

[1] O. Ceran, “Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, And Gemini through Static and Dynamic Analysis”, GJES, vol. 11, no. 2, pp. 304–320, Aug. 2025, [Online]. Available: https://izlik.org/JA39ZF75PA

ISNAD

Ceran, Onur. “Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, And Gemini through Static and Dynamic Analysis”. Gazi Journal of Engineering Sciences 11/2 (August 1, 2025): 304-320. https://izlik.org/JA39ZF75PA.

JAMA

1. Ceran O. Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, And Gemini through Static and Dynamic Analysis. GJES. 2025;11:304–320.

MLA

Ceran, Onur. “Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, And Gemini through Static and Dynamic Analysis”. Gazi Journal of Engineering Sciences, vol. 11, no. 2, Aug. 2025, pp. 304-20, https://izlik.org/JA39ZF75PA.

Vancouver

1. Onur Ceran. Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, And Gemini through Static and Dynamic Analysis. GJES [Internet]. 2025 Aug. 1;11(2):304-20. Available from: https://izlik.org/JA39ZF75PA