TY - JOUR
T1 - Security Evaluation of AI-Generated Code: A Comparative Study of ChatGPT, Copilot, and Gemini through Static and Dynamic Analysis
AU - Ceran, Onur
PY - 2025
DA - August
Y2 - 2025
JF - Gazi Journal of Engineering Sciences
JO - GJES
PB - Parantez Teknoloji
WT - DergiPark
SN - 2149-9373
SP - 304
EP - 320
VL - 11
IS - 2
LA - en
AB - This study examines the security performance of the generative artificial intelligence (AI) tools ChatGPT, Copilot, and Gemini within software development workflows. Through static and dynamic code analysis, security vulnerabilities in web application login code generated by these tools were systematically evaluated. Results indicate that while AI models offer efficiency in code generation, they also introduce varying levels of security risk. Copilot exhibited the highest cumulative risk with multiple high-level vulnerabilities, while ChatGPT demonstrated a lower risk profile. Gemini produced relatively optimized code but contained critical security flaws that require manual review. The most common vulnerabilities across all models were insecure design and security logging and monitoring failures, indicating a systemic issue in AI-generated code. The findings emphasize that generic prompts focusing on security are insufficient and that developers must use specific, security-oriented prompts, such as applying secure-by-design principles and implementing OWASP Top Ten protections. This study contributes to the growing body of literature addressing the security implications of integrating AI into software development, highlighting the importance of human oversight and carefully crafted prompts to mitigate potential risks.
KW - Generative AI
KW - ChatGPT
KW - Copilot
KW - Gemini
KW - Software Security
KW - Static Code Analysis
KW - Dynamic Code Analysis
CR - [1] S. Feuerriegel, J. Hartmann, C. Janiesch, and P. Zschech, “Generative AI,” Bus Inf Syst Eng, vol. 66, no. 1, pp. 111–126, Feb. 2024. doi: 10.1007/s12599-023-00834-7
CR - [2] L. Banh and G. Strobel, “Generative artificial intelligence,” Electron Markets, vol. 33, no. 1, p. 63, Dec. 2023. doi: 10.1007/s12525-023-00680-1
CR - [3] P. Kokol, “The Use of AI in Software Engineering: A Synthetic Knowledge Synthesis of the Recent Research Literature,” Information, vol. 15, no. 6, p. 354, Jun. 2024. doi: 10.3390/info15060354
CR - [4] Y. Almeida et al., “AICodeReview: Advancing code quality with AI-enhanced reviews,” SoftwareX, vol. 26, p. 101677, May 2024. doi: 10.1016/j.softx.2024.101677
CR - [5] P. Vaithilingam, T. Zhang, and E. L. Glassman, “Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models,” in CHI Conference on Human Factors in Computing Systems Extended Abstracts, New Orleans, LA, USA: ACM, Apr. 2022, pp. 1–7. doi: 10.1145/3491101.3519665
CR - [6] R. Wang, R. Cheng, D. Ford, and T. Zimmermann, “Investigating and Designing for Trust in AI-powered Code Generation Tools,” in The 2024 ACM Conference on Fairness, Accountability, and Transparency, Rio de Janeiro, Brazil: ACM, Jun. 2024, pp. 1475–1493. doi: 10.1145/3630106.3658984
CR - [7] D. Hanson, “Future of Code with Generative AI: Transparency and Safety in the Era of AI Generated Software,” 2025, arXiv. doi: 10.48550/ARXIV.2505.20303
CR - [8] M. Taeb, H. Chi, and S. Bernadin, “Assessing the Effectiveness and Security Implications of AI Code Generators,” CISSE, vol. 11, no. 1, p. 6, Feb. 2024. doi: 10.53735/cisse.v11i1.180
CR - [9] S. Panichella, “Vulnerabilities Introduced by LLMs Through Code Suggestions,” in Large Language Models in Cybersecurity, A. Kucharavy, O. Plancherel, V. Mulder, A. Mermoud, and V. Lenders, Eds., Cham: Springer Nature Switzerland, 2024, pp. 87–97. doi: 10.1007/978-3-031-54827-7_9
CR - [10] K. Cho, Y. Park, J. Kim, B. Kim, and D. Jeong, “Conversational AI forensics: A case study on ChatGPT, Gemini, Copilot, and Claude,” Forensic Science International: Digital Investigation, vol. 52, p. 301855, Mar. 2025. doi: 10.1016/j.fsidi.2024.301855
CR - [11] H. Pearce, B. Ahmad, B. Tan, B. Dolan-Gavitt, and R. Karri, “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions,” Commun. ACM, vol. 68, no. 2, pp. 96–105, Feb. 2025. doi: 10.1145/3610721
CR - [12] M. Mehta, “A comparative study of AI code bots: Efficiency, features, and use cases,” Int. J. Sci. Res. Arch., vol. 13, no. 1, pp. 595–602, Sep. 2024. doi: 10.30574/ijsra.2024.13.1.1718
CR - [13] F. Fui-Hoon Nah, R. Zheng, J. Cai, K. Siau, and L. Chen, “Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration,” Journal of Information Technology Case and Application Research, vol. 25, no. 3, pp. 277–304, Jul. 2023. doi: 10.1080/15228053.2023.2233814
CR - [14] A. Sobo, A. Mubarak, A. Baimagambetov, and N. Polatidis, “Evaluating LLMs for Code Generation in HRI: A Comparative Study of ChatGPT, Gemini, and Claude,” Applied Artificial Intelligence, vol. 39, no. 1, p. 2439610, Dec. 2025. doi: 10.1080/08839514.2024.2439610
CR - [15] H. Pearce, B. Tan, B. Ahmad, R. Karri, and B. Dolan-Gavitt, “Examining Zero-Shot Vulnerability Repair with Large Language Models,” in 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA: IEEE, May 2023, pp. 2339–2356. doi: 10.1109/SP46215.2023.10179324
CR - [16] A. Sarkar, A. D. Gordon, C. Negreanu, C. Poelitz, S. S. Ragavan, and B. Zorn, “What is it like to program with artificial intelligence?,” Oct. 17, 2022. arXiv: arXiv:2208.06213. doi: 10.48550/arXiv.2208.06213
CR - [17] M. Arsal et al., “Emerging Cybersecurity and Privacy Threats of ChatGPT, Gemini, and Copilot: Current Trends, Challenges, and Future Directions,” Oct. 24, 2024. doi: 10.20944/preprints202410.1909.v1
CR - [18] C. K. Lo, “What Is the Impact of ChatGPT on Education? A Rapid Review of the Literature,” Education Sciences, vol. 13, no. 4, p. 410, Apr. 2023. doi: 10.3390/educsci13040410
CR - [19] Gemini Team et al., “Gemini: A Family of Highly Capable Multimodal Models,” 2023, arXiv. doi: 10.48550/ARXIV.2312.11805
CR - [20] A. J. Adetayo, M. O. Aborisade, and B. A. Sanni, “Microsoft Copilot and Anthropic Claude AI in education and library service,” LHTN, Jan. 2024. doi: 10.1108/LHTN-01-2024-0002
CR - [21] N. Perry, M. Srivastava, D. Kumar, and D. Boneh, “Do Users Write More Insecure Code with AI Assistants?,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, Copenhagen, Denmark: ACM, Nov. 2023, pp. 2785–2799. doi: 10.1145/3576915.3623157
CR - [22] Y. V. Kharchenko and O. M. Babenko, “Advantages and limitations of large language models in chemistry education: A comparative analysis of ChatGPT, Gemini and Copilot,” in Proceedings of the Free Open-Access Proceedings for Computer Science Workshops, Lviv, Ukraine, 2024, pp. 42–59. Accessed: Jun. 12, 2025
CR - [23] N. Tihanyi, T. Bisztray, M. A. Ferrag, R. Jain, and L. C. Cordeiro, “How secure is AI-generated code: a large-scale comparison of large language models,” Empir Software Eng, vol. 30, no. 2, p. 47, Mar. 2025. doi: 10.1007/s10664-024-10590-1
CR - [24] Trend Micro, “Security Vulnerabilities of ChatGPT-Generated Code,” Trend Micro. Accessed: Jun. 16, 2025. [Online]. Available: https://www.trendmicro.com/en_us/research/23/e/chatgpt-security-vulnerabilities.html
CR - [25] M. Kharma, S. Choi, M. AlKhanafseh, and D. Mohaisen, “Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis,” 2025, arXiv. doi: 10.48550/ARXIV.2502.01853
CR - [26] D. Tosi, “Studying the Quality of Source Code Generated by Different AI Generative Engines: An Empirical Evaluation,” Future Internet, vol. 16, no. 6, p. 188, May 2024. doi: 10.3390/fi16060188
CR - [27] P. Smutny and M. Bojko, “Comparative Analysis of Chatbots Using Large Language Models for Web Development Tasks,” Applied Sciences, vol. 14, no. 21, p. 10048, Nov. 2024. doi: 10.3390/app142110048
CR - [28] Y. Yigit, W. J. Buchanan, M. G. Tehrani, and L. Maglaras, “Review of generative AI methods in cybersecurity,” arXiv preprint arXiv:2403.08701, 2024. Accessed: Jun. 12, 2025. [Online]. Available: https://storage.prod.researchhub.com/uploads/papers/2024/04/24/2403.08701.pdf
CR - [29] A. H. Mohsin, I. M. Rahi, and R. A. Hussain, A Study of 2.5 D Face Recognition for Forensic Analysis. IJCSMC, 2020.
CR - [30] G. M. Kapitsaki, “Generative AI for Code Generation: Software Reuse Implications,” in Reuse and Software Quality, A. Achilleos, L. Fuentes, and G. A. Papadopoulos, Eds., in Lecture Notes in Computer Science, vol. 14614, Cham: Springer Nature Switzerland, 2024, pp. 37–47. doi: 10.1007/978-3-031-66459-5_3
CR - [31] D. Palla and A. Slaby, “Evaluation of Generative AI Models in Python Code Generation: A Comparative Study,” IEEE Access, vol. 13, pp. 65334–65347, 2025. doi: 10.1109/ACCESS.2025.3560244
CR - [32] C. Chahar, V. S. Chauhan, and M. L. Das, “Code Analysis for Software and System Security Using Open Source Tools,” Information Security Journal: A Global Perspective, vol. 21, no. 6, pp. 346–352, Jan. 2012. doi: 10.1080/19393555.2012.727132
CR - [33] A. Aggarwal and P. Jalote, “Integrating Static and Dynamic Analysis for Detecting Vulnerabilities,” in 30th Annual International Computer Software and Applications Conference (COMPSAC’06), Chicago, IL: IEEE, 2006, pp. 343–350. doi: 10.1109/COMPSAC.2006.55
CR - [34] R. K. McLean, “Comparing Static Security Analysis Tools Using Open Source Software,” in 2012 IEEE Sixth International Conference on Software Security and Reliability Companion, Gaithersburg, MD, USA: IEEE, Jun. 2012, pp. 68–74. doi: 10.1109/SERE-C.2012.16
CR - [35] G. Díaz and J. R. Bermejo, “Static analysis of source code security: Assessment of tools against SAMATE tests,” Information and Software Technology, vol. 55, no. 8, pp. 1462–1476, Aug. 2013. doi: 10.1016/j.infsof.2013.02.005
CR - [36] P. Louridas, “Static code analysis,” IEEE Softw., vol. 23, no. 4, pp. 58–61, Jul. 2006. doi: 10.1109/MS.2006.114
CR - [37] Z. Zhioua, S. Short, and Y. Roudier, “Static Code Analysis for Software Security Verification: Problems and Approaches,” in 2014 IEEE 38th International Computer Software and Applications Conference Workshops, Vasteras, Sweden: IEEE, Jul. 2014, pp. 102–109. doi: 10.1109/COMPSACW.2014.22
CR - [38] B. Chess and G. McGraw, “Static analysis for security,” IEEE Secur. Privacy Mag., vol. 2, no. 6, pp. 76–79, Nov. 2004. doi: 10.1109/MSP.2004.111
CR - [39] A. Fasano et al., “SoK: Enabling Security Analyses of Embedded Systems via Rehosting,” in Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, Virtual Event, Hong Kong: ACM, May 2021, pp. 687–701. doi: 10.1145/3433210.3453093
CR - [40] U. Urooj, B. A. S. Al-rimy, A. Zainal, F. A. Ghaleb, and M. A. Rassam, “Ransomware Detection Using the Dynamic Analysis and Machine Learning: A Survey and Research Directions,” Applied Sciences, vol. 12, no. 1, p. 172, Dec. 2021. doi: 10.3390/app12010172
CR - [41] T. Sutter, T. Kehrer, M. Rennhard, B. Tellenbach, and J. Klein, “Dynamic Security Analysis on Android: A Systematic Literature Review,” IEEE Access, vol. 12, pp. 57261–57287, 2024. doi: 10.1109/ACCESS.2024.3390612
CR - [42] M. Gegick, P. Rotella, and L. Williams, “Toward Non-security Failures as a Predictor of Security Faults and Failures,” in Engineering Secure Software and Systems, F. Massacci, S. T. Redwine, and N. Zannone, Eds., in Lecture Notes in Computer Science, vol. 5429, Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 135–149. doi: 10.1007/978-3-642-00199-4_12
CR - [43] J. Xu, Y. Wu, Z. Lu, and T. Wang, “Dockerfile TF Smell Detection Based on Dynamic and Static Analysis Methods,” in 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, USA: IEEE, Jul. 2019, pp. 185–190. doi: 10.1109/COMPSAC.2019.00033
CR - [44] “OWASP Top 10:2021,” Open Web Application Security Project. Accessed: Jun. 16, 2025. [Online]. Available: https://owasp.org/Top10/
CR - [45] T. Petranović and N. Žarić, “Effectiveness of Using OWASP TOP 10 as AppSec Standard,” in 2023 27th International Conference on Information Technology (IT), Zabljak, Montenegro: IEEE, Feb. 2023, pp. 1–4. doi: 10.1109/IT57431.2023.10078626
CR - [46] R. Khoury, A. R. Avila, J. Brunelle, and B. M. Camara, “How Secure is Code Generated by ChatGPT?,” in 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, Oahu, HI, USA: IEEE, Oct. 2023, pp. 2445–2451. doi: 10.1109/SMC53992.2023.10394237
CR - [47] M. Nair, R. Sadhukhan, and D. Mukhopadhyay, “Generating Secure Hardware using ChatGPT Resistant to CWEs,” 2023, 2023/212. Accessed: Jun. 16, 2025. [Online]. Available: https://eprint.iacr.org/2023/212
CR - [48] D. Baca, K. Petersen, B. Carlsson, and L. Lundberg, “Static Code Analysis to Detect Software Security Vulnerabilities - Does Experience Matter?,” in 2009 International Conference on Availability, Reliability and Security, Fukuoka, Japan: IEEE, 2009, pp. 804–810. doi: 10.1109/ARES.2009.163
CR - [49] Y. Sun et al., “GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal: ACM, Apr. 2024, pp. 1–13. doi: 10.1145/3597503.3639117
CR - [50] Z. Li, S. Dutta, and M. Naik, “IRIS: LLM-assisted static analysis for detecting security vulnerabilities,” in The Thirteenth International Conference on Learning Representations, 2025. Accessed: Jun. 16, 2025. [Online]. Available: https://openreview.net/forum?id=9LdJDU7E91
CR - [51] A. Kavian, M. M. Pourhashem Kallehbasti, S. Kazemi, E. Firouzi, and M. Ghafari, “LLM Security Guard for Code,” in Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, Salerno, Italy: ACM, Jun. 2024, pp. 600–603. doi: 10.1145/3661167.3661263
CR - [52] Y. Zhang, W. Song, Z. Ji, D. Yao, and N. Meng, “How well does LLM generate security tests?,” 2023, arXiv. doi: 10.48550/ARXIV.2310.00710
UR - https://dergipark.org.tr/en/pub/gmbd/issue//1720932
L1 - https://dergipark.org.tr/en/download/article-file/4964707
ER -