Assessing GPT-Zero’s Accuracy in Identifying AI vs. Human-Written Essays

Selin Dik; Osman Erdem

doi:10.47086/pims.1762132

Abstract

As students' use of AI tools has become more prevalent, instructors have started using AI detection tools such as GPTZero and QuillBot to identify AI-written text. However, the reliability of these detectors remains uncertain. In this study, we focused primarily on the success rate of GPTZero, the most-used AI detector, in identifying AI-generated text across randomly submitted essays of different lengths: short (40-100 words), medium (100-350 words), and long (350-800 words). We gathered a data set of 28 AI-generated papers and 50 human-written papers. Each paper in this randomized set was submitted individually to GPTZero, and the reported percentage of AI generation and the confidence were recorded. The vast majority of the AI-generated papers were detected accurately (scored as 91-100% AI-generated), while the scores for the human-written essays fluctuated, with a handful of false positives. These findings suggest that although GPTZero is effective at detecting purely AI-generated content, its reliability in distinguishing human-authored texts is limited. Educators should therefore exercise caution when relying solely on AI detection tools.
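As a rough illustration of the procedure the abstract describes, the following Python sketch assigns an essay to a length category and computes detection and false-positive rates from detector scores. It is a sketch under stated assumptions, not the authors' actual procedure: the names `length_bucket`, `Result`, and `rates` are hypothetical, the 50% decision threshold is an assumption, and because the abstract's ranges share the endpoints 100 and 350, placing a boundary count in the shorter category is also an assumption.

```python
from dataclasses import dataclass


def length_bucket(text: str) -> str:
    """Assign an essay to a length category by whitespace-delimited word count.

    The ranges come from the abstract. They share the endpoints 100 and 350;
    putting a boundary count in the shorter category is an assumption here.
    """
    n = len(text.split())
    if 40 <= n <= 100:
        return "short"
    if 100 < n <= 350:
        return "medium"
    if 350 < n <= 800:
        return "long"
    return "out_of_range"


@dataclass
class Result:
    is_ai: bool      # ground-truth label: True if the essay was AI-generated
    ai_score: float  # detector's reported AI percentage, from 0 to 100


def rates(results, threshold=50.0):
    """Return (detection rate on AI essays, false-positive rate on human essays).

    The threshold is hypothetical; the source does not state a cutoff.
    """
    ai = [r for r in results if r.is_ai]
    human = [r for r in results if not r.is_ai]
    tpr = sum(r.ai_score >= threshold for r in ai) / len(ai) if ai else float("nan")
    fpr = sum(r.ai_score >= threshold for r in human) / len(human) if human else float("nan")
    return tpr, fpr


if __name__ == "__main__":
    # Placeholder scores for illustration only; these are not the study's data.
    demo = [Result(True, 95.0), Result(True, 40.0),
            Result(False, 60.0), Result(False, 10.0)]
    print(length_bucket(" ".join(["word"] * 120)))
    print(rates(demo))
```

Reporting the false-positive rate separately from the detection rate matters here because the study's central concern is human-written essays being flagged as AI-generated.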

Supporting Institution

Rockford University, University of North Texas

Details

Primary Language

English

Subjects

Software Engineering (Other)

Journal Section

Research Article

Authors

Selin Dik *
0009-0009-7456-9666
United States

Osman Erdem
0009-0004-9327-5034
United States

Publication Date

January 3, 2026

Submission Date

August 10, 2025

Acceptance Date

December 31, 2025

Published in Issue

Year 2025 Volume: 7 Number: 2

DOI

https://doi.org/10.47086/pims.1762132

IZ

https://izlik.org/JA63RY32FF

Cite

APA

Dik, S., & Erdem, O. (2026). Assessing GPT-Zero’s Accuracy in Identifying AI vs. Human-Written Essays. Proceedings of International Mathematical Sciences, 7(2), 54-58. https://doi.org/10.47086/pims.1762132

AMA

1. Dik S, Erdem O. Assessing GPT-Zero’s Accuracy in Identifying AI vs. Human-Written Essays. PIMS. 2026;7(2):54-58. doi:10.47086/pims.1762132

Chicago

Dik, Selin, and Osman Erdem. 2026. “Assessing GPT-Zero’s Accuracy in Identifying AI Vs. Human-Written Essays”. Proceedings of International Mathematical Sciences 7 (2): 54-58. https://doi.org/10.47086/pims.1762132.

EndNote

Dik S, Erdem O (January 1, 2026) Assessing GPT-Zero’s Accuracy in Identifying AI vs. Human-Written Essays. Proceedings of International Mathematical Sciences 7 2 54–58.

IEEE

[1] S. Dik and O. Erdem, “Assessing GPT-Zero’s Accuracy in Identifying AI vs. Human-Written Essays”, PIMS, vol. 7, no. 2, pp. 54–58, Jan. 2026, doi: 10.47086/pims.1762132.

ISNAD

Dik, Selin - Erdem, Osman. “Assessing GPT-Zero’s Accuracy in Identifying AI Vs. Human-Written Essays”. Proceedings of International Mathematical Sciences 7/2 (January 1, 2026): 54-58. https://doi.org/10.47086/pims.1762132.

JAMA

1. Dik S, Erdem O. Assessing GPT-Zero’s Accuracy in Identifying AI vs. Human-Written Essays. PIMS. 2026;7:54–58.

MLA

Dik, Selin, and Osman Erdem. “Assessing GPT-Zero’s Accuracy in Identifying AI Vs. Human-Written Essays”. Proceedings of International Mathematical Sciences, vol. 7, no. 2, Jan. 2026, pp. 54-58, doi:10.47086/pims.1762132.

Vancouver

1. Selin Dik, Osman Erdem. Assessing GPT-Zero’s Accuracy in Identifying AI vs. Human-Written Essays. PIMS. 2026 Jan. 1;7(2):54-8. doi:10.47086/pims.1762132