Exploring perceptual boundaries: Assessing human ability to differentiate AI-cloned from real voices
Abstract
This study investigates the human ability to distinguish real voices from AI-generated (cloned) ones. As voice cloning technologies advance, concerns about authenticity, trust, and perceptual accuracy have become increasingly relevant in communication and media contexts. The research adopted a quantitative experimental design in which 35 participants judged whether each of 40 recorded voice samples was real or AI-generated. The data were analyzed using sensitivity index (A′) values, confidence intervals, and t-tests to assess recognition accuracy across variables such as gender and age. Participants recognized real voices with an average accuracy of 70% and correctly identified cloned voices 60% of the time. Female voices were recognized slightly better than male voices, while age did not significantly affect recognition performance. Although participants often described cloned voices as “human-like,” their actual discrimination accuracy remained relatively low, suggesting that auditory cues alone may not be sufficient to distinguish AI-generated speech from authentic human speech. The findings highlight the perceptual challenges and ethical concerns associated with rapidly developing speech synthesis technologies. Further research integrating emotional tone analysis, multimodal perception, and user familiarity with AI systems is recommended to deepen understanding of human–AI auditory interaction.
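For readers unfamiliar with the sensitivity index, the following is a minimal sketch of how A′ can be computed from a hit rate and a false-alarm rate using the standard nonparametric formula. It is an illustration, not the study's own analysis code. It assumes that "real" is treated as the signal class, so the hit rate is the proportion of real voices judged real (0.70) and the false-alarm rate is the proportion of cloned voices judged real (1 − 0.60 = 0.40). Applying the formula to these aggregate rates rather than to per-participant data is also an assumption, and the function name `a_prime` is hypothetical.

```python
def a_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Nonparametric sensitivity index A' from a hit rate and a false-alarm rate.

    0.5 corresponds to chance-level discrimination, 1.0 to perfect discrimination.
    """
    h, f = hit_rate, false_alarm_rate
    if not (0.0 <= h <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    if h == f:
        # No discrimination between the two classes.
        return 0.5
    if h > f:
        # Above-chance branch of the formula.
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance branch (mirror image of the above-chance case).
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


if __name__ == "__main__":
    # Assumed mapping from the abstract's aggregate accuracies:
    # hits = real voices judged real, false alarms = cloned voices judged real.
    print(round(a_prime(0.70, 0.40), 3))  # 0.732
```

Under these assumptions the aggregate rates correspond to an A′ of roughly 0.73, which sits between chance (0.5) and perfect discrimination (1.0). The value reported in the full article may differ, since it is presumably computed per participant.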
Keywords
Details
Primary Language: English
Subjects: Music Technology and Recording
Journal Section: Research Article
Authors: Seyhan Canyakan* (ORCID: 0000-0001-6373-4245), Türkiye
Early Pub Date: February 22, 2026
Publication Date: February 22, 2026
Submission Date: February 23, 2025
Acceptance Date: October 15, 2025
Published in Issue: Year 2026, Volume 7, Number 1