Exploring perceptual boundaries: Assessing human ability to differentiate AI-cloned from real voices

Seyhan Canyakan

doi:10.5281/zenodo.19079040

Abstract

This study investigates whether listeners can distinguish real human voices from AI-generated (cloned) voices. As voice cloning technologies advance, concerns about authenticity, trust, and perceptual accuracy have become increasingly relevant in communication and media contexts. The research used a quantitative experimental design in which 35 participants judged whether each of 40 recorded voice samples was real or AI-generated. The data were analyzed using sensitivity index (A′) values, confidence intervals, and t-tests to assess recognition accuracy across variables such as gender and age. Participants correctly identified real voices with an average accuracy of 70% and cloned voices with an average accuracy of 60%. Female voices were recognized slightly more accurately than male voices, while age did not significantly affect recognition performance. Although participants often described cloned voices as “human-like,” their discrimination accuracy remained relatively low, suggesting that auditory cues alone may not be sufficient to distinguish AI-generated speech from authentic human speech. The findings highlight the perceptual challenges and ethical concerns associated with rapidly developing speech synthesis technologies. Further research that integrates emotional tone analysis, multimodal perception, and user familiarity with AI systems is recommended to deepen understanding of human–AI auditory interaction.
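For readers unfamiliar with the sensitivity index, the sketch below shows one common way A′ is computed from a hit rate and a false-alarm rate. It makes two assumptions that the abstract does not confirm: that A′ follows the widely used nonparametric formula (often attributed to Pollack and Norman, and to Grier), and that "real" is treated as the signal class, so that a cloned voice judged "real" counts as a false alarm. The function name `a_prime` and the use of the abstract's aggregate percentages as inputs are purely illustrative; the result is not a value reported by the study.

```python
def a_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Nonparametric sensitivity index A' (assumed formula, see lead-in).

    0.5 corresponds to chance-level discrimination, 1.0 to perfect
    discrimination. Both inputs must lie in [0, 1].
    """
    h, f = hit_rate, false_alarm_rate
    if not (0.0 <= h <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    if h == f:
        # Hits and false alarms are equally likely: chance performance.
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1.0 + h - f)) / (4.0 * h * (1.0 - f))
    # Below-chance case (more false alarms than hits): mirrored formula.
    return 0.5 - ((f - h) * (1.0 + f - h)) / (4.0 * f * (1.0 - h))


# Illustrative inputs only (assumption): 70% of real voices judged "real"
# is taken as the hit rate; 60% of cloned voices judged "cloned" implies
# that 40% were judged "real", which is taken as the false-alarm rate.
example = a_prime(0.70, 0.40)
print(f"A' = {example:.3f}")
```

In practice A′ would be computed per participant from that participant's own responses and then summarized with confidence intervals, rather than from pooled percentages as in this sketch.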

Keywords

Details

Primary Language

English

Subjects

Music Technology and Recording

Journal Section

Research Article

Authors

Seyhan Canyakan*
ORCID: 0000-0001-6373-4245
Türkiye

Early Pub Date

February 22, 2026

Publication Date

February 22, 2026

Submission Date

February 23, 2025

Acceptance Date

October 15, 2025

Published in Issue

Year 2026 Volume: 7 Number: 1

DOI

https://doi.org/10.5281/zenodo.19079040

IZ

https://izlik.org/JA59PJ35KE

Cite

APA

Canyakan, S. (2026). Exploring perceptual boundaries: Assessing human ability to differentiate AI-cloned from real voices. Journal for the Interdisciplinary Art and Education, 7(1), 1–21. https://doi.org/10.5281/zenodo.19079040
