Research Article



Acoustic analysis of the NATO phonetic alphabet spoken by native Turkish speakers: A comparison of handheld and throat microphones

Year 2026, Volume: 15 Issue: 1, 1 - 1

Abstract

This pilot study presents a comparative acoustic analysis of speech signals recorded with handheld and throat microphones during the pronunciation of the NATO phonetic alphabet by native Turkish speakers. A total of 2,080 voice samples were collected and preprocessed using voice activity detection (VAD), noise reduction, and silence trimming. Acoustic metrics, including duration, root mean square (RMS) energy, zero-crossing rate (ZCR), and zero-crossing density per second, were extracted. Non-parametric tests revealed significant differences between microphone types: the handheld device yielded higher energy and temporal stability, while the throat microphone showed lower energy but higher spectral complexity. Results also indicated phonatory delay and articulation asymmetries in some speakers. The findings suggest that microphone type affects phonetic structure and signal quality, with implications for automatic speech recognition (ASR) systems in noisy environments. Notably, the throat microphone increased zero-crossing density due to its sensitivity to internal vibration. This study lays the groundwork for multichannel ASR systems designed for real-world acoustic variability.
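The measurement-and-comparison workflow the abstract describes (per-utterance RMS energy and zero-crossing features, compared across microphone types with a non-parametric test) can be sketched in a few lines of NumPy/SciPy. This is an illustrative sketch, not the authors' pipeline: the signals, chunking, sample rate, and amplitudes below are synthetic stand-ins for the study's recordings.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def rms_energy(x):
    """Root-mean-square energy of a signal segment."""
    return float(np.sqrt(np.mean(x ** 2)))


def zero_crossing_density(x, sr):
    """Zero crossings per second of audio sampled at rate sr."""
    crossings = np.sum(np.abs(np.diff(np.signbit(x).astype(int))))
    return float(crossings * sr / len(x))


# Synthetic stand-ins for the two channels: a strong tone for the
# handheld microphone and a weaker, noisier tone for the throat channel.
sr = 16_000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
handheld = 0.5 * np.sin(2 * np.pi * 220 * t)
throat = 0.1 * np.sin(2 * np.pi * 220 * t) + 0.05 * rng.standard_normal(sr)

# A 220 Hz tone crosses zero roughly 440 times per second.
zcd_hand = zero_crossing_density(handheld, sr)

# Emulate repeated utterances by splitting each signal into ten chunks
# and extracting one RMS feature per chunk.
rms_hand = [rms_energy(c) for c in np.array_split(handheld, 10)]
rms_throat = [rms_energy(c) for c in np.array_split(throat, 10)]

# Non-parametric (Mann-Whitney U) comparison of the two conditions.
stat, p = mannwhitneyu(rms_hand, rms_throat, alternative="two-sided")
print(f"handheld RMS ~ {np.mean(rms_hand):.3f}, "
      f"throat RMS ~ {np.mean(rms_throat):.3f}, p = {p:.2e}")
```

In the study itself, feature extraction was done with librosa and denoising with noisereduce (per the references); in a real pipeline those libraries would replace the hand-rolled feature functions and synthetic signals above.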

Ethical Statement

This study was conducted with the approval of the Ethics Committee for Social and Human Sciences Research of Ondokuz Mayıs University, granted on 29 November 2024 under decision number 2024-1118. The research involved voice recordings and interviews carried out as part of a master's thesis titled "Machine Learning-Based Analysis and Resolution of Multilingual Pronunciation Issues in the NATO Phonetic Alphabet", supervised by Asst. Prof. Selim Aras.


References

  • I.C.A.O., Manual on the ICAO phonetic alphabet. ICAO Publishing, 2007.
  • M.D. Keller, J.M. Ziriax, W. Barns, B. Sheffield, D. Brungart, T. Thomas, B. Jaeger, and K. Yankaskas, Performance in noise: impact of reduced speech intelligibility on sailor performance in a Navy command and control environment. Hearing Research, 349, 55–66, 2017. https://doi.org/10.1016/j.heares.2016.10.007
  • P. Boersma and D. Weenink, Praat: doing phonetics by computer. University of Amsterdam, 2023.
  • M.A.T. Turan, Enhancement of throat microphone recordings using Gaussian mixture model probabilistic estimator. arXiv:1804.05937, 2018. https://doi.org/10.48550/arXiv.1804.05937
  • B.E. Acker-Mills, A.J. Houtsma, and W.A. Ahroon, Speech intelligibility in noise using throat and acoustic microphones. Aviation, Space and Environmental Medicine, 77 (1), 26–31, 2006.
  • E. Erzin, Improving throat microphone speech recognition by joint analysis of throat and acoustic microphone recordings. IEEE Transactions on Audio, Speech and Language Processing, 17 (7), 1316–1324, 2009.
  • M.E. Arafat, I. Misra, and M.E. Hamid, A comparative study for throat microphone speech enhancement with different approaches. International Journal of Science and Research Archive, 13 (1), 850–859, 2024. https://doi.org/10.30574/ijsra.2024.13.1.1631
  • X. Huang, A. Acero, and H.W. Hon, Spoken language processing: a guide to theory, algorithm, and system development. Pearson Prentice Hall, Upper Saddle River, 2001.
  • Y. Kobayashi, K. Watanabe, and M. Akagi, Analysis and synthesis of speech captured by a contact microphone using neural vocoders. IEEE/ACM Transactions on Audio, Speech and Language Processing, 28, 2619–2632, 2020.
  • T. Nguyen and S. Kim, An investigation of throat microphone signals for speech enhancement in noisy environments. Sensors, 21 (3), 987, 2021.
  • RØDE Microphones, RØDE AI-1 Audio Interface – Specifications. 2022. Accessed 21 April 2025. https://www.rode.com/interfaces/ai-1
  • S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N.E. Yalta Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, and T. Ochiai, ESPnet: end-to-end speech processing toolkit. 2018.
  • S. Karita, N. Chen, T. Hayashi, T. Hori, H. Inaguma, Z. Jiang, M. Someki, N.E. Yalta Soplin, R. Yamamoto, X. Wang, S. Watanabe, T. Yoshimura, and W. Zhang, A comparative study on Transformer vs RNN in speech applications. arXiv:1909.06317, 2019. https://doi.org/10.48550/arXiv.1909.06317
  • E. Salesky, M. Wiesner, J. Bremerman, R. Cattoni, M. Negri, M. Turchi, D.W. Oard, and M. Post, The Multilingual TEDx Corpus for speech recognition and translation. arXiv:2102.01757, 2021. https://doi.org/10.48550/arXiv.2102.01757
  • N. Wilkinson and T. Niesler, A hybrid CNN-BiLSTM voice activity detector. arXiv:2103.03529, 2021. https://doi.org/10.48550/arXiv.2103.03529
  • J. Thomas, noisereduce: a Python package for real-time noise removal from speech signals. 2022. Accessed 2 May 2025. https://pypi.org/project/noisereduce/
  • G. Ioannides and V. Rallis, Real-time speech enhancement using spectral subtraction with minimum statistics and spectral floor. arXiv:2302.10313, 2023. https://doi.org/10.48550/arXiv.2302.10313
  • B. McFee, C. Raffel, D. Liang, D.P.W. Ellis, M. McVicar, E. Battenberg, and O. Nieto, librosa: audio and music signal analysis in Python. Proceedings of the 14th Python in Science Conference, 18–25, 2015. https://doi.org/10.25080/Majora-7b98e3ed-003
  • K.A. Abdalmalak and A. Gallardo-Antolín, Enhancement of a text-independent speaker verification system by using feature combination and parallel-structure classifiers. arXiv:2401.15018, 2024. https://doi.org/10.48550/arXiv.2401.15018
  • S. Verma, R. Banerjee, and A. Singh, A comprehensive review of preprocessing strategies in speech recognition systems. ACM Transactions on Speech and Language Processing, 19 (1), 1–26, 2022.
  • G. Degottex, P. Lanchantin, M.J. Gales, and S. King, A survey on acoustic representations for voice analysis and synthesis. IEEE/ACM Transactions on Audio, Speech and Language Processing, 29, 116–137, 2021.
  • J.M. Valin and J. Skoglund, A real-time wideband neural vocoder at 1.6 kb/s using LPCNet. Proceedings of Interspeech, 3406–3410, 2019. https://doi.org/10.21437/Interspeech.2019-1255
  • A. Saeed, D. Grangier, and N. Zeghidour, Contrastive learning of general-purpose audio representations. arXiv:2010.10915, 2021. https://doi.org/10.48550/arXiv.2010.10915
  • W. Wang, J. Yi, M. Wu, and X. Lei, Improving ASR robustness via uncertainty modeling and consistency training. Proceedings of Interspeech, 2022.
  • Laerd Statistics, Mann–Whitney U Test using SPSS Statistics. 2021. Accessed 2 May 2025. https://statistics.laerd.com/spss-tutorials/mann-whitney-u-test-using-spss-statistics.php
  • Y. Zhao, C. Ni, C.C. Leung, S. Joty, E.S. Chng, and B. Ma, A unified speaker adaptation approach for ASR. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 9398–9410, 2021. https://doi.org/10.18653/v1/2021.emnlp-main.737
  • Y. Zhang, C. Wu, and Y. Wang, Acoustic environment and recording conditions for voice applications: a comparative study. IEEE Access, 8, 213187–213200, 2020.
  • J.X. Ou, X. Ming, and A.C.L. Yu, Individual variability in subcortical neural encoding shapes phonetic cue weighting. Scientific Reports, 13 (1), 9991, 2023. https://doi.org/10.1038/s41598-023-37212-y
  • D. Prabhu, P. Jyothi, S. Ganapathy, and V. Unni, Accented speech recognition with accent-specific codebooks. arXiv:2310.15970, 2023. https://doi.org/10.48550/arXiv.2310.15970
  • M. McAuliffe, M. Socolof, S. Mihuc, M. Wagner, and M. Sonderegger, Montreal Forced Aligner: trainable text–speech alignment using Kaldi. Proceedings of Interspeech 2017, 498–502, 2017. https://doi.org/10.21437/Interspeech.2017-1386
  • H. Kallio, S. Antti, and J. Šimko, Fluency-related temporal features and syllable prominence as prosodic proficiency predictors for learners of English with different language backgrounds. Language and Speech, 65 (3), 571–597, 2022. https://doi.org/10.1177/00238309211040175
  • O. Türk and H. Açıkgoz, Phonological influence of Turkish on English pronunciation: a comparative spectrographic study. Journal of Phonetics and Speech Sciences, 14, 157–170, 2022.
  • K.J. Han, R. Prieto, K. Wu, and T. Ma, State-of-the-art speech recognition using multi-stream self-attention with dilated 1D convolutions. arXiv:1910.00716, 2019. https://doi.org/10.48550/arXiv.1910.00716
  • K. Tomanek, V. Zayats, D. Padfield, K. Vaillancourt, and F. Biadsy, Residual adapters for parameter-efficient ASR adaptation to atypical and accented speech. arXiv:2109.06952, 2021. https://doi.org/10.48550/arXiv.2109.06952
  • A. Radford, J.W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, Robust speech recognition via large-scale weak supervision. arXiv:2212.04356, 2023. https://doi.org/10.48550/arXiv.2212.04356
There are 35 citations in total.

Details

Primary Language Turkish
Subjects Audio Processing, Human-Computer Interaction, Speech Recognition
Journal Section Research Article
Authors

Julio Cesar Velazquez Garcia 0009-0007-2270-8068

Selim Aras 0000-0003-1231-5782

Early Pub Date December 2, 2025
Publication Date December 4, 2025
Submission Date May 23, 2025
Acceptance Date October 21, 2025
Published in Issue Year 2026 Volume: 15 Issue: 1

Cite

APA Velazquez Garcia, J. C., & Aras, S. (2025). Anadili Türkçe olan konuşmacılar tarafından söylenen NATO fonetik alfabesinin akustik analizi: El ve gırtlak mikrofonlarının karşılaştırması. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, 15(1), 1-1. https://doi.org/10.28948/ngumuh.1705336
AMA Velazquez Garcia JC, Aras S. Anadili Türkçe olan konuşmacılar tarafından söylenen NATO fonetik alfabesinin akustik analizi: El ve gırtlak mikrofonlarının karşılaştırması. NOHU J. Eng. Sci. December 2025;15(1):1-1. doi:10.28948/ngumuh.1705336
Chicago Velazquez Garcia, Julio Cesar, and Selim Aras. “Anadili Türkçe Olan Konuşmacılar Tarafından Söylenen NATO Fonetik Alfabesinin Akustik Analizi: El Ve Gırtlak Mikrofonlarının Karşılaştırması”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 15, no. 1 (December 2025): 1-1. https://doi.org/10.28948/ngumuh.1705336.
EndNote Velazquez Garcia JC, Aras S (December 1, 2025) Anadili Türkçe olan konuşmacılar tarafından söylenen NATO fonetik alfabesinin akustik analizi: El ve gırtlak mikrofonlarının karşılaştırması. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 15 1 1–1.
IEEE J. C. Velazquez Garcia and S. Aras, “Anadili Türkçe olan konuşmacılar tarafından söylenen NATO fonetik alfabesinin akustik analizi: El ve gırtlak mikrofonlarının karşılaştırması”, NOHU J. Eng. Sci., vol. 15, no. 1, pp. 1–1, 2025, doi: 10.28948/ngumuh.1705336.
ISNAD Velazquez Garcia, Julio Cesar - Aras, Selim. “Anadili Türkçe Olan Konuşmacılar Tarafından Söylenen NATO Fonetik Alfabesinin Akustik Analizi: El Ve Gırtlak Mikrofonlarının Karşılaştırması”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 15/1 (December 2025), 1-1. https://doi.org/10.28948/ngumuh.1705336.
JAMA Velazquez Garcia JC, Aras S. Anadili Türkçe olan konuşmacılar tarafından söylenen NATO fonetik alfabesinin akustik analizi: El ve gırtlak mikrofonlarının karşılaştırması. NOHU J. Eng. Sci. 2025;15:1–1.
MLA Velazquez Garcia, Julio Cesar and Selim Aras. “Anadili Türkçe Olan Konuşmacılar Tarafından Söylenen NATO Fonetik Alfabesinin Akustik Analizi: El Ve Gırtlak Mikrofonlarının Karşılaştırması”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, vol. 15, no. 1, 2025, pp. 1-1, doi:10.28948/ngumuh.1705336.
Vancouver Velazquez Garcia JC, Aras S. Anadili Türkçe olan konuşmacılar tarafından söylenen NATO fonetik alfabesinin akustik analizi: El ve gırtlak mikrofonlarının karşılaştırması. NOHU J. Eng. Sci. 2025;15(1):1-1.
