Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal

Mahesh K. Singh; Sanjeev Kumar; Rajeev Ranjan

doi:10.35377/saucis...1819908

Abstract

This work aims to improve speech recognition accuracy and generalization by optimizing deep neural network (DNN) systems. Experiments are conducted on a standard benchmark speech dataset and an independent real-time speech dataset under a fully speaker-independent evaluation protocol. The baseline model is a feed-forward DNN, which is then optimized with a Genetic Algorithm (GA), Particle Swarm Optimization (PSO), the Whale Optimization Algorithm (WOA), and the proposed Neural Whale Optimization Algorithm (NOWOA). Robustness and reliability are assessed through confusion-matrix-based metrics, 5-fold cross-validation, and overfitting analysis. The baseline DNN achieves approximately 50% recognition accuracy, and optimization substantially improves on this. The proposed NOWOA-optimized DNN achieves the highest recognition accuracy among all tested methods, 99.36%, demonstrating its effectiveness for speech recognition on both standard and real-time datasets.
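The evaluation protocol named in the abstract (speaker-independent 5-fold cross-validation with confusion-matrix-based metrics) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the round-robin assignment of speakers to folds, and the choice of accuracy, precision, and recall as the reported metrics are all assumptions made for illustration.

```python
def speaker_independent_folds(speaker_ids, k=5):
    """Split sample indices into k (train, test) pairs.

    Whole speakers are assigned to folds (round-robin, an assumption),
    so no speaker contributes samples to both train and test.
    """
    speakers = sorted(set(speaker_ids))
    fold_of = {spk: i % k for i, spk in enumerate(speakers)}
    folds = []
    for f in range(k):
        test = [i for i, s in enumerate(speaker_ids) if fold_of[s] == f]
        train = [i for i, s in enumerate(speaker_ids) if fold_of[s] != f]
        folds.append((train, test))
    return folds


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def metrics_from_confusion(cm):
    """Return overall accuracy plus per-class precision and recall."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    accuracy = correct / total if total else 0.0
    precision, recall = [], []
    for i in range(n):
        predicted_i = sum(cm[r][i] for r in range(n))  # column sum
        actual_i = sum(cm[i])                          # row sum
        precision.append(cm[i][i] / predicted_i if predicted_i else 0.0)
        recall.append(cm[i][i] / actual_i if actual_i else 0.0)
    return accuracy, precision, recall
```

In such a setup, a model would be trained on each `train` index list and scored on the matching `test` list, with the per-fold confusion matrices summed or averaged before computing the metrics.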

Keywords

Ethical Statement

The authors declare that scientific and ethical principles were followed during the preparation of this study.

Details

Primary Language

English

Subjects

Artificial Intelligence (Other)

Journal Section

Research Article

Authors

Mahesh K. Singh ^*
0009-0006-5036-6037
India

Sanjeev Kumar
0000-0001-7997-0111
India

Rajeev Ranjan
0000-0003-2359-4879
India

Early Pub Date

June 25, 2026

Publication Date

June 30, 2026

Submission Date

November 8, 2025

Acceptance Date

March 30, 2026

Published in Issue

Year 2026 Volume: 9 Number: 3

DOI

https://doi.org/10.35377/saucis...1819908

IZ

https://izlik.org/JA98RE79EU

Cite

APA

Singh, M. K., Kumar, S., & Ranjan, R. (2026). Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal. Sakarya University Journal of Computer and Information Sciences, 9(3), 920-933. https://doi.org/10.35377/saucis...1819908

AMA

1. Singh MK, Kumar S, Ranjan R. Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal. SAUCIS. 2026;9(3):920-933. doi:10.35377/saucis.1819908

Chicago

Singh, Mahesh K., Sanjeev Kumar, and Rajeev Ranjan. 2026. “Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal”. Sakarya University Journal of Computer and Information Sciences 9 (3): 920-33. https://doi.org/10.35377/saucis.1819908.

EndNote

Singh MK, Kumar S, Ranjan R (June 1, 2026) Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal. Sakarya University Journal of Computer and Information Sciences 9 3 920–933.

IEEE

[1] M. K. Singh, S. Kumar, and R. Ranjan, “Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal”, SAUCIS, vol. 9, no. 3, pp. 920–933, June 2026, doi: 10.35377/saucis...1819908.

ISNAD

Singh, Mahesh K. - Kumar, Sanjeev - Ranjan, Rajeev. “Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal”. Sakarya University Journal of Computer and Information Sciences 9/3 (June 1, 2026): 920-933. https://doi.org/10.35377/saucis.1819908.

JAMA

1. Singh MK, Kumar S, Ranjan R. Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal. SAUCIS. 2026;9:920–933.

MLA

Singh, Mahesh K., et al. “Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal”. Sakarya University Journal of Computer and Information Sciences, vol. 9, no. 3, June 2026, pp. 920-33, doi:10.35377/saucis.1819908.

Vancouver

1. Mahesh K. Singh, Sanjeev Kumar, Rajeev Ranjan. Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal. SAUCIS. 2026 Jun. 1;9(3):920-33. doi:10.35377/saucis.1819908