Feed-Forward Deep Neural Network Model Based Speech Recognition System for Speech Signal
Abstract
This work aims to improve speech recognition accuracy and generalization by optimizing deep neural network (DNN) systems. Experiments are conducted on a standard benchmark speech dataset and an independent real-time speech dataset under a fully speaker-independent evaluation protocol. The baseline model is a feed-forward DNN, which is then optimized using a Genetic Algorithm (GA), Particle Swarm Optimization (PSO), the Whale Optimization Algorithm (WOA), and the proposed Neural Whale Optimization Algorithm (NOWOA). Comprehensive evaluations, including confusion-matrix-based metrics, 5-fold cross-validation, and overfitting analysis, are performed to assess robustness and reliability. Experimental results show that the baseline DNN achieves approximately 50% recognition accuracy, while optimization substantially improves performance. The proposed NOWOA-optimized DNN achieves the highest recognition accuracy among all tested methods, 99.36%, indicating its effectiveness for speech recognition on both standard and real-time datasets.
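The evaluation protocol named in the abstract (speaker-independent splitting, 5-fold cross-validation, confusion-matrix-based metrics) can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything below is an assumption: the function names (`speaker_independent_folds`, `confusion_matrix`, `accuracy`, `per_class_precision_recall`) are hypothetical, and the round-robin assignment of speakers to folds is just one simple way to keep each speaker inside a single fold.

```python
from collections import defaultdict


def speaker_independent_folds(speaker_ids, k=5):
    """Split sample indices into k folds so that no speaker spans two folds.

    Assumption: whole speakers are assigned round-robin in sorted order;
    the paper's actual splitting rule is not given in the abstract.
    """
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)
    speakers = sorted(by_speaker)
    if len(speakers) < k:
        raise ValueError("need at least k distinct speakers")
    folds = [[] for _ in range(k)]
    for pos, spk in enumerate(speakers):
        folds[pos % k].extend(by_speaker[spk])
    return folds


def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_precision_recall(matrix):
    """Return a (precision, recall) pair for each class, in label order."""
    n = len(matrix)
    results = []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        results.append((precision, recall))
    return results
```

In use, each fold would serve once as the held-out test set while a model is trained on the other four, and the per-fold confusion matrices would be summarized with `accuracy` and `per_class_precision_recall`. Because folds are built from whole speakers, no test speaker is ever seen during training.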
Keywords
Ethical Statement
Details
Primary Language: English
Subjects: Artificial Intelligence (Other)
Journal Section: Research Article
Authors
Early Pub Date: June 25, 2026
Publication Date: June 30, 2026
Submission Date: November 8, 2025
Acceptance Date: March 30, 2026
Published in Issue: Year 2026, Volume 9, Number 3
