In this paper, an automatic speech recognition system based on convolutional neural networks (CNNs) and Mel-frequency cepstral coefficients (MFCCs) is proposed. We investigated several deep model architectures with various hyperparameter options, such as the dropout rate and the learning rate. The dataset used in this paper was collected from the Kaggle TensorFlow Speech Recognition Challenge. Each audio file in the dataset contains a single one-second word; the words correspond to 30 categories, with one category for background noise. The dataset contains 64,721 files, separated into 51,088 for the training set, 6,798 for the validation set, and 6,835 for the testing set. We evaluated three models with different hyperparameter configurations in order to choose the model with the highest accuracy. The highest accuracy achieved was 88.21%.
convolutional neural networks, FFT, MFCC, speech recognition, feature extraction
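To make the feature-extraction stage concrete, the following is a minimal NumPy-only sketch of a standard MFCC front end (framing, windowing, FFT power spectrum, mel filterbank, log, DCT). The specific settings (16 kHz sample rate, 400-sample frames, 160-sample hop, 26 mel filters, 13 coefficients) are common defaults assumed for illustration, not the configuration used in the paper.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-spaced filterbank over the FFT bins."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, ctr, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, ctr):            # rising edge of triangle
            if ctr > lo:
                fb[i - 1, k] = (k - lo) / (ctr - lo)
        for k in range(ctr, hi):            # falling edge of triangle
            if hi > ctr:
                fb[i - 1, k] = (hi - k) / (hi - ctr)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # 1) Frame the signal and apply a Hamming window.
    frames = np.stack([signal[s:s + frame_len] * np.hamming(frame_len)
                       for s in range(0, len(signal) - frame_len + 1, hop)])
    # 2) FFT power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3) Mel filterbank energies, then log compression.
    log_e = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 4) DCT-II to decorrelate -> cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                 / (2 * n_filters))
    return log_e @ dct.T

# One second of audio at 16 kHz (random noise as a stand-in for a word clip).
sig = np.random.randn(16000)
feats = mfcc(sig)
print(feats.shape)  # (98, 13): 98 frames x 13 cepstral coefficients
```

The resulting 2-D feature matrix (time frames by coefficients) is what a CNN of the kind evaluated in the paper would consume as its input "image".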