<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article  article-type="research-article"        dtd-version="1.4">
            <front>

                <journal-meta>
                                                                <journal-id>adv. artif. intell. res.</journal-id>
            <journal-title-group>
                                                                                    <journal-title>Advances in Artificial Intelligence Research</journal-title>
            </journal-title-group>
                            <issn pub-type="ppub">2757-7422</issn>
                                                                                                        <publisher>
                    <publisher-name>Osman ÖZKARACA</publisher-name>
                </publisher>
                    </journal-meta>
                <article-meta>
                                        <article-id/>
                                                                <article-categories>
                                            <subj-group  xml:lang="en">
                                                            <subject>Artificial Intelligence</subject>
                                                    </subj-group>
                                            <subj-group  xml:lang="tr">
                                                            <subject>Yapay Zeka</subject>
                                                    </subj-group>
                                    </article-categories>
                                                                                                                                                        <title-group>
                                                                                                                                                            <article-title>Speech recognition based on convolutional neural networks and MFCC algorithm</article-title>
                                                                                                    </title-group>
            
                                                    <contrib-group content-type="authors">
                                                                        <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-8101-4970</contrib-id>
                                                                <name>
                                    <surname>Mahmood</surname>
                                    <given-names>Arzo</given-names>
                                </name>
                                                                    <aff>Süleyman Demirel Üniversitesi</aff>
                                                            </contrib>
                                                    <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-9652-6415</contrib-id>
                                                                <name>
                                    <surname>Köse</surname>
                                    <given-names>Utku</given-names>
                                </name>
                                                                    <aff>Süleyman Demirel Üniversitesi</aff>
                                                            </contrib>
                                                                                </contrib-group>
                        
                                        <pub-date pub-type="pub" iso-8601-date="20210115">
                    <day>15</day>
                    <month>01</month>
                    <year>2021</year>
                </pub-date>
                                        <volume>1</volume>
                                        <issue>1</issue>
                                        <fpage>6</fpage>
                                        <lpage>12</lpage>
                        
                        <history>
                                    <date date-type="received" iso-8601-date="20200712">
                        <day>12</day>
                        <month>07</month>
                        <year>2020</year>
                    </date>
                                                    <date date-type="accepted" iso-8601-date="20201125">
                        <day>25</day>
                        <month>11</month>
                        <year>2020</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 2020, Advances in Artificial Intelligence Research</copyright-statement>
                    <copyright-year>2020</copyright-year>
                    <copyright-holder>Advances in Artificial Intelligence Research</copyright-holder>
                </permissions>
            
                                                                                                                        <abstract><p>In this paper, an automatic speech recognition system based on convolutional neural networks and Mel-frequency cepstral coefficients (MFCC) is proposed. We investigate several deep model architectures with various hyperparameter options, such as dropout rate and learning rate. The dataset used in this paper was collected from the Kaggle TensorFlow Speech Recognition Challenge. Each audio file in the dataset contains one word and is one second long; the words in the dataset correspond to 30 categories, with one category for background noise. The dataset contains 64,721 files, separated into 51,088 for the training set, 6,798 for the validation set, and 6,835 for the testing set. We evaluated three models with different hyperparameter configurations in order to choose the model with the highest accuracy. The highest accuracy achieved is 88.21%.</p></abstract>
                                                            
            
                                                                                        <kwd-group>
                                                    <kwd>convolutional neural networks</kwd>
                                                    <kwd>FFT</kwd>
                                                    <kwd>MFCC</kwd>
                                                    <kwd>speech recognition</kwd>
                                                    <kwd>feature extraction</kwd>
                                            </kwd-group>
                            
                                                                                                                                                    </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="journal">M. A. Anusuya and S. K. Katti. “Speech Recognition by Machine, A Review”. In: International Journal of Computer Science and Information Security, IJCSIS, Vol. 6, No. 3, pp. 181-205, December 2009, USA (Jan. 2010).</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">Han, Wei, et al. &quot;An efficient MFCC extraction method in speech recognition.&quot; 2006 IEEE international symposium on circuits and systems. IEEE, 2006.</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">Jiang, Fei, et al. &quot;An event recognition method for fiber distributed acoustic sensing systems based on the combination of MFCC and CNN.&quot; 2017 International Conference on Optical Instruments and Technology: Advanced Optical Sensors and Applications. Vol. 10618. International Society for Optics and Photonics, 2018.</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">Warden, Pete. &quot;Speech commands: A dataset for limited-vocabulary speech recognition.&quot; arXiv preprint arXiv:1804.03209 (2018).</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">S. Davis and P. Mermelstein. “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 28.4 (Aug. 1980), pp. 357–366.</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">Harshita Gupta and Divya Gupta. “LPC and LPCC method of feature extraction in Speech Recognition System”. In: 2016 6th International Conference - Cloud System and Big Data Engineering (Confluence). IEEE, Jan. 2016.</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">D.J. Mashao, Y. Gotoh, and H.F. Silverman. “Analysis of LPC/DFT features for an HMM-based alphadigit recognizer”. In: IEEE Signal Processing Letters 3.4 (Apr. 1996), pp. 103–106.</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">Y. Lecun et al. “Gradient-based learning applied to document recognition”. In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324.</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">Hamed Habibi Aghdam and Elnaz Jahani Heravi. “Traffic Sign Detection and Recognition”. In: Guide to Convolutional Neural Networks. Springer International Publishing, 2017, pp. 1–14.</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">Dan Claudiu Ciresan et al. “Convolutional Neural Network Committees for Handwritten Character Classification”. In: 2011 International Conference on Document Analysis and Recognition. IEEE, Sept. 2011.</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. &quot;Imagenet classification with deep convolutional neural networks.&quot; Advances in neural information processing systems. 2012.</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
