<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article article-type="research-article" dtd-version="1.4">
            <front>

                <journal-meta>
                                    <journal-id></journal-id>
            <journal-title-group>
                                                                                    <journal-title>Politeknik Dergisi</journal-title>
            </journal-title-group>
                                        <issn pub-type="epub">2147-9429</issn>
                                                                                            <publisher>
                    <publisher-name>Gazi University</publisher-name>
                </publisher>
                    </journal-meta>
                <article-meta>
                                        <article-id pub-id-type="doi">10.2339/politeknik.1187808</article-id>
                                                                <article-categories>
                                            <subj-group  xml:lang="en">
                                                            <subject>Engineering</subject>
                                                    </subj-group>
                                            <subj-group  xml:lang="tr">
                                                            <subject>Mühendislik</subject>
                                                    </subj-group>
                                    </article-categories>
                                                                                                                                                        <title-group>
                                                                                                                        <article-title>Classification of Exon and Intron Regions on DNA Sequences with Hybrid Use of SBERT and ANFIS Approaches</article-title>
                                                                                                                                                                                                <trans-title-group xml:lang="tr">
                                    <trans-title>ANFIS ve SBERT Yaklaşımlarının Hibrit Kullanımı ile DNA Dizilimleri Üzerinde Ekson ve İntron Bölgelerinin Sınıflandırılması</trans-title>
                                </trans-title-group>
                                                                                                    </title-group>
            
                                                    <contrib-group content-type="authors">
                                                                        <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-6670-915X</contrib-id>
                                                                <name>
                                    <surname>Akalın</surname>
                                    <given-names>Fatma</given-names>
                                </name>
                                                                    <aff>SAKARYA ÜNİVERSİTESİ</aff>
                                                            </contrib>
                                                    <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-5005-8604</contrib-id>
                                                                <name>
                                    <surname>Yumuşak</surname>
                                    <given-names>Nejat</given-names>
                                </name>
                                                                    <aff>SAKARYA ÜNİVERSİTESİ</aff>
                                                            </contrib>
                                                                                </contrib-group>
                        
                                        <pub-date pub-type="pub" iso-8601-date="20240725">
                    <day>25</day>
                    <month>07</month>
                    <year>2024</year>
                </pub-date>
                                        <volume>27</volume>
                                        <issue>3</issue>
                                        <fpage>1043</fpage>
                                        <lpage>1053</lpage>
                        
                        <history>
                                    <date date-type="received" iso-8601-date="20221012">
                        <day>12</day>
                        <month>10</month>
                        <year>2022</year>
                    </date>
                                                    <date date-type="accepted" iso-8601-date="20230206">
                        <day>06</day>
                        <month>02</month>
                        <year>2023</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 1998, Journal of Polytechnic</copyright-statement>
                    <copyright-year>1998</copyright-year>
                    <copyright-holder>Journal of Polytechnic</copyright-holder>
                </permissions>
            
                                                                                                <abstract><p>DNA is the part of the genome that carries an enormous amount of information related to life. Each amino acid is encoded by a triplet of nucleotides in the DNA sequence, and these triplets are called codons. The frequency of nucleotide triplets in a DNA sequence makes it possible to evaluate protein-coding (exon) and non-protein-coding (intron) regions, and distinguishing these regions enables the analysis of vital biological functions. This study classifies the exon and intron regions of the BCR-ABL and MEFV genes, obtained from the NCBI and Ensembl databases, respectively. The DNA sequences are clustered using pretrained models within the SBERT approach, with K-Means and Agglomerative Clustering applied consecutively. The codon repetition frequency is then calculated for a representative sample selected from each cluster. A matrix is built from the frequencies of the 64 different codons that constitute the genetic code, and this matrix is given as input to the ANFIS structure. The ANFIS approach classifies exon and intron DNA sequences with an accuracy of 88.88%. This result was obtained independently of DNA sequence length.</p></abstract>
                                                                                                                                    <trans-abstract xml:lang="tr">
                            <p>DNA, canlılığa ilişkin devasa bilgi barındıran genom parçasıdır. Bu genom parçasındaki üç nükleotidin kodlanması ile aminoasitler oluşur ve kodlanan aminoasitler DNA’da kod olarak isimlendirilir. DNA dizilimindeki üçlü nükleotidin frekansı, protein kodlayan(ekson) ve protein kodlamayan(intron) bölgelere ilişkin analiz imkanı sağlar. Bu bölgelerin ayırt edilmesi yaşama ilişkin hayati fonksiyonların değerlendirilmesini mümkün kılar. Bu çalışma sırasıyla NCBI ve Ensemble veri setlerinden elde edilen BCR-ABL ve MEFV genleri için ekson ve intron bölgelerinin sınıflandırılmasını sağlamıştır. Ardından SBERT yaklaşımı kapsamında önceden eğitilmiş modeller ile mevcut DNA dizilimleri kümelenmiştir. Kümeleme sürecinde K-Means ve Agglomerative Kümeleme yaklaşımları art arda kullanılmıştır. Her bir kümeden seçilen temsili bir örnek ile kodonların tekrarlanma sıklığı hesaplanmıştır. Genetik kodun oluşmasını sağlayan 64 farklı kodonların frekansı kullanılarak matris oluşturulmuştur. Bu matris ANFIS yapısına girdi olarak verilmiştir. ANFIS yaklaşımı ile ekson ve intron bölgelerinin sınıflandırılmasında %88.88 doğruluk oranı elde edilmiştir. Bu çalışmanın sonucunda DNA uzunluğundan bağımsız başarılı bir sonuç üretilmiştir.</p></trans-abstract>
                                                            
            
                                                            <kwd-group>
                                                    <kwd>DNA sequences</kwd>
                                                    <kwd>exon and intron regions</kwd>
                                                    <kwd>K-Means and Agglomerative Clustering</kwd>
                                                    <kwd>SBERT</kwd>
                                                    <kwd>ANFIS</kwd>
                                            </kwd-group>
                                                        
                                                                                                            </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="journal">[1] Raza K., ‘Fuzzy logic based approaches for gene regulatory network inference’, Artificial Intelligence in Medicine, 97: 189–203, (2019).</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">[2] Zheng P., Wang S., Wang X., and Zeng X., ‘Editorial: Artificial Intelligence in Bioinformatics and Drug Repurposing: Methods and Applications’, Frontiers in Genetics, 13: 1–4, (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">[3] Singh N., Nath R., and Singh D. B., ‘Splice-site identification for exon prediction using bidirectional LSTM-RNN approach’, Biochemistry and Biophysics Reports, 30, (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">[4] Kar S. and Ganguly M., ‘Study of effectiveness of FIR and IIR filters in Exon identification: A comparative approach’, Materials Today: Proceedings, 58: 437–444, (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">[5] Barman S., Saha S., Mandal A., and Roy M., ‘Prediction of protein coding regions of a DNA sequence through spectral analysis’, 2012 International Conference on Informatics, Electronics and Vision, ICIEV 2012, 12–16, (2012).</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">[6] Das L., Das J. K., and Nanda S., ‘Detection of exon location in eukaryotic DNA using a fuzzy adaptive Gabor wavelet transform’, Genomics, 112: 4406–4416, (2020).</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">[7] Das L., Nanda S., and Das J. K., ‘An integrated approach for identification of exon locations using recursive Gauss Newton tuned adaptive Kaiser window’, Genomics, 111: 284–296, (2019).</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">[8] Gupta R., Mittal A., Singh K., Bajpai P., and Prakash S., ‘A Time Series Approach for Identification of Exons and Introns’, 10th International Conference on Information Technology (ICIT 2007), 91–93, (2007).</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">[9] Daş B. and Türkoğlu İ., ‘Sayısal haritalama teknikleri ve Fourier dönüşümü kullanılarak DNA dizilimlerinin sınıflandırılması’, Journal of the Faculty of Engineering and Architecture of Gazi University, 31(4): 921–932, (2016).</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">[10] Hota M. K. and Srivastava V. K., ‘Performance analysis of different DNA to numerical mapping techniques for identification of protein coding regions using tapered window based short-time discrete Fourier transform’, ICPCES 2010 - International Conference on Power, Control and Embedded Systems, (2010).</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">[11] Dessouky A. M., Taha T. E., Dessouky M. M., Eltholth A. A., Hassan E., and Abd El-Samie F. E., ‘Non-parametric spectral estimation techniques for DNA sequence analysis and exon region prediction’, Computers and Electrical Engineering, 73: 334–348, (2019).</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="journal">[12] Roy M. and Barman S., ‘Spectral analysis of coding and non-coding regions of a DNA sequence by Parametric method’, Proceedings of the 2010 Annual IEEE India Conference: Green Energy, Computing and Communication, INDICON 2010, 7–10, (2010).</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="journal">[13] Singh A. K. and Srivastava V. K., ‘The three base periodicity of protein coding sequences and its application in exon prediction’, 2020 7th International Conference on Signal Processing and Integrated Networks, SPIN 2020, 64: 1089–1094, (2020).</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="journal">[14] Akalın F. and Yumuşak N., ‘DNA genom dizilimi üzerinde dijital sinyal işleme teknikleri kullanılarak elde edilen ekson ve intron bölgelerinin EfficientNetB7 mimarisi ile sınıflandırılması’, Journal of the Faculty of Engineering and Architecture of Gazi University, 37(3): 1355–1371, (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="journal">[15] Gunasekaran H., Ramalakshmi K., Rex Macedo Arokiaraj A., Kanmani S. D., Venkatesan C., and Dhas C. S. G., ‘Analysis of DNA Sequence Classification Using CNN and Hybrid Models’, Computational and Mathematical Methods in Medicine, (2021).</mixed-citation>
                    </ref>
                                    <ref id="ref16">
                        <label>16</label>
                        <mixed-citation publication-type="journal">[16] Abass Y. A., Adeshina S. A., Agwu N. N., and Boukar M. M., ‘Analysis of Prostate Cancer DNA Sequences Using Bi-Directional Long Short Term Memory Model’, 2021 16th International Conference on Electronics Computer and Computation (ICECCO), 21–26, (2021).</mixed-citation>
                    </ref>
                                    <ref id="ref17">
                        <label>17</label>
                        <mixed-citation publication-type="journal">[17] Canatalay P. J. and Ucan O. N., ‘A Bidirectional LSTM-RNN and GRU Method to Exon Prediction Using Splice-Site Mapping’, Applied Sciences, 12(9), (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref18">
                        <label>18</label>
                        <mixed-citation publication-type="journal">[18] Nasr F. B. and Oueslati A. E., ‘CNN for human exons and introns classification’, 2021 18th International Multi-Conference on Systems, Signals &amp; Devices, 249–254, (2021).</mixed-citation>
                    </ref>
                                    <ref id="ref19">
                        <label>19</label>
                        <mixed-citation publication-type="journal">[19] Chakraborty S. and Gupta V., ‘DWT based cancer identification using EIIP’, Proceedings - 2016 2nd International Conference on Computational Intelligence and Communication Technology, CICT 2016, 718–723, (2016).</mixed-citation>
                    </ref>
                                    <ref id="ref20">
                        <label>20</label>
                        <mixed-citation publication-type="journal">[20] Marhon S. A. and Kremer S. C., ‘Protein coding region prediction based on the adaptive representation method’, Canadian Conference on Electrical and Computer Engineering, 000415–000418, (2011).</mixed-citation>
                    </ref>
                                    <ref id="ref21">
                        <label>21</label>
                        <mixed-citation publication-type="journal">[21] Li J. et al., ‘Integrated entropy-based approach for analyzing exons and introns in DNA sequences’, BMC Bioinformatics, 20, (2019).</mixed-citation>
                    </ref>
                                    <ref id="ref22">
                        <label>22</label>
                        <mixed-citation publication-type="journal">[22] https://www.ncbi.nlm.nih.gov/, ‘NCBI’.</mixed-citation>
                    </ref>
                                    <ref id="ref23">
                        <label>23</label>
                        <mixed-citation publication-type="journal">[23] https://www.ensembl.org/Homo_sapiens/Gene/Sequence?db=core;g=ENSG00000103313;r=16:3242027-3256633, ‘Ensembl’.</mixed-citation>
                    </ref>
                                    <ref id="ref24">
                        <label>24</label>
                        <mixed-citation publication-type="journal">[24] Wang T., Shi H., Liu W., and Yan X., ‘A joint FrameNet and element focusing Sentence-BERT method of sentence similarity computation’, Expert Systems with Applications, 200, (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref25">
                        <label>25</label>
                        <mixed-citation publication-type="journal">[25] Devlin J., Chang M. W., Lee K., and Toutanova K., ‘BERT: Pre-training of deep bidirectional transformers for language understanding’, NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, 4171–4186, (2019).</mixed-citation>
                    </ref>
                                    <ref id="ref26">
                        <label>26</label>
                        <mixed-citation publication-type="journal">[26] Santander-Cruz Y. et al., ‘Semantic Feature Extraction Using SBERT for Dementia Detection’, Brain Sciences, (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref27">
                        <label>27</label>
                        <mixed-citation publication-type="journal">[27] Reimers N. and Gurevych I., ‘Sentence-BERT: Sentence embeddings using siamese BERT-networks’, arXiv, 3982–3992, (2019).</mixed-citation>
                    </ref>
                                    <ref id="ref28">
                        <label>28</label>
                        <mixed-citation publication-type="journal">[28] Mahdevari S. and Khodabakhshi M. B., ‘A hybrid PSO-ANFIS model for predicting unstable zones in underground roadways’, Tunnelling and Underground Space Technology incorporating Trenchless Technology Research, 117, (2021).</mixed-citation>
                    </ref>
                                    <ref id="ref29">
                        <label>29</label>
                        <mixed-citation publication-type="journal">[29] Karaboga D. and Kaya E., ‘Estimation of number of foreign visitors with ANFIS by using ABC algorithm’, Soft Computing, 24: 7579–7591, (2020).</mixed-citation>
                    </ref>
                                    <ref id="ref30">
                        <label>30</label>
                        <mixed-citation publication-type="journal">[30] https://www.sbert.net/examples/applications/clustering/README.html, ‘SBERT-Clustering’.</mixed-citation>
                    </ref>
                                    <ref id="ref31">
                        <label>31</label>
                        <mixed-citation publication-type="journal">[31] https://www.sbert.net/docs/pretrained_models.html, ‘SBERT-Pretrained Models’.</mixed-citation>
                    </ref>
                                    <ref id="ref32">
                        <label>32</label>
                        <mixed-citation publication-type="journal">[32] Daş B., ‘DNA dizilimlerinden hastalık tanılanması için işaret işleme temelli yeni yaklaşımların geliştirilmesi’, Fırat Üniversitesi Fen Bilimleri Enstitüsü Yazılım Mühendisliği Anabilim Dalı, Doktora Tezi, (2018).</mixed-citation>
                    </ref>
                                    <ref id="ref33">
                        <label>33</label>
                        <mixed-citation publication-type="journal">[33] Sak H., Senior A., and Beaufays F., ‘Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition’, arXiv, (2014), [Online]. Available: http://arxiv.org/abs/1402.1128.</mixed-citation>
                    </ref>
                                    <ref id="ref34">
                        <label>34</label>
                        <mixed-citation publication-type="journal">[34] Precup R. E., Bojan-Dragos C. A., Hedrea E. L., Roman R. C., and Petriu E. M., ‘Evolving Fuzzy Models of Shape Memory Alloy Wire Actuators’, Romanian Journal of Information Science and Technology, 24(4): 353–365, (2021).</mixed-citation>
                    </ref>
                                    <ref id="ref35">
                        <label>35</label>
                        <mixed-citation publication-type="journal">[35] Mishra P. and Bhoi N., ‘Cancer gene recognition from microarray data with manta ray based enhanced ANFIS technique’, Biocybernetics and Biomedical Engineering, 41(3): 916–932, (2021).</mixed-citation>
                    </ref>
                                    <ref id="ref36">
                        <label>36</label>
                        <mixed-citation publication-type="journal">[36] Akalın F. and Yumuşak N., ‘Lösemi hastalığının temel türlerinden ALL ve KML malignitelerinin graf sinir ağları ve bulanık mantık algoritması ile sınıflandırılması’, Journal of the Faculty of Engineering and Architecture of Gazi University, 38(2): 707–719, (2023).</mixed-citation>
                    </ref>
                                    <ref id="ref37">
                        <label>37</label>
                        <mixed-citation publication-type="journal">[37] Zhu M. and Lai Y., ‘Improvements Achieved by Multiple Imputation for Single-Cell RNA-Seq Data in Clustering Analysis and Differential Expression Analysis’, Journal of Computational Biology, 29(7): 634–649, (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref38">
                        <label>38</label>
                        <mixed-citation publication-type="journal">[38] Radpour V. and Soleimanian Gharehchopogh F., ‘A Novel Hybrid Binary Farmland Fertility Algorithm with Naïve Bayes for Diagnosis of Heart Disease’, Sakarya University Journal of Computer and Information Sciences, 5(1), (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref39">
                        <label>39</label>
                        <mixed-citation publication-type="journal">[39] Ibrahim M. H., ‘WBBA-KM: A Hybrid Weight-Based Bat Algorithm with K-Means Algorithm For Cluster Analysis’, Journal of Polytechnic, 25(1): 65–73, (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref40">
                        <label>40</label>
                        <mixed-citation publication-type="journal">[40] Bayrakdar M. E. and Çalhan A., ‘Optimization of Ant Colony for Next Generation Wireless Cognitive Networks’, Journal of Polytechnic, 24(3): 779–784, (2021).</mixed-citation>
                    </ref>
                                    <ref id="ref41">
                        <label>41</label>
                        <mixed-citation publication-type="journal">[41] Garip Z., Çimen M. E., and Boz A. F., ‘Fotovoltaik Modellerin Parametre Çıkarımı İçin Geliştirilmiş Bir Kaotik Tabanlı Balina Optimizasyon Algoritması’, Journal of Polytechnic, 25(3): 1041–1054, (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref42">
                        <label>42</label>
                        <mixed-citation publication-type="journal">[42] Alghobiri M., Mohiuddin K., Khaleel M. A., Islam M., Shahwar S., and Nasr O., ‘A Novel Approach of Clustering Documents: Minimizing Computational Complexities in Accessing Database Systems’, International Arab Journal of Information Technology, 19(4): 617–628, (2022).</mixed-citation>
                    </ref>
                                    <ref id="ref43">
                        <label>43</label>
                        <mixed-citation publication-type="journal">[43] Konar M., ‘Redesign of morphing UAV’s winglet using DS algorithm based ANFIS model’, Aircraft Engineering and Aerospace Technology, 91(9): 1214–1222, (2019).</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
