Classification of Exon and Intron Regions on DNA Sequences with Hybrid Use of SBERT and ANFIS Approaches
Abstract
DNA is the part of the genome that carries an enormous amount of information related to life. Each amino acid is encoded by three consecutive nucleotides, and these triplets are called codons. The frequency of nucleotide triplets in a DNA sequence allows protein-coding (exon) and non-protein-coding (intron) regions to be evaluated, and distinguishing these regions enables the analysis of vital functions. This study classifies exon and intron regions of the BCR-ABL and MEFV genes, obtained from the NCBI and Ensembl databases, respectively. The DNA sequences are first clustered using pretrained models within the SBERT approach, with K-Means and Agglomerative Clustering applied consecutively. A representative sample is then selected from each cluster and its codon frequencies are calculated. A matrix is built from the frequencies of the 64 different codons that constitute the genetic code, and this matrix is given as input to the ANFIS structure. The ANFIS approach classifies exon and intron DNA sequences with an accuracy of 88.88%. This result was obtained independently of DNA sequence length.
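The codon-frequency step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `codon_frequencies` and `frequency_matrix`, the non-overlapping reading of triplets from a chosen frame, and the normalization to relative frequencies are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import product

# All 64 codons in a fixed lexicographic order over the alphabet A, C, G, T.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(sequence, frame=0):
    """Return the relative frequency of each of the 64 codons.

    Assumption: triplets are read without overlap, starting at `frame`.
    Triplets containing anything other than A, C, G, T are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
            total += 1
    if total == 0:
        # No complete, unambiguous codon was found.
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def frequency_matrix(sequences):
    """Stack one 64-element frequency vector per sequence (rows = sequences)."""
    return [codon_frequencies(s) for s in sequences]
```

Because each row is normalized by the number of codons counted, every sequence yields a vector of the same size regardless of its length, which is one plausible way to obtain a length-independent input for a classifier such as ANFIS.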
Details
Primary Language
English
Subjects
Engineering
Journal Section
Research Article
Early Pub Date
March 27, 2024
Publication Date
July 25, 2024
Submission Date
October 12, 2022
Acceptance Date
February 6, 2023
Published in Issue
Year 2024 Volume: 27 Number: 3
APA
Akalın, F., & Yumuşak, N. (2024). Classification of Exon and Intron Regions on DNA Sequences with Hybrid Use of SBERT and ANFIS Approaches. Politeknik Dergisi, 27(3), 1043-1053. https://doi.org/10.2339/politeknik.1187808
Cited By
DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models
Frontiers in Medicine
https://doi.org/10.3389/fmed.2025.1503229