Research Article
Using transfer learning models for DNA sequence similarity via the fCGR method

Year 2025, Volume: 14 Issue: 2, 1 - 1

Abstract

Similarity analysis of DNA sequences is critical for understanding evolutionary relationships and identifying genetic mutations. Because traditional alignment-based methods are computationally expensive, this study examines the applicability of transfer learning models to alignment-free DNA similarity analysis. DNA sequences were converted to images with the Frequency Chaos Game Representation (fCGR) method, and features were extracted with the ResNet50, EfficientNetB0, and MobileNet models. Three similarity metrics (cosine similarity, Euclidean distance, and correlation) and four hierarchical clustering methods were compared. The results show that cosine similarity reflects genetic similarity better than the other metrics. MobileNet, with its lightweight architecture and efficient feature extraction, achieved the highest accuracy. Feature vectors visualized with PCA showed strong clustering tendencies and agreed with reference phylogenetic trees. The study demonstrates that transfer learning is applicable to genetic analysis and that scalable, biologically meaningful analyses are possible.
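The fCGR step named in the abstract can be illustrated with a minimal sketch. It counts k-mers and places each count at the cell its chaos game coordinates select, giving a 2^k x 2^k frequency image. The corner layout, the value of k, the normalization, and the function name `fcgr` are assumptions made for illustration; the abstract does not state the settings used in the study.

```python
import numpy as np

# Assumed corner layout (a common convention); the study's own layout is not stated.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int = 3) -> np.ndarray:
    """Return a 2^k x 2^k matrix of relative k-mer frequencies in CGR layout."""
    size = 2 ** k
    counts = np.zeros((size, size), dtype=float)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        # The last base of the k-mer sets the most significant bit, so it
        # selects the coarsest quadrant, as the final step of a chaos game does.
        for j, base in enumerate(kmer):
            bx, by = CORNERS[base]
            x |= bx << j
            y |= by << j
        counts[y, x] += 1
    total = counts.sum()
    # Normalize to relative frequencies; an all-zero matrix is returned as-is.
    return counts / total if total else counts


# Hypothetical toy sequence, for illustration only.
image = fcgr("ACGTACGTTGCA", k=3)
print(image.shape)  # (8, 8)
```

In a pipeline like the one described, such a matrix would be rescaled and resized to the input shape a pretrained network expects before feature extraction; that step is omitted here.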

References

  • Z. D. Stephens et al., Big Data: Astronomical or Genomical?, PLoS Biol, 13, 7, p. e1002195, 2015. https://doi.org/10.1371/JOURNAL.PBIO.1002195.
  • S. B. Needleman and C. D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, J Mol Biol, 48, 3, 443–453, Mar. 1970. https://doi.org/10.1016/0022-2836(70)90057-4.
  • S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, Basic local alignment search tool, J Mol Biol, 215, 3, 403–410, Oct. 1990. https://doi.org/10.1016/S0022-2836(05)80360-2.
  • S. Vinga and J. Almeida, Alignment-free sequence comparison—a review, Bioinformatics, 19, 4, 513–523, Mar. 2003. https://doi.org/10.1093/BIOINFORMATICS/BTG005
  • H. F. Löchel and D. Heider, Chaos game representation and its applications in bioinformatics, Comput Struct Biotechnol J, 19, 6263–6271, Jan. 2021. https://doi.org/10.1016/J.CSBJ.2021.11.008.
  • M. Yousef and J. Allmer, Deep learning in bioinformatics, Turkish Journal of Biology, 47, 6, p. 366, 2023. https://doi.org/10.55730/1300-0152.2671.
  • H. Gunasekaran, K. Ramalakshmi, A. Rex Macedo Arokiaraj, S. D. Kanmani, C. Venkatesan, and C. S. G. Dhas, Analysis of DNA Sequence Classification Using CNN and Hybrid Models, Comput Math Methods Med, 2021, 1, p. 1835056, Jan. 2021. https://doi.org/10.1155/2021/1835056.
  • A. Zielezinski, S. Vinga, J. Almeida, and W. M. Karlowski, Alignment-free sequence comparison: benefits, applications, and tools, Genome Biol, 18, 1, 1–17, Oct. 2017. https://doi.org/10.1186/S13059-017-1319-7.
  • A. Zielezinski et al., Benchmarking of alignment-free sequence comparison methods, Genome Biol, 20, 1, 1–18, Jul. 2019. https://doi.org/10.1186/S13059-019-1755-7.
  • R. Rizzo, A. Fiannaca, M. La Rosa, and A. Urso, A deep learning approach to DNA sequence classification, Lecture Notes in Computer Science, 9874, 129–140, 2016. https://doi.org/10.1007/978-3-319-44332-4_10.
  • O. Bonham-Carter, J. Steele, and D. Bastola, Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis, Brief Bioinform, 15, 6, 890–905, Nov. 2014. https://doi.org/10.1093/BIB/BBT052.
  • M. Uddin, M. K. Islam, M. R. Hassan, F. Jahan, and J. H. Baek, A fast and efficient algorithm for DNA sequence similarity identification, Complex and Intelligent Systems, 9, 2, 1265–1280, Apr. 2023. https://doi.org/10.1007/S40747-022-00846-Y.
  • S. Zou, L. Wang, and J. Wang, A 2D graphical representation of the sequences of DNA based on triplets and its application, EURASIP J Bioinform Syst Biol, 2014, 1, 2014. https://doi.org/10.1186/1687-4153-2014-1.
  • N. Jafarzadeh and A. Iranmanesh, C-curve: A novel 3D graphical representation of DNA sequence based on codons, Math Biosci, 241, 2, 217–224, Feb. 2013. https://doi.org/10.1016/J.MBS.2012.11.009.
  • P. Waz and D. Bielińska-Waz, Non-standard similarity/dissimilarity analysis of DNA sequences, Genomics, 104, 6, 464–471, Dec. 2014. https://doi.org/10.1016/J.YGENO.2014.08.010.
  • B. Liao, M. Tan, and K. Ding, A 4D representation of DNA sequences and its application, Chem Phys Lett, 402, 4–6, 380–383, Feb. 2005. https://doi.org/10.1016/J.CPLETT.2004.12.062.
  • B. Liao, R. Li, W. Zhu, and X. Xiang, On the similarity of DNA primary sequences based on 5-D representation, J Math Chem, 42, 1, 47–57, Jul. 2007. https://doi.org/10.1007/S10910-006-9091-Z.
  • B. Liao and T. M. Wang, Analysis of similarity/dissimilarity of DNA sequences based on nonoverlapping triplets of nucleotide bases, J Chem Inf Comput Sci, 44, 5, 1666–1670, Sep. 2004. https://doi.org/10.1021/CI034271F.
  • E. Delibaş and A. Arslan, DNA sequence similarity analysis using image texture analysis based on first-order statistics, J Mol Graph Model, 99, p. 107603, Sep. 2020. https://doi.org/10.1016/j.jmgm.2020.107603.
  • W. Chen, B. Liao, and W. Li, Use of image texture analysis to find DNA sequence similarities, J Theor Biol, 455, 1–6, Oct. 2018. https://doi.org/10.1016/J.JTBI.2018.07.001.
  • M. Li and P. Vitányi, An introduction to Kolmogorov complexity and its applications, 3rd ed. Springer, 2008.
  • H. H. Otu and K. Sayood, A new sequence distance measure for phylogenetic tree construction, Bioinformatics, 19, 16, 2122–2130, Nov. 2003. https://doi.org/10.1093/BIOINFORMATICS/BTG295.
  • E. Delibaş and A. Arslan, A new feature vector model for alignment-free DNA sequence similarity analysis, Sigma Journal of Engineering and Natural Sciences, 40, 3, 610–619, Oct. 2022. https://doi.org/10.14744/sigma.2022.00065.
  • J. P. Bao and R. Y. Yuan, A wavelet-based feature vector model for DNA clustering, Genet Mol Res, 14, 4, 19163–19172, Dec. 2015. https://doi.org/10.4238/2015.DECEMBER.29.26.
  • G. Mendizabal-Ruiz, I. Román-Godínez, S. Torres-Ramos, R. A. Salido-Ruiz, H. Vélez-Pérez, and J. A. Morales, Genomic signal processing for DNA sequence clustering, PeerJ, 2018, 1, p. e4264, Jan. 2018. https://doi.org/10.7717/PEERJ.4264.
  • S. Dey, P. Ghosh, and S. Das, Positional difference and Frequency (PdF) based alignment-free technique for genome sequence comparison, J Biomol Struct Dyn, Oct. 2023. https://doi.org/10.1080/07391102.2023.2272748.
  • S. Akbari Rokn Abadi, A. Mohammadi, and S. Koohi, A new profiling approach for DNA sequences based on the nucleotides’ physicochemical features for accurate analysis of SARS-CoV-2 genomes, BMC Genomics, 24, 1, Dec. 2023. https://doi.org/10.1186/S12864-023-09373-7.
  • M. K. Ganapathiraju, A. D. Mitchell, M. Thahir, K. Motwani, and S. Ananthasubramanian, Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences, J Bioinform Comput Biol, 10, 6, Dec. 2012. https://doi.org/10.1142/S0219720012500163.
  • H. U. Osmanbeyoglu and M. K. Ganapathiraju, N-gram analysis of 970 microbial organisms reveals presence of biological language models, BMC Bioinformatics, 12, p. 12, Jan. 2011. https://doi.org/10.1186/1471-2105-12-12.
  • M. R. Kantorovitz, G. E. Robinson, and S. Sinha, A statistical method for alignment-free comparison of regulatory sequences, Bioinformatics, 23, 13, i249–i255, Jul. 2007. https://doi.org/10.1093/BIOINFORMATICS/BTM211.
  • K. Song, J. Ren, G. Reinert, M. Deng, M. S. Waterman, and F. Sun, New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing, Brief Bioinform, 15, 3, 343–353, May 2014. https://doi.org/10.1093/BIB/BBT067.
  • H.-H. Huang and C. Yu, Clustering DNA sequences using the out-of-place measure with reduced n-grams, J Theor Biol, 406, 61–72, 2016. https://doi.org/10.1016/j.jtbi.2016.06.029.
  • M. S. Nawaz, P. Fournier-Viger, M. Aslam, W. Li, Y. He, and X. Niu, Using alignment-free and pattern mining methods for SARS-CoV-2 genome analysis, Applied Intelligence, 53, 19, 21920–21943, Oct. 2023. https://doi.org/10.1007/S10489-023-04618-0.
  • T. Wang, Z. G. Yu, and J. Li, CGRWDL: alignment-free phylogeny reconstruction method for viruses based on chaos game representation weighted by dynamical language model, Front Microbiol, 15, p. 1339156, Mar. 2024. https://doi.org/10.3389/FMICB.2024.1339156.
  • B. Morgenstern, J. Söding, C. Bleidorn, A. Sturm, J. de Vries, and F. Manea, Alignment-free Phylogenetic Placement and its Applications, Feb. 2023. https://doi.org/10.53846/GOEDISS-9762.
  • J. S. Almeida, J. A. Carriço, A. Maretzek, P. A. Noble, and M. Fletcher, Analysis of genomic sequences by Chaos Game Representation, Bioinformatics, 17, 5, 429–437, May 2001. https://doi.org/10.1093/BIOINFORMATICS/17.5.429.
  • S. Safoury and W. Hussein, Enriched DNA strands classification using CGR images and convolutional neural network, ACM International Conference Proceeding Series, 87–92, Oct. 2019. https://doi.org/10.1145/3369166.3369176.
  • K. Dick and J. R. Green, Chaos Game Representations Deep Learning for Proteome-Wide Protein Prediction, Proceedings - IEEE 20th International Conference on Bioinformatics and Bioengineering, BIBE 2020, 115–121, Oct. 2020. https://doi.org/10.1109/BIBE50027.2020.00027.
  • R. Rizzo, A. Fiannaca, M. La Rosa, and A. Urso, Classification experiments of DNA sequences by using a deep neural network and chaos game representation, ACM International Conference Proceeding Series, 1164, 222–228, Jun. 2016. https://doi.org/10.1145/2983468.2983489.
  • K. Zheng, Z. H. You, J. Q. Li, L. Wang, Z. H. Guo, and Y. A. Huang, ICDA-CGR: Identification of circRNA-disease associations based on Chaos Game Representation, PLoS Comput Biol, 16, 5, May 2020. https://doi.org/10.1371/JOURNAL.PCBI.1007872.
  • C. Sravani, P. Pavani, G. Y. Vybhavi, G. Ramesh, A. Farman, and L. Venkareswara Reddy, Decoding the Human Genome: Machine Learning Techniques for DNA Sequencing Analysis, E3S Web of Conferences, 430, Oct. 2023. https://doi.org/10.1051/E3SCONF/202343001067.
  • A. Yang, W. Zhang, J. Wang, K. Yang, Y. Han, and L. Zhang, Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA, Front Bioeng Biotechnol, 8, Sep. 2020. https://doi.org/10.3389/FBIOE.2020.01032.
  • B. A. Bredesen and M. Rehmsmeier, DNA sequence models of genome-wide Drosophila melanogaster Polycomb binding sites improve generalization to independent Polycomb Response Elements, Nucleic Acids Res, 47, 15, 7781–7797, Sep. 2019. https://doi.org/10.1093/NAR/GKZ617.
  • S. Das, A. Das, D. K. Bhattacharya, and D. N. Tibarewala, A new graph-theoretic approach to determine the similarity of genome sequences based on nucleotide triplets, Genomics, 112, 6, 4701–4714, Nov. 2020. https://doi.org/10.1016/J.YGENO.2020.08.023.
  • T. Hoang, C. Yin, and S. S. T. Yau, Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison, Genomics, 108, 3–4, 134–142, Oct. 2016. https://doi.org/10.1016/J.YGENO.2016.08.002.
  • T. Hoang, C. Yin, H. Zheng, C. Yu, R. Lucy He, and S. S. T. Yau, A new method to cluster DNA sequences using Fourier power spectrum, J Theor Biol, 372, 135–145, May 2015. https://doi.org/10.1016/J.JTBI.2015.02.026.
  • D. Quan, N. Nguyen, L. Xing, P. Dong, T. Le, and L. Lin, A graph-theoretical approach to DNA similarity analysis, bioRxiv, p. 2021.08.05.455342, Aug. 2021. https://doi.org/10.1101/2021.08.05.455342.
  • X. Jin et al., A novel DNA sequence similarity calculation based on simplified pulse-coupled neural network and Huffman coding, Physica A: Statistical Mechanics and its Applications, 461, 325–338, Nov. 2016. https://doi.org/10.1016/J.PHYSA.2016.05.004.
  • E. Delibaş, A. Arslan, A. Şeker, and B. Diri, A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up, J Mol Graph Model, 100, p. 107693, Nov. 2020. https://doi.org/10.1016/j.jmgm.2020.107693.
  • R. Dong, L. He, R. L. He, and S. S. T. Yau, A novel approach to clustering genome sequences using inter-nucleotide covariance, Front Genet, 10, p. 234, Apr. 2019. https://doi.org/10.3389/FGENE.2019.00234.
  • H. J. Jeffrey, Chaos game representation of gene structure, Nucleic Acids Res, 18, 8, p. 2163, Apr. 1990. https://doi.org/10.1093/NAR/18.8.2163.
  • F. Zhuang et al., A Comprehensive Survey on Transfer Learning, Proceedings of the IEEE, 109, 1, 43–76, Jan. 2021. https://doi.org/10.1109/JPROC.2020.3004555.
  • S. Eskandari, A. Eslamian, and Q. Cheng, Comparative Analysis of Transfer Learning Models for Breast Cancer Classification, Aug. 2024. https://doi.org/10.1109/AIC61668.2024.10731032.
  • K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016. Accessed: Oct. 16, 2024. [Online]. Available: http://image-net.org/challenges/LSVRC/2015/
  • F. Hong, D. W. L. Tay, and A. Ang, Intelligent Pick-and-Place System Using MobileNet, Electronics, 12, 3, p. 621, Jan. 2023. https://doi.org/10.3390/ELECTRONICS12030621.
  • J. Ren et al., Alignment-Free Sequence Analysis and Applications, Annu Rev Biomed Data Sci, 1, p. 93, Jul. 2018. https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013431.
  • K. Tamura, G. Stecher, and S. Kumar, MEGA11: Molecular Evolutionary Genetics Analysis Version 11, Mol Biol Evol, 38, 7, 3022–3027, Jun. 2021. https://doi.org/10.1093/MOLBEV/MSAB120.

Using transfer learning models for DNA sequence similarity via fCGR method

Abstract

Similarity analysis of DNA sequences is a critical task for understanding evolutionary relationships and identifying genetic mutations. Since traditional alignment-based methods have high computational costs, this study investigates the applicability of transfer learning models to alignment-free DNA similarity analysis. DNA sequences were visualized with the Frequency Chaos Game Representation (fCGR) method, and feature extraction was performed with the ResNet50, EfficientNetB0, and MobileNet models. Three similarity metrics (cosine similarity, Euclidean distance, and correlation) and four hierarchical clustering methods were compared. The results show that the cosine similarity metric reflects genetic similarities better than the other two. MobileNet provided the highest accuracy with its lightweight structure and efficient feature extraction. Feature vectors visualized with PCA exhibited strong clustering tendencies and were in agreement with reference phylogenetic trees. The study demonstrates the applicability of transfer learning in genetic analyses and shows that scalable and biologically meaningful analyses can be performed.
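The metric and clustering comparison described in the abstract can be sketched with SciPy. The feature vectors below are hypothetical toy values standing in for CNN-extracted features, and the helper names `pairwise_distances` and `cluster` are illustrative. The abstract does not name the four linkage methods, so average linkage is used here only as an assumed example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def pairwise_distances(features: np.ndarray, metric: str) -> np.ndarray:
    """Square distance matrix for 'cosine', 'euclidean', or 'correlation'."""
    return squareform(pdist(features, metric=metric))


def cluster(features: np.ndarray, metric: str = "cosine",
            method: str = "average", n_clusters: int = 2) -> np.ndarray:
    """Hierarchical clustering on a precomputed condensed distance vector."""
    condensed = pdist(features, metric=metric)
    # 'single', 'complete', 'average', and 'weighted' accept any distance;
    # 'ward', 'centroid', and 'median' assume Euclidean distances.
    tree = linkage(condensed, method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy feature vectors: rows 0-1 and rows 2-3 form two groups.
features = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
])

for metric in ("cosine", "euclidean", "correlation"):
    print(metric, cluster(features, metric=metric))
```

Note that SciPy's `cosine` and `correlation` options return distances (one minus the similarity), so smaller values mean more similar sequences.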

There are 57 citations in total.

Details

Primary Language English
Subjects Deep Learning, Data Engineering and Data Science
Journal Section Articles
Authors

Emre Delibaş 0000-0001-7564-5020

Early Pub Date March 17, 2025
Publication Date
Submission Date October 29, 2024
Acceptance Date February 5, 2025
Published in Issue Year 2025 Volume: 14 Issue: 2

Cite

APA Delibaş, E. (2025). Using transfer learning models for DNA sequence similarity via fCGR method. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, 14(2), 1-1. https://doi.org/10.28948/ngumuh.1575701
AMA Delibaş E. Using transfer learning models for DNA sequence similarity via fCGR method. NOHU J. Eng. Sci. March 2025;14(2):1-1. doi:10.28948/ngumuh.1575701
Chicago Delibaş, Emre. “Using Transfer Learning Models for DNA Sequence Similarity via FCGR Method”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 14, no. 2 (March 2025): 1-1. https://doi.org/10.28948/ngumuh.1575701.
EndNote Delibaş E (March 1, 2025) Using transfer learning models for DNA sequence similarity via fCGR method. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 14 2 1–1.
IEEE E. Delibaş, “Using transfer learning models for DNA sequence similarity via fCGR method”, NOHU J. Eng. Sci., vol. 14, no. 2, pp. 1–1, 2025, doi: 10.28948/ngumuh.1575701.
ISNAD Delibaş, Emre. “Using Transfer Learning Models for DNA Sequence Similarity via FCGR Method”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 14/2 (March 2025), 1-1. https://doi.org/10.28948/ngumuh.1575701.
JAMA Delibaş E. Using transfer learning models for DNA sequence similarity via fCGR method. NOHU J. Eng. Sci. 2025;14:1–1.
MLA Delibaş, Emre. “Using Transfer Learning Models for DNA Sequence Similarity via FCGR Method”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, vol. 14, no. 2, 2025, pp. 1-1, doi:10.28948/ngumuh.1575701.
Vancouver Delibaş E. Using transfer learning models for DNA sequence similarity via fCGR method. NOHU J. Eng. Sci. 2025;14(2):1-.