TY - JOUR
T1 - Classifying RNA Strands with A Novel Graph Representation Based on the Sequence Free Energy
AU - Algül, Enes
PY - 2023
DA - June
DO - 10.46810/tdfd.1240075
JF - Türk Doğa ve Fen Dergisi
JO - TJNS
PB - Bingol University
WT - DergiPark
SN - 2149-6366
SP - 32
EP - 39
VL - 12
IS - 2
LA - en
AB - Ribonucleic acids (RNAs) are macromolecules found in all living cells, and they act as mediators between DNA and protein. Structurally, RNAs are similar to DNA. In this paper, we introduce a compact graph representation that uses the Minimum Free Energy (MFE) of the secondary structure of RNA molecules. The representation encodes the structural components of RNA secondary structures as graph edges and uses the MFE of each component as its edge weight. A labeling process determines these weights by considering both the MFE of the 2D RNA structures and the specific settings in the RNA structures. This encoding makes the representation more compact by giving each secondary structural element a unique representation in the graph. Armed with this representation, we apply graph-based algorithms to categorize RNA molecules. We also present the results of cutting-edge graph-based methods (All Paths Cycle Embeddings (APC), Shortest Paths Kernel/Embedding (SP), and Weisfeiler-Lehman Optimal Assignment Kernel (WLOA)) on our dataset [1] using this new graph representation. Finally, we compare the results of the graph-based algorithms with those of a standard bioinformatics algorithm (Needleman-Wunsch) used for DNA and RNA comparison.
KW - Graph representation
KW - RNA
KW - Graph Kernel
KW - Machine Learning
CR - E. Algul and R. C. Wilson, “A database and evaluation for classification of RNA molecules using graph methods,” in Graph-Based Representations in Pattern Recognition, D. Conte, J.-Y. Ramel, and P. Foggia, Eds. Cham: Springer International Publishing, 2019, pp. 78–87.
CR - D. Bechhofer and M.
Deutscher, “Bacterial ribonucleases and their roles in RNA metabolism,” Critical Reviews in Biochemistry and Molecular Biology, vol. 54, pp. 242–300, 05 2019.
CR - “3DNA: a suite of software programs for the analysis, rebuilding and visualization of 3-dimensional nucleic acid structures,” x3dna.org. [Online]. Available: http://x3dna.org/
CR - M. S. Waterman, “Secondary structure of single-stranded nucleic acids,” Studies in Foundations and Combinatorics, Advances in Mathematics Supplementary Studies, vol. 1, pp. 167–212, 1978. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.15.4425&rep=rep1&type=pdf
CR - D. Fera, N. Kim, N. Shiffeldrim, J. Zorn, U. Laserson, H. H. Gan, and T. Schlick, “RAG: RNA-As-Graphs web resource,” BMC Bioinformatics, vol. 5, 07 2004. [Online]. Available: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-88
CR - D. Knisley, J. Knisley, C. Ross, and A. Rockney, “Classifying multigraph models of secondary RNA structure using graph-theoretic descriptors,” ISRN Bioinformatics, International Scholarly Research Network, 11 2012. [Online]. Available: https://doi.org/10.5402/2012/157135
CR - J. Huang, K. Li, and M. Gribskov, “Accurate classification of RNA structures using topological fingerprints,” PLOS ONE, vol. 11, no. 10, pp. 1–19, 10 2016. [Online]. Available: https://doi.org/10.1371/journal.pone.0164726
CR - R. C. Wilson and E. Algul, “Categorization of RNA molecules using graph methods,” in Structural, Syntactic, and Statistical Pattern Recognition, X. Bai, E. R. Hancock, T. K. Ho, R. C. Wilson, B. Biggio, and A. Robles-Kelly, Eds. Cham: Springer International Publishing, 2018, pp. 439–448.
CR - S. V. N. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt, “Graph kernels,” Journal of Machine Learning Research, vol. 11, pp. 1201–1242, 2010.
CR - G. M. Blackburn, M. J. Gait, D. Loakes, D. M. Williams, J. A. Grasby, M. Egli, A. Flavell, S. Allen, J. Fisher, A. M.
Pyle, et al., Nucleic acids in chemistry and biology. Royal Society of Chemistry, 2006.
CR - M. Zuker, “Mfold web server for nucleic acid folding and hybridization prediction,” Nucleic Acids Research, vol. 31, no. 13, pp. 3406–3415, 07 2003. [Online]. Available: https://doi.org/10.1093/nar/gkg595
CR - H. Jabbari, I. Wark, and C. Montemagno, “RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model,” PLOS ONE, vol. 13, pp. 1–21, 04 2018.
CR - Y. Wu, B. Shi, X. Ding, T. Liu, X. Hu, K. Y. Yip, Z. R. Yang, D. H. Mathews, and Z. J. Lu, “Improved prediction of RNA secondary structure by integrating the free energy model with restraints derived from experimental probing data,” Nucleic Acids Research, vol. 43, pp. 7247–7259, 07 2015.
CR - K. Doshi, J. Cannone, C. Cobaugh, and R. Gutell, “Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction,” BMC Bioinformatics, vol. 5, p. 105, 09 2004.
CR - M. Zuker and P. Stiegler, “Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information,” Nucleic Acids Research, vol. 9, no. 1, pp. 133–148, 01 1981. [Online]. Available: https://doi.org/10.1093/nar/9.1.133
CR - I. L. Hofacker, “Vienna RNA secondary structure server,” Nucleic Acids Research, vol. 31, no. 13, pp. 3429–3431, 07 2003. [Online]. Available: https://doi.org/10.1093/nar/gkg599
CR - L. Wang, Y. Liu, X. Zhong, H. Liu, C. Lu, C. Li, and H. Zhang, “DMfold: A novel method to predict RNA secondary structure with pseudoknots based on deep learning and improved base pair maximization principle,” Frontiers in Genetics, vol. 10, p. 143, 2019.
CR - P. S. Klosterman, M. Tamura, S. R. Holbrook, and S. E. Brenner, “SCOR: a Structural Classification of RNA database,” Nucleic Acids Research, vol. 30, pp. 392–394, 01 2002.
CR - X. Lu and W. K.
Olson, “3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures,” Nucleic Acids Research, vol. 31, pp. 5108–5121, 09 2003.
CR - F. Vendeix, A. Munoz, and P. Agris, “Free energy calculation of modified base-pair formation in explicit solvent: A predictive model,” RNA (New York, N.Y.), vol. 15, pp. 2278–87, 10 2009.
CR - I. Tinoco, O. C. Uhlenbeck, and M. D. Levine, “Estimation of Secondary Structure in Ribonucleic Acids,” Nature, vol. 230, pp. 362–367, 04 1971. [Online]. Available: https://doi.org/10.1038/230362a0
CR - N. Nicolo, “Learning with kernels on graphs: DAG-based kernels, data streams and RNA function prediction,” Alma Mater Studiorum - Università di Bologna, 2014. [Online]. Available: https://pdfs.semanticscholar.org/313b/7d182e81e021faed1cf650f480fdeaeeb3d6.pdf
CR - G. K. D. de Vries, “A fast approximation of the Weisfeiler-Lehman graph kernel for RDF data,” in Machine Learning and Knowledge Discovery in Databases, H. Blockeel, K. Kersting, S. Nijssen, and F. Železný, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 606–621.
CR - N. M. Kriege, P.-L. Giscard, and R. C. Wilson, “On valid optimal assignment kernels and applications to graph classification,” in Advances in Neural Information Processing Systems, 2016, pp. 1615–1623.
CR - N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “Weisfeiler-Lehman graph kernels,” Journal of Machine Learning Research, vol. 12, pp. 2539–2561, 2011. [Online]. Available: http://dl.acm.org/citation.cfm?id=2078187
CR - K. M. Borgwardt and H. Kriegel, “Shortest-path kernels on graphs,” in Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), 27-30 November 2005, Houston, Texas, USA, 2005, pp. 74–81. [Online]. Available: http://dx.doi.org/10.1109/ICDM.2005.132
CR - P.-L. Giscard and R. C. Wilson, “The all-paths and cycles graph kernel,” arXiv preprint arXiv:1708.01410, 2017.
CR - S. B.
Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” Journal of Molecular Biology, vol. 43, no. 3, pp. 443–453, 1970.
CR - Schmidt, Marco F. "DNA: Blueprint of the Proteins." Chemical Biology: and Drug Discovery. Berlin, Heidelberg: Springer Berlin Heidelberg, 2022. 33-47.
CR - Ou, Xiujuan, et al. "Advances in RNA 3D Structure Prediction." Journal of Chemical Information and Modeling 62.23 (2022): 5862-5874.
CR - Schulz, Till Hendrik, et al. "A generalized Weisfeiler-Lehman graph kernel." Machine Learning 111.7 (2022): 2601-2629.
CR - Salim, Asif, S. S. Shiju, and S. Sumitra. "Graph kernels based on optimal node assignment." Knowledge-Based Systems 244 (2022): 108519.
UR - https://doi.org/10.46810/tdfd.1240075
L1 - https://dergipark.org.tr/en/download/article-file/2907807
ER -