An alignment-free method for bulk comparison of protein sequences from different species
Abstract
The number of available protein sequences has increased rapidly with the development of new sequencing techniques. This, in turn, has created an urgent need for new computational methods that use these data to solve different biological problems. One such problem is the comparison of protein sequences from different species to reveal their evolutionary relationships. Several alignment-free methods have recently been proposed for this purpose. In this study, we propose another alignment-free method for the same purpose. Unlike the existing methods, the proposed method allows not only pairwise comparison of two protein sequences but also bulk comparison of multiple protein sequences simultaneously. Computational experiments on gold-standard datasets showed that bulk comparison of multiple sequences is much faster than its pairwise counterpart, and that the proposed method achieves performance quite competitive with the state-of-the-art alignment-based method, ClustalW.
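The abstract does not say which sequence features the proposed method uses, so the sketch below is not the author's method. It only illustrates, under stated assumptions, why bulk comparison can be cheaper than repeated pairwise comparison in a typical alignment-free setting. The representation (a hypothetical dipeptide, k = 2, composition vector over the 20 standard amino acids) and the Euclidean distance are illustrative choices, and the function names are hypothetical. In pairwise mode both feature vectors are rebuilt for every comparison; in bulk mode each sequence is featurized once and the vectors are reused for the whole distance matrix.

```python
import math
from itertools import product

# Assumption: only the 20 standard amino acids; other symbols (X, B, Z, ...) are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical choice: dipeptides (k = 2), giving a 20**2 = 400-dimensional vector.
K = 2
KMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_vector(seq):
    """Relative frequency of each k-mer in seq (all zeros if seq has no valid k-mer)."""
    counts = [0.0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        j = INDEX.get(seq[i:i + K].upper())
        if j is not None:
            counts[j] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distance(s1, s2):
    """Pairwise mode (hypothetical): both vectors are recomputed for every pair."""
    return euclidean(kmer_vector(s1), kmer_vector(s2))


def bulk_distance_matrix(seqs):
    """Bulk mode (hypothetical): featurize each sequence once, reuse for all pairs."""
    vecs = [kmer_vector(s) for s in seqs]
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = euclidean(vecs[i], vecs[j])
    return dist


if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GAVLIPFWMG"]
    for row in bulk_distance_matrix(seqs):
        print(" ".join(f"{d:.3f}" for d in row))
    # prints:
    # 0.000 0.157 0.471
    # 0.157 0.000 0.471
    # 0.471 0.471 0.000
```

For n sequences, computing all pairs in pairwise mode performs n(n-1) featurizations, while bulk mode performs only n. A symmetric distance matrix of this kind is also the usual input to tree-building methods such as neighbor joining or UPGMA when evolutionary relationships are the goal.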
Details
Primary Language: English
Subjects: Electrical Engineering
Journal Section: Research Article
Authors: Berat Dogan (ORCID: 0000-0003-4810-1970), Türkiye
Publication Date: October 30, 2019
Submission Date: March 16, 2019
Acceptance Date: September 23, 2019
Published in Issue: Year 2019, Volume 7, Number 4
