THREE MSA TOOLS ANALYSIS in DNA and PROTEIN DATASETS

Fırat Aşır; Tuğcan Korak; Özgür Öztürk

doi:10.51477/mejs.983750

Research Article

Year 2021, Volume 7, Issue 2, pp. 89–99, 30.12.2021

Abstract

Multiple sequence alignment (MSA) is used to align three or more DNA, RNA or protein sequences. It is widely used for constructing phylogenetic trees and for inferring evolutionary relationships between sequences from their similarities and differences. A variety of MSA tools are available online, each using different methods and parameters to align sequences. In this article, three MSA tools, CLUSTALW, SAGA and MAFFT, are used to align five datasets: BALiBASE_R9, DIRMBASE, SABmark, and two additionally constructed datasets, DNABali and ProteinBali. The results show that, for both the protein and the DNA datasets, MAFFT may be the most useful of the three MSA tools tested.
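The abstract does not state how alignment quality was measured. Benchmark suites such as BAliBASE provide reference alignments, and a tool's output is commonly compared against them with a sum-of-pairs (SP) score (the fraction of reference residue pairs that the test alignment also places in the same column) and a total-column (TC) score (the fraction of reference columns reproduced exactly). The Python sketch below illustrates both measures under that assumption; it is not presented as the authors' evaluation procedure. The function names `sp_score` and `tc_score` are hypothetical, gaps are assumed to be written as `-`, and, unlike the official BAliBASE scorer, every column is scored rather than only annotated core blocks.

```python
from itertools import combinations


def _check_same_sequences(test, reference):
    """Both alignments must contain the same ungapped sequences in the same order."""
    if [row.replace("-", "") for row in test] != [row.replace("-", "") for row in reference]:
        raise ValueError("test and reference must align the same sequences in the same order")


def _columns(alignment):
    """Describe each column as the set of (sequence index, residue index) pairs it holds."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    next_residue = [0] * len(alignment)  # residue counter per sequence, gaps skipped
    columns = []
    for col in range(len(alignment[0])):
        present = []
        for seq, row in enumerate(alignment):
            if row[col] != "-":
                present.append((seq, next_residue[seq]))
                next_residue[seq] += 1
        columns.append(frozenset(present))
    return columns


def _pairs(columns):
    """Every unordered pair of residues that share a column."""
    return {frozenset(pair) for col in columns for pair in combinations(col, 2)}


def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment also aligns."""
    _check_same_sequences(test, reference)
    ref_pairs = _pairs(_columns(reference))
    if not ref_pairs:
        return 1.0
    return len(ref_pairs & _pairs(_columns(test))) / len(ref_pairs)


def tc_score(test, reference):
    """Fraction of reference columns (with >= 2 residues) reproduced exactly in the test."""
    _check_same_sequences(test, reference)
    ref_cols = [col for col in _columns(reference) if len(col) >= 2]
    if not ref_cols:
        return 1.0
    test_cols = set(_columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


if __name__ == "__main__":
    reference = ["AC-GT", "ACTGT", "A--GT"]
    test = ["ACG-T", "ACTGT", "AG--T"]
    print(sp_score(test, reference), tc_score(test, reference))
```

For the toy alignments in the example, the test alignment recovers 7 of the 10 reference residue pairs and 2 of the 4 reference columns that contain at least two residues, so it prints `0.7 0.5`. The SP score rewards partially correct columns, while the TC score only credits columns that are entirely right, which is why benchmarks usually report both.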

Keywords

multiple sequence alignment, MAFFT, SAGA, CLUSTALW

References

[1] Notredame, C. “Recent Evolutions of Multiple Sequence Alignment Algorithms”, PLOS Computational Biology, 3(8), e123, 2007.
[2] Edgar, R.C., Batzoglou, S. “Multiple sequence alignment”, Current Opinion in Structural Biology, 16(3), 368-373, 2006.
[3] Moretti, S., et al. “The M-Coffee web server: a meta-method for computing multiple sequence alignments by combining alternative alignment methods”, Nucleic Acids Research, 35(Web Server issue), W645-8, 2007.
[4] Chowdhury, B., Garai, G. “A review on multiple sequence alignment from the perspective of genetic algorithm”, Genomics, 109(5), 419-431, 2017.
[5] Edgar, R.C. “MUSCLE: a multiple sequence alignment method with reduced time and space complexity”, BMC Bioinformatics, 5, 113, 2004.
[6] Kumar, S., Filipski, A. “Multiple sequence alignment: in pursuit of homologous DNA positions”, Genome Research, 17(2), 127-35, 2007.
[7] Chatzou, M., et al. “Multiple sequence alignment modeling: methods and applications”, Briefings in Bioinformatics, 17(6), 1009-1023, 2016.
[8] Bawono, P., et al. “Multiple Sequence Alignment”, Methods in Molecular Biology, 1525, 167-189, 2017.
[9] Thompson, J.D., et al. “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, Nucleic Acids Research, 22(22), 4673-80, 1994.
[10] Notredame, C., Higgins, D.G. “SAGA: Sequence Alignment by Genetic Algorithm”, Nucleic Acids Research, 24(8), 1515-1524, 1996.
[11] Katoh, K., et al. “MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform”, Nucleic Acids Research, 30(14), 3059-66, 2002.
[12] Sievers, F., et al. “Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega”, Molecular Systems Biology, 7, 539, 2011.
[13] Pei, J., Grishin, N.V. “MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information”, Nucleic Acids Research, 34(16), 4364-4374, 2006.
[14] Do, C.B., et al. “ProbCons: Probabilistic consistency-based multiple sequence alignment”, Genome Research, 15(2), 330-40, 2005.
[15] Notredame, C., et al. “T-Coffee: A novel method for fast and accurate multiple sequence alignment”, Journal of Molecular Biology, 302(1), 205-17, 2000.
[16] Morgenstern, B., et al. “DIALIGN: finding local similarities by multiple sequence alignment”, Bioinformatics, 14(3), 290-4, 1998.
[17] Pei, J., et al. “PROMALS3D: a tool for multiple protein sequence and structure alignments”, Nucleic Acids Research, 36(7), 2295-300, 2008.
[18] Lassmann, T., Sonnhammer, E.L.L. “Kalign – an accurate and fast multiple sequence alignment algorithm”, BMC Bioinformatics, 6(1), 298, 2005.
[19] Wallace, I.M., et al. “M-Coffee: combining multiple sequence alignment methods with T-Coffee”, Nucleic Acids Research, 34(6), 1692-1699, 2006.
[20] Van Walle, I., et al. “Align-m--a new algorithm for multiple alignment of highly divergent sequences”, Bioinformatics, 20(9), 1428-35, 2004.
[21] Löytynoja, A., Goldman, N. “Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis”, Science, 320(5883), 1632-5, 2008.
[22] Löytynoja, A., Goldman, N. “An algorithm for progressive multiple alignment of sequences with insertions”, Proceedings of the National Academy of Sciences of the United States of America, 102(30), 10557-62, 2005.
[23] O'Sullivan, O., et al. “3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments”, Journal of Molecular Biology, 340(2), 385-395, 2004.
[24] Armougom, F., et al. “Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee”, Nucleic Acids Research, 34(Web Server issue), W604-W608, 2006.
[25] Zou, Q., et al. “HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy”, Bioinformatics, 31(15), 2475-81, 2015.
[26] Pais, F.S., Ruy, P.C., Oliveira, G., Coimbra, R.S. “Assessing the efficiency of multiple sequence alignment programs”, Algorithms for Molecular Biology, 9(1), 4, 2014.
[27] Subramanian, A.R., et al. “DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment”, Algorithms for Molecular Biology, 3, 6, 2008.
[28] Menke, M., et al. “Matt: local flexibility aids protein multiple structure alignment”, PLOS Computational Biology, 4(1), e10, 2008.
[29] Van Walle, I., et al. “SABmark--a benchmark for sequence alignment that covers the entire known fold space”, Bioinformatics, 21(7), 1267-1268, 2005.

There are 29 citations in total.

Details

Primary Language	English
Subjects	Structural Biology, Biochemistry and Cell Biology (Other)
Journal Section	Research Article
Authors	Fırat Aşır (ORCID 0000-0002-6384-9146), Tuğcan Korak (ORCID 0000-0003-4902-4022), Özgür Öztürk (ORCID 0000-0003-2605-4587)
Publication Date	December 30, 2021
Submission Date	August 17, 2021
Acceptance Date	November 25, 2021
Published in Issue	Year 2021 Volume: 7 Issue: 2

Cite

IEEE	F. Aşır, T. Korak, and Ö. Öztürk, “THREE MSA TOOLS ANALYSIS in DNA and PROTEIN DATASETS”, MEJS, vol. 7, no. 2, pp. 89–99, 2021, doi: 10.51477/mejs.983750.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.