Research Article

Protein Homology Modeling in the Low Sequence Similarity Regime

Year 2024, Volume: 7, Issue: 2, pp. 165-174, 15.03.2024
https://doi.org/10.34248/bsengineering.1402011

Abstract

Predicting the 3-D structure of a protein from its sequence using a template protein structure remains one of the most accurate modeling techniques available today. However, template-based modeling depends heavily on the selection of a single template structure and on the sequence alignment between target and template. Particularly when the sequence identity between target and template is low, errors in the alignment propagate into larger errors in the model structure. In this study, an iterative method for correcting such alignment errors is applied to a benchmark set from CASP in the extremely low sequence-identity regime. The protocol, which was developed and tested previously, evaluates the quality of each alignment by building a rough 3-D model from it, and then uses a genetic algorithm to iteratively generate a new set of alignments. Because the method evaluates models rather than sequence alignments, structural features are automatically incorporated into the alignment protocol. In the current study, models were built with the Modeller program from structural alignments to establish the maximum model quality that can be obtained from each template structure with the iterative modeling protocol. The results of the iterative modeling protocol, and the correctly aligned segments it produces, are then analyzed. Finally, it is shown that the correctly aligned segments are present in the pool of alignments generated by the protocol, so they could be recovered if a reliable scoring function for assessing local fragments were developed. Improving modeling in the low sequence-identity regime therefore appears feasible.
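The iterative protocol summarized above can be sketched, under heavy simplifying assumptions, as a genetic algorithm over target-template alignments. In this hypothetical Python sketch an alignment is a tuple giving, for each target residue, the index of the aligned template residue (or `None` for a gap); `mutate` shifts a segment by one template position and `crossover` splices two parents at a cut point. Truncation selection is an assumption, and the published protocol's operators and selection scheme may differ. All names (`toy_score`, `REFERENCE`, `correct_segments`, `pool_recovery`) are illustrative. In particular, `toy_score` is an oracle that compares against a known structural alignment; it merely stands in for the real step of building a rough 3-D model (for example with Modeller) and scoring it. The last two functions mirror the analysis described in the abstract: whether correctly aligned segments exist somewhere in the pool even when no single alignment is fully correct.

```python
import random
from typing import Callable, Collection, List, Optional, Sequence, Tuple

# Alignment: for each target residue, the aligned template index, or None for a gap.
Alignment = Tuple[Optional[int], ...]


def is_valid(aln: Sequence[Optional[int]], n_template: int) -> bool:
    """Aligned template indices must be in range and strictly increasing (no crossings)."""
    idx = [j for j in aln if j is not None]
    return all(0 <= j < n_template for j in idx) and all(a < b for a, b in zip(idx, idx[1:]))


def mutate(aln: Alignment, n_template: int, rng: random.Random) -> Alignment:
    """Shift the template indices of a random target segment [i, j) by +/-1.

    Simplification: this opens or closes template-side gaps but never creates target gaps.
    """
    i = rng.randrange(len(aln))
    j = rng.randrange(i + 1, len(aln) + 1)
    delta = rng.choice((-1, 1))
    child = list(aln)
    for k in range(i, j):
        if child[k] is not None:
            child[k] += delta
    return tuple(child) if is_valid(child, n_template) else aln


def crossover(p1: Alignment, p2: Alignment, n_template: int, rng: random.Random) -> Alignment:
    """One-point crossover (needs len >= 2); falls back to p1 if the splice is invalid."""
    cut = rng.randrange(1, len(p1))
    child = p1[:cut] + p2[cut:]
    return child if is_valid(child, n_template) else p1


def evolve(initial: List[Alignment], n_template: int, score: Callable[[Alignment], float],
           generations: int = 30, pop_size: int = 12, seed: int = 0):
    """Truncation-selection GA; returns (best alignment, set of every alignment generated).

    Requires at least two initial alignments so that parents can be paired.
    """
    rng = random.Random(seed)
    population = list(initial)
    pool = set(population)
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: max(2, pop_size // 2)]  # best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = mutate(crossover(a, b, n_template, rng), n_template, rng)
            children.append(child)
            pool.add(child)
        population = parents + children
    return max(population, key=score), pool


def correct_segments(aln: Alignment, reference: Alignment, min_len: int = 3) -> List[Tuple[int, int]]:
    """Half-open target runs [start, end) where aln agrees with the reference alignment."""
    runs, start = [], None
    for i, (a, r) in enumerate(zip(aln, reference)):
        ok = r is not None and a == r
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(aln) - start >= min_len:
        runs.append((start, len(aln)))
    return runs


def pool_recovery(pool: Collection[Alignment], reference: Alignment) -> float:
    """Fraction of reference-aligned target positions that are correct in at least one pool member."""
    positions = [i for i, r in enumerate(reference) if r is not None]
    hit = [i for i in positions if any(aln[i] == reference[i] for aln in pool)]
    return len(hit) / len(positions) if positions else 0.0


# Toy data (hypothetical): 12-residue target, 14-residue template with a 2-residue insertion.
N_TEMPLATE = 14
REFERENCE: Alignment = tuple(list(range(0, 6)) + list(range(8, 14)))
INITIAL: List[Alignment] = [tuple(range(0, 12)), tuple(range(1, 13)), tuple(range(2, 14))]


def toy_score(aln: Alignment) -> int:
    # HYPOTHETICAL oracle standing in for "build a rough model and assess it".
    return sum(a == r for a, r in zip(aln, REFERENCE))


best, pool = evolve(INITIAL, N_TEMPLATE, toy_score)
print("best toy score:", toy_score(best), "of", len(REFERENCE))
print("recovery in initial pool:", pool_recovery(INITIAL, REFERENCE))  # 1.0
```

In the toy data, no single initial ungapped alignment is correct, yet together they contain every correctly aligned position. Selecting those segments without the oracle would require a reliable local fragment scoring function, which is the gap the abstract identifies.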

There are 28 citations in total.

Details

Primary Language: English
Subjects: Genetics (Other), Animal Cell and Molecular Biology, Protein Engineering
Journal Section: Research Articles
Authors:

Sebnem Essız (ORCID: 0000-0002-5476-4722)

Early Pub Date: February 17, 2024
Publication Date: March 15, 2024
Submission Date: December 9, 2023
Acceptance Date: January 15, 2024
Published in Issue: Year 2024, Volume: 7, Issue: 2

Cite

APA: Essız, S. (2024). Protein Homology Modeling in the Low Sequence Similarity Regime. Black Sea Journal of Engineering and Science, 7(2), 165-174. https://doi.org/10.34248/bsengineering.1402011
AMA: Essız S. Protein Homology Modeling in the Low Sequence Similarity Regime. BSJ Eng. Sci. March 2024;7(2):165-174. doi:10.34248/bsengineering.1402011
Chicago: Essız, Sebnem. “Protein Homology Modeling in the Low Sequence Similarity Regime”. Black Sea Journal of Engineering and Science 7, no. 2 (March 2024): 165-74. https://doi.org/10.34248/bsengineering.1402011.
EndNote: Essız S (March 1, 2024) Protein Homology Modeling in the Low Sequence Similarity Regime. Black Sea Journal of Engineering and Science 7 2 165–174.
IEEE: S. Essız, “Protein Homology Modeling in the Low Sequence Similarity Regime”, BSJ Eng. Sci., vol. 7, no. 2, pp. 165–174, 2024, doi: 10.34248/bsengineering.1402011.
ISNAD: Essız, Sebnem. “Protein Homology Modeling in the Low Sequence Similarity Regime”. Black Sea Journal of Engineering and Science 7/2 (March 2024), 165-174. https://doi.org/10.34248/bsengineering.1402011.
JAMA: Essız S. Protein Homology Modeling in the Low Sequence Similarity Regime. BSJ Eng. Sci. 2024;7:165–174.
MLA: Essız, Sebnem. “Protein Homology Modeling in the Low Sequence Similarity Regime”. Black Sea Journal of Engineering and Science, vol. 7, no. 2, 2024, pp. 165-74, doi:10.34248/bsengineering.1402011.
Vancouver: Essız S. Protein Homology Modeling in the Low Sequence Similarity Regime. BSJ Eng. Sci. 2024;7(2):165-74.
