Effect of Benchmark Datasets on Protein Structure Prediction As a Concept

Nuh Azgınoğlu

doi:10.31590/ejosat.1014716

Conference Paper

Year 2021, Issue 29, pp. 117-121, 01.12.2021

Abstract

Knowing protein structures is essential for understanding the roles of proteins involved in vital functions, for drug design, and for many other purposes. Protein structure prediction is a subfield of bioinformatics that offers an alternative to structure determination in the laboratory, which takes a long time. The performance of methods developed in this field is generally analyzed on benchmark datasets. The size of a dataset directly affects algorithm runtime. This study analyzes how the choice of benchmark dataset is reflected in the results. Two benchmark datasets, CB513 and EVASet, and two protein structure prediction methods, JPred and Porter, were used. The study is intended to inspire further work on developing benchmark datasets that are comprehensive in terms of protein properties but contain as little data as possible.
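The kind of comparison described in the abstract can be illustrated with a minimal sketch. It assumes per-residue accuracy (often called Q3 for three-state secondary structure) as the score, which the abstract itself does not specify. The method names, dataset names, and structure strings below are hypothetical toy values, not results from the study.

```python
from typing import Dict, List, Tuple

# One (predicted, observed) pair of equal-length strings over H/E/C.
Pair = Tuple[str, str]


def q3(pairs: List[Pair]) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    correct = 0
    total = 0
    for predicted, observed in pairs:
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed strings must have equal length")
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to score")
    return correct / total


def compare(results: Dict[str, Dict[str, List[Pair]]]) -> Dict[Tuple[str, str], float]:
    """Score every (method, dataset) combination."""
    return {
        (method, dataset): q3(pairs)
        for method, by_dataset in results.items()
        for dataset, pairs in by_dataset.items()
    }


# Hypothetical toy data: two placeholder methods on a small and a large set.
toy_results = {
    "method_A": {
        "small_set": [("HHHC", "HHHC"), ("EECC", "EECH")],
        "large_set": [("HHHC", "HHHC"), ("EECC", "EECH"),
                      ("CCCC", "CCHH"), ("HHEE", "HHEE")],
    },
    "method_B": {
        "small_set": [("HHCC", "HHHC"), ("EECH", "EECH")],
        "large_set": [("HHCC", "HHHC"), ("EECH", "EECH"),
                      ("CCHH", "CCHH"), ("HHEC", "HHEE")],
    },
}

scores = compare(toy_results)
for (method, dataset), score in sorted(scores.items()):
    print(f"{method} on {dataset}: Q3 = {score:.4f}")
```

In this toy data the two placeholder methods tie on the small set but differ on the large one, which shows how the choice of benchmark dataset can change the apparent ranking of methods.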

Keywords

Protein structure prediction, Benchmark dataset, Concept

Supporting Institution

Kayseri University Scientific Research Projects Unit

Project Number

FHD-2021-1045

Thanks

This study was supported by the Kayseri University Scientific Research Projects Unit under Project Number FHD-2021-1045. We thank the Unit for its contributions.

References

Asai, K., Hayamizu, S., & Handa, K. I. (1993). Prediction of protein secondary structure by the hidden Markov model. Bioinformatics, 9(2), 141-146.
Atasever, S., Azgınoglu, N., Erbay, H., & Aydın, Z. (2021). 3-State Protein Secondary Structure Prediction based on SCOPe Classes. Brazilian Archives of Biology and Technology, 64.
Aydin, Z., Azginoglu, N., Bilgin, H. I., & Celik, M. (2019). Developing structural profile matrices for protein secondary structure and solvent accessibility prediction. Bioinformatics, 35(20), 4004-4010.
Azginoglu, N., Aydin, Z., & Celik, M. (2020). Structural profile matrices for predicting structural properties of proteins. Journal of Bioinformatics and Computational Biology, 18(04), 2050022.
Bouziane, H., Messabih, B., & Chouarfia, A. (2015). Effect of simple ensemble methods on protein secondary structure prediction. Soft Computing, 19(6), 1663-1678.
Bujnicki, J. M., Elofsson, A., Fischer, D., & Rychlewski, L. (2001). LiveBench‐1: Continuous benchmarking of protein structure prediction servers. Protein Science, 10(2), 352-361.
Cuff, J. A., & Barton, G. J. (1999). Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins: Structure, Function, and Bioinformatics, 34(4), 508-519.
Drozdetskiy, A., Cole, C., Procter, J., & Barton, G. J. (2015). JPred4: a protein secondary structure prediction server. Nucleic Acids Research, 43(W1), W389-W394.
Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology, 292(2), 195-202.
Holley, L. H., & Karplus, M. (1989). Protein secondary structure prediction with a neural network. Proceedings of the National Academy of Sciences, 86(1), 152-156.
Koh, I. Y., Eyrich, V. A., Marti-Renom, M. A., Przybylski, D., Madhusudhan, M. S., Eswar, N., ... & Rost, B. (2003). EVA: evaluation of protein structure prediction servers. Nucleic Acids Research, 31(13), 3311-3315.
Krishnan, K. V. (1932). The Defence Mechanism of the Human Body. The Indian medical gazette, 67(11), 637.
KU, L. L. (1952). Lane medical lectures: proteins and enzymes.
Mirabello, C., & Pollastri, G. (2013). Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility. Bioinformatics, 29(16), 2056-2058.
Le, Q., Sievers, F., & Higgins, D. G. (2017). Protein multiple sequence alignment benchmarking through secondary structure prediction. Bioinformatics, 33(9), 1331-1337.
Pearson, W. R., & Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences, 85(8), 2444-2448.
Pirovano, W., & Heringa, J. (2010). Protein secondary structure prediction. Data Mining Techniques for the Life Sciences, 327-348.
Rost, B., & Eyrich, V. A. (2001). EVA: large‐scale analysis of secondary structure prediction. Proteins: Structure, Function, and Bioinformatics, 45(S5), 192-199.
Silverman, R. B., & Holladay, M. W. (2014). The organic chemistry of drug design and drug action. Academic Press.
Spencer, M., Eickholt, J., & Cheng, J. (2014). A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(1), 103-112.
Van Goudoever, J. B., Vlaardingerbroek, H., van den Akker, C. H., de Groof, F., & van der Schoor, S. R. (2014). Amino acids and proteins. Nutritional Care of Preterm Infants, 110, 49-63.
Zemla, A., Venclovas, Č., Fidelis, K., & Rost, B. (1999). A modified definition of Sov, a segment‐based measure for protein secondary structure prediction assessment. Proteins: Structure, Function, and Bioinformatics, 34(2), 220-223.

Turkish title: Kıyaslama Veri Kümelerinin Protein Yapı Tahminine Etkisi: Bir Kavram Çalışması

There are 22 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Nuh Azgınoğlu 0000-0002-4074-7366
Project Number	FHD-2021-1045
Publication Date	December 1, 2021
Published in Issue	Year 2021

Cite

APA	Azgınoğlu, N. (2021). Effect of Benchmark Datasets on Protein Structure Prediction As a Concept. Avrupa Bilim ve Teknoloji Dergisi, (29), 117-121. https://doi.org/10.31590/ejosat.1014716
