Derin Öğrenme (CNN, RNN, LSTM, GRU) Kullanarak Protein İkincil Yapı Tahmini

Ezgi Çakmak; İhsan Hakan Selvi

doi:10.26650/acin.1008075

Research Article

Using Deep Learning (CNN, RNN, LSTM, GRU) methods for the prediction of Protein Secondary Structure

Year 2022, Volume: 6 Issue: 1, 43 - 52, 28.06.2022

https://izlik.org/JA94TJ96ER

Abstract

Proteins play a crucial role in the biological processes of living organisms. Knowing a protein’s function offers significant insight for future biological and medical research. Since a protein’s shape determines its function, it is important to understand the protein’s 3D structure. Although experimental methods such as X-ray crystallography and nuclear magnetic resonance (NMR) have been used to examine the shape of proteins, the results so far have been insufficient. Predicting the 3D structure of proteins is therefore crucial. Determining the 3D structure of a protein directly from its primary structure, the amino acid sequence, is challenging, so predicting the secondary structure becomes an important step in studying a protein’s structure and function. Many methods, including machine learning and, more recently, deep learning, have been used to predict the secondary structure of proteins, and this task forms a crucial part of structural bioinformatics. The goal of this study is to compare the performance of predictive models built with four frequently used deep learning methods: convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM), and gated recurrent units (GRU). The CB513 dataset was used to train and test these models, and they were evaluated using accuracy, F1 score, recall, and precision. The CNN, RNN, LSTM, and GRU models achieved accuracies of 82.54%, 82.06%, 81.1%, and 81.48%, respectively.
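As an illustration of the four evaluation metrics named above, the following is a minimal sketch of how per-residue accuracy, precision, recall, and F1 score could be computed from true and predicted secondary-structure strings. It is not the authors' evaluation code. The three-state label alphabet (H, E, C), the macro averaging over classes, and the function names `per_class_counts` and `evaluate` are all assumptions made for this example; the abstract does not state which label set or averaging scheme was used.

```python
from collections import Counter


def per_class_counts(y_true, y_pred, labels):
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the true label but was missed
    return {c: (tp[c], fp[c], fn[c]) for c in labels}


def evaluate(y_true, y_pred, labels=("H", "E", "C")):
    """Return accuracy and macro-averaged precision, recall and F1.

    The label set and the macro averaging are assumptions for illustration.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must have the same length")
    if not y_true:
        raise ValueError("sequences must not be empty")

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    precisions, recalls, f1s = [], [], []
    for tp, fp, fn in per_class_counts(y_true, y_pred, labels).values():
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example: 10 residues, 2 of them mispredicted.
scores = evaluate("HHHHEEECCC", "HHHEEEECCH")
print(scores)
```

In the toy example, 8 of 10 residues are predicted correctly, so the accuracy is 0.8. The macro-averaged scores weight each class equally regardless of how many residues it contains, which is one common choice when class frequencies differ.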

Keywords

Protein Secondary Structure Prediction, CNN, RNN, GRU

Details

Primary Language	Turkish
Section	Research Article
Authors	Ezgi Çakmak (ORCID 0000-0002-6970-8651); İhsan Hakan Selvi (ORCID 0000-0002-8837-2137)
Submission Date	11 October 2021
Publication Date	28 June 2022
DOI	https://doi.org/10.26650/acin.1008075
IZ	https://izlik.org/JA94TJ96ER
Published in Issue	Year 2022 Volume: 6 Issue: 1

Cite

APA	Çakmak, E., & Selvi, İ. H. (2022). Derin Öğrenme (CNN, RNN, LSTM, GRU) Kullanarak Protein İkincil Yapı Tahmini. Acta Infologica, 6(1), 43-52. https://doi.org/10.26650/acin.1008075
AMA	1. Çakmak E, Selvi İH. Derin Öğrenme (CNN, RNN, LSTM, GRU) Kullanarak Protein İkincil Yapı Tahmini. ACIN. 2022;6(1):43-52. doi:10.26650/acin.1008075
Chicago	Çakmak, Ezgi, and İhsan Hakan Selvi. 2022. “Derin Öğrenme (CNN, RNN, LSTM, GRU) Kullanarak Protein İkincil Yapı Tahmini”. Acta Infologica 6 (1): 43-52. https://doi.org/10.26650/acin.1008075.
EndNote	Çakmak E, Selvi İH (01 June 2022) Derin Öğrenme (CNN, RNN, LSTM, GRU) Kullanarak Protein İkincil Yapı Tahmini. Acta Infologica 6 1 43–52.
IEEE	[1] E. Çakmak and İ. H. Selvi, “Derin Öğrenme (CNN, RNN, LSTM, GRU) Kullanarak Protein İkincil Yapı Tahmini”, ACIN, vol. 6, no. 1, pp. 43–52, Jun. 2022, doi: 10.26650/acin.1008075.
ISNAD	Çakmak, Ezgi - Selvi, İhsan Hakan. “Derin Öğrenme (CNN, RNN, LSTM, GRU) Kullanarak Protein İkincil Yapı Tahmini”. Acta Infologica 6/1 (01 June 2022): 43-52. https://doi.org/10.26650/acin.1008075.
JAMA	1. Çakmak E, Selvi İH. Derin Öğrenme (CNN, RNN, LSTM, GRU) Kullanarak Protein İkincil Yapı Tahmini. ACIN. 2022;6:43–52.
MLA	Çakmak, Ezgi, and İhsan Hakan Selvi. “Derin Öğrenme (CNN, RNN, LSTM, GRU) Kullanarak Protein İkincil Yapı Tahmini”. Acta Infologica, vol. 6, no. 1, June 2022, pp. 43-52, doi:10.26650/acin.1008075.
Vancouver	1. Ezgi Çakmak, İhsan Hakan Selvi. Derin Öğrenme (CNN, RNN, LSTM, GRU) Kullanarak Protein İkincil Yapı Tahmini. ACIN. 01 June 2022;6(1):43-52. doi:10.26650/acin.1008075
