Research Article

Classifying Protein Sequences Using Convolutional Neural Network

Year 2020, Volume 9, Issue 4, pp. 1663-1671, published 25.12.2020
https://doi.org/10.17798/bitlisfen.662816

Abstract

One of the major challenges in bioinformatics is identifying and classifying protein structure and function. Large amounts of RNA data cannot be managed with traditional laboratory methods, so proteins must be grouped by structure and family; classification is therefore needed to determine their biological families and functions. Traditional machine learning approaches use various feature extraction algorithms to classify proteins, and with manual feature extraction the selected features directly affect performance. In the method proposed in this study, protein sequences were therefore digitized using the amino acid composition technique. The digitized sequences were converted to spectrograms, and features were extracted automatically with 2D CNN models (VGG19, ResNet). The extracted features were then classified with SVM and kNN. Using ResNet, protein sequences were classified with an accuracy of 95.03%.
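The pipeline in the abstract (digitize, convert to spectrogram, extract features, classify) can be illustrated with a minimal standard-library sketch. It rests on several assumptions that are not from the paper: "amino acid composition" is read as the usual 20-dimensional residue-frequency vector; `composition_signal` is a hypothetical per-residue mapping invented here to obtain a 1-D signal, because the abstract does not say how sequences became spectrogram inputs; the Hann window, window length, hop size and k are illustrative choices; and the VGG19/ResNet feature-extraction step is not reproduced, so kNN is applied directly to composition vectors as a stand-in. All function names are hypothetical.

```python
import cmath
import math
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid in seq (a 20-dim vector)."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_signal(seq: str) -> list[float]:
    """HYPOTHETICAL mapping (not the paper's): each residue -> its composition value."""
    comp = dict(zip(AMINO_ACIDS, amino_acid_composition(seq)))
    return [comp[ch] for ch in seq.upper() if ch in comp]


def spectrogram(signal: list[float], win: int = 16, hop: int = 8) -> list[list[float]]:
    """Magnitude STFT with a Hann window: one row per frame, win // 2 + 1 bins per row."""
    if len(signal) < win:
        signal = signal + [0.0] * (win - len(signal))  # zero-pad short inputs
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (win - 1)) for n in range(win)]
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(win)]
        # Naive DFT of the windowed segment, keeping non-negative frequencies only.
        mags = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win) for n in range(win)))
            for k in range(win // 2 + 1)
        ]
        frames.append(mags)
    return frames


def knn_predict(train_x: list[list[float]], train_y: list[str], x: list[float], k: int = 3) -> str:
    """Majority vote among the k training vectors closest to x (Euclidean distance)."""
    nearest = sorted((math.dist(a, x), label) for a, label in zip(train_x, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy sequences and labels, invented purely for illustration.
    train = [("GGGAGGSG", "gly-rich"), ("GAGGGSGG", "gly-rich"),
             ("LLKLLEKL", "leu-rich"), ("KLLELLKL", "leu-rich")]
    train_x = [amino_acid_composition(s) for s, _ in train]
    train_y = [label for _, label in train]
    print(knn_predict(train_x, train_y, amino_acid_composition("GGSGGAGG"), k=3))
    frames = spectrogram(composition_signal("MKTAYIAKQRQISFVKSHFSRQ"), win=8, hop=4)
    print(len(frames), "frames of", len(frames[0]), "bins")
```

In this toy run the query prints `gly-rich`, and the 22-residue sequence yields 4 frames of 5 bins. In the study itself the spectrogram images were passed to a pretrained 2D CNN and the resulting feature vectors, rather than raw composition values, were classified with SVM and kNN.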

Details

Primary Language: English
Subjects: Engineering
Journal Section: Research Article
Authors:

Bihter Daş (ORCID 0000-0002-2498-3297)

Suat Toraman (ORCID 0000-0002-7568-4131)

Publication Date: December 25, 2020
Submission Date: December 21, 2019
Acceptance Date: April 9, 2020
Published in Issue: Year 2020, Volume 9, Issue 4

Cite

IEEE: B. Daş and S. Toraman, “Classifying Protein Sequences Using Convolutional Neural Network”, Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, vol. 9, no. 4, pp. 1663–1671, 2020, doi: 10.17798/bitlisfen.662816.
