TY - JOUR
T1 - Siamese Neural Networks Based Ensemble Model for the Prediction of Protein-Protein Interactions
TT - Protein-Protein Etkileşimlerinin Tahmini İçin Siyam Sinir Ağı Tabanlı Topluluk Modeli
AU - Kalaycı Demir, Güleser
AU - Geçkin, Duygu
PY - 2024
DA - July
Y2 - 2024
DO - 10.7212/karaelmasfen.1415248
JF - Karaelmas Fen ve Mühendislik Dergisi
PB - Zonguldak Bülent Ecevit Üniversitesi
WT - DergiPark
SN - 2146-7277
SP - 13
EP - 28
VL - 14
IS - 2
LA - en
AB - A wide range of biological processes, including signal transmission, immunological responses, and metabolic cycles, are impacted by protein-protein interactions. These interactions have enormous implications for figuring out the origins of diseases and creating treatments. However, experimental methods for identifying PPIs are resource-intensive, time-consuming, and have limited coverage. Thus, computational techniques are essential to help and enhance activities related to protein identification. This study aims to build a deep learning network for predicting protein-protein interactions using only sequence information. Three different encoding methods are used to encode protein sequences: Binary Encoding, Autocovariance, and Position Specific Scoring Matrix. In order to predict protein-protein interactions, a convolutional Siamese neural network is employed to find complex patterns between protein sequence pairs. This network consists of two identical subnetworks with matched parameters. When applied to the human dataset, the suggested technique shows strong prediction performance with an accuracy of 84.07%, sensitivity of 92.45%, and precision of 91.45% for the model using the PSSM protein representation approach. An ensemble approach is suggested to combine the outputs from these three encoders because it is known that different encoding techniques capture various aspects of the same protein sequence.
The accuracy obtained increased to 86.27% for the ensemble approach on the test set, with a sensitivity of 93.07% and a precision of 92.15%. The outcome highlights the importance of integrating several encoding methods to benefit from their complementary features and raise the accuracy of protein-protein interaction prediction.
KW - Deep learning
KW - one-hot encoding
KW - position-specific scoring matrices
KW - autocovariance
N2 - Sinyal iletimi, immünolojik yanıtlar ve metabolik döngüler dahil olmak üzere çok çeşitli biyolojik süreçler, protein-protein etkileşimlerinden etkilenir. Bu etkileşimlerin, hastalıkların kökeninin anlaşılması ve tedavilerin oluşturulması açısından çok büyük etkileri vardır. Ancak protein-protein etkileşimlerini belirlemeye yönelik deneysel yöntemler yoğun kaynak gerektirir, zaman alıcıdır ve kapsamı sınırlıdır. Bu nedenle, protein tanımlamayla ilgili faaliyetlere yardımcı olmak ve bunları geliştirmek için hesaplamalı teknikler önemlidir. Bu çalışma, yalnızca dizi bilgisini kullanarak protein-protein etkileşimlerini tahmin etmek için derin öğrenme ağı oluşturmayı amaçlamaktadır. Protein dizilerini kodlamak için üç farklı kodlama yöntemi kullanılmıştır: İkili Kodlama, Otokovaryans ve Konuma Özel Puanlama Matrisi. Protein-protein etkileşimlerini tahmin etmek amacıyla, protein dizi çiftleri arasındaki karmaşık modelleri bulmak için evrişimli bir Siyam sinir ağı kullanılmıştır. Bu ağ, eşleşen parametrelere sahip iki özdeş alt ağdan oluşmaktadır. Önerilen teknik, insan veri kümesine uygulandığında, PSSM protein temsili yaklaşımını kullanan model için %84.07 doğruluk, %92.45 hassasiyet ve %91.45 kesinlik ile güçlü tahmin performansı göstermektedir. Farklı kodlama tekniklerinin aynı protein dizisinin farklı yönlerini yakaladığı bilindiğinden bu üç kodlayıcıdan gelen çıktıları birleştirmek için bir topluluk yaklaşımı önerilmektedir.
Test setinde topluluk yaklaşımı için elde edilen doğruluk %86.27’ye, hassasiyet %93.07’ye ve kesinlik ise %92.15’e artırılmıştır. Sonuç, tamamlayıcı özelliklerinden yararlanmak ve protein-protein etkileşimi tahmininin doğruluğunu artırmak için çeşitli kodlama yöntemlerinin entegre edilmesinin önemini vurgulamaktadır.
CR - Alkhalid, FF. 2022. The effect of optimizers on siamese neural network performance. Proceedings of the International Conference on Industrial Engineering and Operations Management. Doi:10.46254/an12.20221019
CR - Altschul, SF., Madden, TL., Schäffer, AA., Zhang, J., Zhang, Z., Miller, W., Lipman, DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389–3402. Doi:10.1093/nar/25.17.3389
CR - Angermueller, C., Pärnamaa, T., Parts, L., Stegle, O. 2016. Deep learning for computational biology. Molecular Systems Biology, 12:878. Doi:10.15252/msb.20156651
CR - Browne, F., Zheng, H., Wang, H., Azuaje, F. 2010. From Experimental Approaches to Computational Techniques: A Review on the Prediction of Protein-Protein Interactions. Advances in Artificial Intelligence, 2010:924529. Doi:10.1155/2010/924529
CR - Chen, W., Wang, S., Song, T., Li, X., Han, P., Gao, C. 2022. DCSE: Double-Channel-Siamese-Ensemble model for protein protein interaction prediction. BMC Genomics, 23(1):555. Doi:10.1186/s12864-022-08772-6
CR - ElAbd, H., Bromberg, Y., Hoarfrost, A., Lenz, T., Franke, A., Wendorff, M. 2020. Amino acid encoding for deep learning applications. BMC Bioinformatics, 21(1):235. Doi:10.1186/s12859-020-03546-x
CR - Elnaggar, A., Heinzinger, M., Dallago, C., Rihawi, G., Wang, Y., Jones, L., … Rost, B. 2021. ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Learning. Doi:10.48550/arXiv.2007.06225
CR - Fauchère, J., Charton, M., Kier, L., Verloop, A., Pliska, V. 1988. Amino acid side chain parameters for correlation studies in biology and pharmacology.
International Journal of Peptide and Protein Research, 32:269–278. Doi:10.1111/j.1399-3011.1988.tb01261.x
CR - Gao, H., Chen, C., Li, S., Wang, C., Zhou, W., Yu, B. 2023. Prediction of protein-protein interactions based on ensemble residual convolutional neural network. Computers in Biology and Medicine, 152:106471. Doi:10.1016/j.compbiomed.2022.106471
CR - Gao, ZG., Wang, L., Xia, SX., You, ZH., Yan, X., Zhou, Y. 2016. Ens-PPI: A Novel Ensemble Classifier for Predicting the Interactions of Proteins Using Autocovariance Transformation from PSSM. BioMed Research International, 2016:4563524. Doi:10.1155/2016/4563524
CR - Gligorijević, V., Renfrew, PD., Kosciolek, T., Leman, JK., Berenberg, D., Vatanen, T., … Bonneau, R. 2021. Structure-based protein function prediction using graph convolutional networks. Nature Communications, 12(1):3168. Doi:10.1038/s41467-021-23303-9
CR - Guo, Y., Yu, L., Wen, Z., Li, M. 2008. Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences. Nucleic Acids Research, 36:3025–3030. Doi:10.1093/nar/gkn159
CR - Hashemifar, S., Neyshabur, B., Khan, A.A., Xu, J. 2018. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics, 34(17):i802–i810. Doi:10.1093/bioinformatics/bty573
CR - Jia, L-N., Yan, X., You, Z-H., Zhou, X., Li, L-P., Wang, L., Song, K-J. 2020. NLPEI: A Novel Self-Interacting Protein Prediction Model Based on Natural Language Processing and Evolutionary Information. Evolutionary Bioinformatics, 16:1176934320984171. Doi:10.1177/1176934320984171
CR - Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., … Hassabis, D. 2021. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589. Doi:10.1038/s41586-021-03819-2
CR - Li, J., Chen, Y. 2013. Auto Covariance Combined with Artificial Neural Network for Predicting Protein-Protein Interactions, V. 765–767.
Doi:10.2991/icsem.2013.153
CR - Madan, S., Demina, V., Stapf, M., Ernst, O., Fröhlich, H. 2022. Accurate prediction of virus-host protein-protein interactions via a Siamese neural network using deep protein sequence embeddings. Patterns, 3(9):100551. Doi:10.1016/j.patter.2022.100551
CR - Nevers, Y., Glover, NM., Dessimoz, C., Lecompte, O. 2023. Protein length distribution is remarkably uniform across the tree of life. Genome Biology, 24(1). Doi:10.1186/s13059-023-02973-2
CR - Nourani, E., Asgari, E., McHardy, A.C., Mofrad, M.R.K. 2022. TripletProt: Deep Representation Learning of Proteins Based On Siamese Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 19(6):3744–3753. Doi:10.1109/TCBB.2021.3108718
CR - Özger, Z.B., Çakabay, Z. 2023. Computational Prediction of Interactions Between SARS-CoV-2 and Human Protein Pairs by PSSM-Based Images. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, 12(1):166–179. Doi:10.17798/bitlisfen.1220301
CR - Poplin, R., Chang, P-C., Alexander, D., Schwartz, S., Colthurst, T., Ku, A., … DePristo, M.A. 2018. A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 36(10):983–987. Doi:10.1038/nbt.4235
CR - Richoux, F., Servantie, C., Borès, C., Téletchéa, S. 2019. Comparing two deep learning sequence-based models for protein-protein interaction prediction. Doi:10.48550/arXiv.1901.06268
CR - Shen, J., Zhang, J., Luo, X., Zhu, W., Yu, K., Chen, K., Li, Y., Jiang, H. 2007. Predicting protein–protein interactions based only on sequences information. Proceedings of the National Academy of Sciences, 104(11):4337–4341. Doi:10.1073/pnas.0607879104
CR - Sun, T., Zhou, B., Lai, L., Pei, J. 2017. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics, 18(1):277. Doi:10.1186/s12859-017-1700-2
CR - Tong, J., Tammi, M. 2008. Prediction of protein allergenicity using local description of amino acid sequence.
Frontiers in Bioscience: A Journal and Virtual Library, 13:6072–6078. Doi:10.2741/3138
CR - Trieu, T., Martinez-Fundichely, A., Khurana, E. 2020. DeepMILO: a deep learning approach to predict the impact of non-coding sequence variants on 3D chromatin structure. Genome Biology, 21(1):79. Doi:10.1186/s13059-020-01987-4
CR - Wang, L., You, Z-H., Xia, S-X., Liu, F., Chen, X., Yan, X., Zhou, Y. 2017. Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier. Journal of Theoretical Biology, 418:105–110. Doi:10.1016/j.jtbi.2017.01.003
CR - Wang, X., Wang, R., Wei, Y., Gui, Y. 2019. A novel conjoint triad auto covariance (CTAC) coding method for predicting protein-protein interaction based on amino acid sequence. Mathematical Biosciences, 313:41–47. Doi:10.1016/j.mbs.2019.04.002
CR - Wold, S., Jonsson, J., Sjöström, M., Sandberg, M., Rännar, S. 1993. DNA and peptide sequences and chemical processes multivariately modelled by principal component analysis and partial least-squares projections to latent structures. Analytica Chimica Acta, 277:239–253. Doi:10.1016/0003-2670(93)80437-P
CR - Yang, X., Zhang, Z., Wuchty, S. 2021. Multi-scale convolutional neural networks for the prediction of human-virus protein interactions. In: ICAART 2021 - Proceedings of the 13th International Conference on Agents and Artificial Intelligence, V. 2. SciTePress, 41–48. Doi:10.5220/0010185300410048
CR - You, Z-H., Lei, Y-K., Zhu, L., Xia, J., Wang, B. 2013. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC Bioinformatics, 14(8):S10. Doi:10.1186/1471-2105-14-S8-S10
CR - You, Z-H., Chan, K., Hu, P. 2015. Predicting Protein-Protein Interactions from Primary Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest. PLoS ONE, 10:e0125811.
Doi:10.1371/journal.pone.0125811
CR - Zahiri, J., Yaghoubi, O., Mohammad-Noori, M., Ebrahimpour, R., Masoudi-Nejad, A. 2013. PPIevo: Protein–protein interaction prediction from PSSM based evolutionary information. Genomics, 102(4):237–242. Doi:10.1016/j.ygeno.2013.05.006
CR - Zhang, L., Yu, G., Xia, D., Wang, J. 2019. Protein–protein interactions prediction based on ensemble deep neural networks. Neurocomputing, 324:10–19. Doi:10.1016/j.neucom.2018.02.097
CR - Zhu, H-J., You, Z-H., Shi, W-L., Xu, S-K., Jiang, T-H., Zhuang, L-H. 2019. Improved Prediction of Protein-Protein Interactions Using Descriptors Derived From PSSM via Gray Level Co-Occurrence Matrix. IEEE Access, 7:49456–49465. Doi:10.1109/ACCESS.2019.2907132
UR - https://doi.org/10.7212/karaelmasfen.1415248
L1 - https://dergipark.org.tr/tr/download/article-file/3641716
ER -