Research Article

Drug Solubility Prediction: A Comparative Analysis of GNN, MLP, and Traditional Machine Learning Algorithms

Year: 2024, Volume: 12, Issue: 1, Pages: 164-175, Published: 25.03.2024
https://doi.org/10.29109/gujsc.1371519

Abstract

The effective design and development of pharmaceuticals is of fundamental importance in medicine and the pharmaceutical industry. Accurate prediction of a drug molecule's solubility is critical in this process because solubility influences the bioavailability, pharmacokinetics, and toxicity of drugs. Traditionally, drug solubility has been predicted with mathematical equations based on chemical and physical properties. In recent years, advances in artificial intelligence and machine learning have produced new approaches in this field. This study evaluated three modeling approaches: Graph Neural Networks (GNN), the Multilayer Perceptron (MLP), and traditional Machine Learning (ML) algorithms. The Random Forest (RF) model performed best, achieving the lowest errors: a Root Mean Square Error (RMSE) of 1.2145, a Mean Absolute Error (MAE) of 0.9221, and an R-squared (R2) value of 0.6575. In contrast, the GNN model performed comparatively poorly, with an RMSE of 1.8389, an MAE of 1.4684, and an R2 of 0.2147. These values indicate that the GNN model's predictions contain larger errors than those of the other models and that its explanatory power is lower. The findings highlight the performance differences among modeling approaches in drug solubility prediction: the RF model was more effective than the other methods, while the GNN model was less effective. The results offer guidance on which model to prefer in pharmaceutical design and development processes.
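
The three metrics reported in the abstract can be illustrated with a minimal sketch. The function name `regression_metrics` and the toy values below are hypothetical and are not taken from the study. The sketch only shows the standard definitions of RMSE, MAE, and R2, not the authors' evaluation code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: mean of the absolute errors.
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean squared error.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R2: 1 minus residual sum of squares over total sum of squares.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return rmse, mae, r2


# Made-up values for illustration only (not data from the study).
y_true = [-1.0, -2.0, -3.0, -4.0]
y_pred = [-1.5, -2.0, -2.5, -4.0]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")
```

Lower RMSE and MAE and a higher R2 indicate a better fit, which is the sense in which the abstract ranks the RF model above the GNN model.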

Ethical Statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Supporting Institution

This study was not supported by any funding organisation.


Turkish title: İlaç Çözünürlüğü Tahmini: GNN, MLP ve Geleneksel Makine Öğrenimi Algoritmalarının Karşılaştırmalı Analizi (Drug Solubility Prediction: A Comparative Analysis of GNN, MLP, and Traditional Machine Learning Algorithms)


Details

Primary Language: English
Subjects: Circuits and Systems, Electrical Engineering (Other), Chemical Engineering (Other)
Journal Section: Design and Technology (Tasarım ve Teknoloji)
Authors:

Veysel Gider (ORCID: 0000-0001-7538-262X)

Cafer Budak (ORCID: 0000-0002-8470-4579)

Early Pub Date: March 5, 2024
Publication Date: March 25, 2024
Submission Date: October 5, 2023
Published in Issue: Year 2024, Volume 12, Issue 1

Cite

APA Gider, V., & Budak, C. (2024). Drug Solubility Prediction: A Comparative Analysis of GNN, MLP, and Traditional Machine Learning Algorithms. Gazi University Journal of Science Part C: Design and Technology, 12(1), 164-175. https://doi.org/10.29109/gujsc.1371519


    e-ISSN:2147-9526