In-Silico Mutajenisite Tahmininde İstatistiksel Öğrenme Modeli

Enis Gümüştaş; Ayça Çakmak Pehlivanlı

doi:10.19113/sdufenbed.867067

Research Article

Statistical Learning Model for In-Silico Mutagenicity Prediction

Year 2021, Volume: 25 Issue: 2, 365 - 370, 20.08.2021

Cited By: 1

Abstract

Mutagenicity, defined as a genetic change (mutation) that can be caused by an agent, has an important place among toxicity tests. In this study, statistical learning algorithms were used within an in-silico approach to improve the mutagenicity determination process. The approach was applied to a set of molecules with experimentally determined mutagenicity information, and remarkable classification performance was achieved. The Bursi and Benchmark data sets, which consist of molecules reported in the literature, were combined, and the properties of the molecules were calculated with the Molecular Operating Environment (MOE). Decision tree algorithms were applied to the resulting data set of 10835 molecules and 193 variables, with parameters selected by grid search. Using the models built with the best parameters, variables were ranked by their importance in predicting mutagenicity, and the descriptor set was reduced to the 72 most effective variables. Various statistical learning algorithms were then applied to the reduced data set, and the five classification algorithms with the best results were selected. After parameter optimization, these algorithms achieved correct classification rates of approximately 90% for mutagenicity.
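As an illustration of the grid search and importance-based variable selection described in the abstract, the following is a minimal sketch in Python with scikit-learn. It is not the authors' code: the random forest, the parameter grid, the synthetic data from `make_classification`, and the helper name `select_top_features` are all assumptions made for the example, whereas the study worked with MOE descriptors for 10835 molecules.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def select_top_features(X, y, n_keep, param_grid, cv=5, random_state=0):
    """Tune a tree ensemble by grid search, then rank columns by importance.

    Returns the indices of the n_keep most important columns, the best
    parameter combination, and its mean cross-validated accuracy.
    """
    search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid=param_grid,
        scoring="accuracy",
        cv=cv,
    )
    search.fit(X, y)
    # best_estimator_ is refit on all of X, y with the best parameters.
    importances = search.best_estimator_.feature_importances_
    # Indices of the n_keep largest importances, most important first.
    keep = np.argsort(importances)[::-1][:n_keep]
    return keep, search.best_params_, search.best_score_


# Synthetic stand-in for the descriptor table (assumption: not the study's data).
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=8, random_state=0
)
# Hypothetical grid; the abstract does not list the values that were searched.
grid = {"n_estimators": [50, 100], "max_depth": [4, None]}

keep, best_params, best_score = select_top_features(
    X, y, n_keep=10, param_grid=grid
)
X_reduced = X[:, keep]
print(best_params, round(best_score, 3), X_reduced.shape)
```

With the study's dimensions, the same call would take a 10835 x 193 matrix and `n_keep=72`. Ranking on the model refit with the best parameters mirrors the order of steps in the abstract, although the importance measure the authors used may differ from scikit-learn's impurity-based one.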

Keywords

Classification, Ensemble Learning, XGBoost, LightGBM, Feature Selection, Toxicity

Project Number

2018-30

Abstract

Among toxicity tests, mutagenicity, which can be defined as a genetic change (mutation) that may arise due to an agent, holds an important place. In this study, statistical learning algorithms were used within an in-silico approach in order to improve the mutagenicity determination process in general. The approach was applied to a set of molecules carrying experimentally obtained mutagenicity information, and notable classification success was achieved. For use in the study, the Bursi and Benchmark data sets of molecules available in the literature were combined, and the properties of the molecules were calculated with the Molecular Operating Environment (MOE) program. Decision tree algorithms were applied to the resulting data set of 10835 observations and 193 variables, and parameter selection was carried out with a grid search approach. Based on the models built with the best parameters obtained, variables were selected according to their importance in predicting mutagenicity, and the dimensionality of the data was reduced to the 72 most effective variables. Different statistical learning algorithms were applied to the new data consisting of the selected variables, and the five classification algorithms giving the best results were chosen. Using these algorithms, whose performance was improved by parameter optimization, correct classification rates of approximately 90% were obtained for mutagenicity.
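The final stage of the abstract, comparing several classification algorithms on the reduced data, could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the five selected algorithms, so the scikit-learn ensembles below are stand-ins (with `GradientBoostingClassifier` in place of XGBoost or LightGBM), the data are synthetic, and the settings are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the reduced descriptor table (assumption).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)

# Hypothetical candidate set; the study's five algorithms are not listed here.
models = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(
        n_estimators=50, random_state=0
    ),
}

# Mean accuracy over 5 stratified folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print candidates from highest to lowest mean accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {acc:.3f}")
```

Accuracy is used here because the abstract reports correct classification rates. The numbers printed for synthetic data say nothing about the roughly 90% reported in the study.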

Keywords

Classification, Ensemble Learning, XGBoost, LightGBM, Feature Selection, Toxicity

Supporting Institution

Mimar Sinan Güzel Sanatlar Üniversitesi

Project Number

2018-30

Thanks

This work was supported by the Scientific Research Projects Commission of Mimar Sinan Güzel Sanatlar Üniversitesi (Project No: 2018-30).

Details

Primary Language	Turkish
Subjects	Engineering
Journal Section	Articles
Authors	Enis Gümüştaş 0000-0003-0220-4544 Ayça Çakmak Pehlivanlı 0000-0001-9884-6538
Project Number	2018-30
Publication Date	August 20, 2021
Published in Issue	Year 2021 Volume: 25 Issue: 2

Cite

APA	Gümüştaş, E., & Çakmak Pehlivanlı, A. (2021). In-Silico Mutajenisite Tahmininde İstatistiksel Öğrenme Modeli. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 25(2), 365-370. https://doi.org/10.19113/sdufenbed.867067

Cited By

Dengesiz Sınıf Dağılımında Kayıp Gözlem Sorunu için Topluluk Öğrenmesi Sonuçlarının İstatistiksel Değerlendirmesi

Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi

https://doi.org/10.19113/sdufenbed.1090596

e-ISSN: 1308-6529
Linking ISSN (ISSN-L): 1300-7688

All articles published in the journal are open access, free of charge, under the Creative Commons CC BY-NC (Attribution-NonCommercial) license. All authors and other journal users are deemed to have accepted these terms.