Improving Drug-Target Interaction Prediction Using SVM-SMOTE: A Solution for Imbalanced Dataset

Sara Naghib Zadeh; Zümrüt Ecevit Satı; Ali Ghanbari Sorkhi

doi:10.33461/uybisbbd.1661593

Research Article

Year 2025, Volume: 9 Issue: 1, 10 - 28, 30.06.2025

Abstract

Drug–target interaction (DTI) prediction is a critical step in drug discovery because experimental methods are often time-consuming and expensive. Machine learning techniques have emerged as effective alternatives for this task. However, DTI datasets commonly suffer from severe class imbalance: known interactions are far fewer than negative samples, which poses a serious challenge for model training. This study proposes a framework for DTI prediction. The model uses amino acid composition (AAC) and dipeptide composition (DPC) to extract protein features, and FP2 molecular fingerprints to represent drugs. To address class imbalance, SVM-SMOTE, an SVM-based synthetic minority oversampling technique, is applied. A linear support vector machine (LSVM) is used for model training. The proposed model was compared with several state-of-the-art methods on benchmark datasets, including Enzyme, GPCR, Ion Channel, and Nuclear Receptor. The results indicate that the proposed framework achieves superior performance. Extensive experiments were conducted at various stages of model design, and evaluation with AUC, accuracy, F1-score, and recall supports the effectiveness of the proposed approach.
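
The two protein descriptors named in the abstract can be illustrated with a minimal sketch. AAC gives one value per standard amino acid (20 values) and DPC gives one value per ordered residue pair (400 values). The function names, the example sequence, the fraction-based normalization, and the choice to skip non-standard residues are all assumptions made for illustration; the abstract does not specify the tool or the exact normalization used in the study.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _clean(sequence):
    """Upper-case the sequence and drop non-standard residues (a simplification)."""
    return "".join(r for r in sequence.upper() if r in AMINO_ACIDS)


def aac(sequence):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = _clean(sequence)
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dpc(sequence):
    """Dipeptide composition: fraction of each ordered residue pair (400 values)."""
    seq = _clean(sequence)
    total = len(seq) - 1  # number of overlapping adjacent pairs
    counts = {}
    for i in range(max(total, 0)):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return [
        counts.get(a + b, 0) / total if total > 0 else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


def protein_features(sequence):
    """Concatenate AAC and DPC into one 420-dimensional vector."""
    return aac(sequence) + dpc(sequence)


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # hypothetical toy sequence, not taken from the paper
    print(len(protein_features(example)))
```

In a pipeline like the one the abstract describes, a vector of this kind would be concatenated with the drug's fingerprint bits to form one sample per drug–target pair before any oversampling or classifier training.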

Keywords

Drug–target interaction, Feature extraction, Data balancing, SVM-SMOTE, Linear SVM

There are 65 citations in total.

Details

Primary Language	English
Subjects	Machine Learning (Other), Artificial Intelligence (Other)
Journal Section	Research Paper
Authors	Sara Naghib Zadeh (ORCID 0009-0005-6959-1165); Zümrüt Ecevit Satı (ORCID 0000-0002-7246-6518); Ali Ghanbari Sorkhi (ORCID 0000-0001-7064-5857)
Publication Date	June 30, 2025
Submission Date	March 20, 2025
Acceptance Date	May 27, 2025
Published in Issue	Year 2025 Volume: 9 Issue: 1

Cite

APA	Naghib Zadeh, S., Ecevit Satı, Z., & Ghanbari Sorkhi, A. (2025). Improving Drug-Target Interaction Prediction Using SVM-SMOTE: A Solution for Imbalanced Dataset. International Journal of Management Information Systems and Computer Science, 9(1), 10-28. https://doi.org/10.33461/uybisbbd.1661593

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.