Research Article

Year 2023, Volume: 9 Issue: 1, 63 - 82, 30.06.2023
https://doi.org/10.34186/klujes.1248062

COMPARISON OF PERFORMANCE OF DIFFERENT K VALUES WITH K-FOLD CROSS VALIDATION IN A GRAPH-BASED LEARNING MODEL FOR lncRNA-DISEASE PREDICTION

Abstract

In machine learning, the value of k chosen for k-fold cross-validation significantly affects the performance of the resulting model. Previous studies usually set k to five or ten, because these two values are thought to produce average estimates. However, there is no formal rule, and few studies have examined the use of different k values when training different models. In this study, the performance of an lncRNA-disease model was evaluated using various k values (2, 3, 4, 5, 6, 7, 8, 9, and 10) and datasets. The results were compared, and the most suitable k value for the model was determined. Future work aims to broaden the study by increasing the number of datasets.
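
As a rough illustration of the comparison described above, the following sketch runs k-fold cross-validation for k = 2 through 10 and reports the mean score for each k. It is not the graph-based model evaluated in the article: the majority-class baseline, the toy labels, and the function names (`k_fold_splits`, `majority_baseline_accuracy`, `mean_cv_score`) are assumptions chosen only to keep the example self-contained.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first `extra` folds get one more sample, so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx


def majority_baseline_accuracy(labels, train_idx, test_idx):
    """Placeholder model (assumption): predict the most common training label."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(sorted(set(train_labels)), key=train_labels.count)
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


def mean_cv_score(labels, k, seed=0):
    """Average the per-fold scores, as k-fold cross-validation does."""
    scores = [
        majority_baseline_accuracy(labels, train_idx, test_idx)
        for train_idx, test_idx in k_fold_splits(len(labels), k, seed)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels (made up): 1 = known association, 0 = no known association.
    labels = [1] * 30 + [0] * 70
    for k in range(2, 11):
        print(f"k={k:2d}  mean accuracy={mean_cv_score(labels, k):.3f}")
```

With k folds, each model is trained on (k-1)/k of the samples and tested on the remaining 1/k, so larger k values give larger training sets but smaller test folds. To compare k values for a real predictor, the placeholder baseline would be replaced with the model's own training and scoring routine.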

Details

Primary Language: English
Subjects: Engineering
Journal Section: Issue
Authors

Zeynep Barut 0000-0002-9363-818X

Volkan Altuntaş 0000-0003-3144-8724

Publication Date June 30, 2023
Published in Issue Year 2023 Volume: 9 Issue: 1

Cite

APA Barut, Z., & Altuntaş, V. (2023). COMPARISON OF PERFORMANCE OF DIFFERENT K VALUES WITH K-FOLD CROSS VALIDATION IN A GRAPH-BASED LEARNING MODEL FOR IncRNA-DISEASE PREDICTION. Kirklareli University Journal of Engineering and Science, 9(1), 63-82. https://doi.org/10.34186/klujes.1248062