COMPARISON OF THE SUCCESS OF KNN WITH THE NEWLY PROPOSED K-SPLIT METHOD AND STRATIFIED CROSS-VALIDATION IN REMOTE HOMOLOGOUS PROTEIN DETECTION
Abstract
Keywords
Remote Homolog Protein, k-Nearest Neighbor (kNN), Bag-of-Words Model, Distances, k-Fold Cross-Validation