Determination of highly effective attributes in fold level classification of proteins
Abstract
This paper aims to determine which protein features or attributes are most significant for classifying proteins according to their folds. Proteins in the database used in this study are represented by six feature groups, called attributes, and by a 125-dimensional feature vector. Representing proteins with such high-dimensional vectors increases the computational load of classification and lengthens its running time. This study proposes dimension reduction as a remedy. Two different approaches are used to identify the features and attributes that yield high classification performance. In the first approach, each of the six attributes is tested separately to determine which gives the highest performance. In the second approach, the most significant of the 125 features are selected using the Divergence Analysis method. A classical classifier, KNN (K-nearest neighbor), and two artificial neural network models, GAL (Grow and Learn) and SOM (Self-Organizing Map), are used as classifiers, and classification performance is analyzed on the reduced-dimension datasets.
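The two approaches described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation and does not use the study's data. The toy dataset, the group names in `GROUPS`, the choice of k = 3, and all function names are hypothetical. The divergence score is assumed here to be the symmetric divergence between two one-dimensional Gaussian class models, averaged over class pairs; the exact formulation of the Divergence Analysis method used in the paper may differ. Only KNN is shown; GAL and SOM are omitted.

```python
import math
import random
from collections import Counter
from itertools import combinations


def knn_predict(train_X, train_y, x, cols, k=3):
    """Majority vote among the k nearest training points, using only columns `cols`."""
    dists = sorted(
        (math.dist([row[c] for c in cols], [x[c] for c in cols]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, cols, k=3):
    """Fraction of test points classified correctly with the given columns."""
    hits = sum(
        knn_predict(train_X, train_y, x, cols, k) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


def divergence(X, y, j):
    """Mean symmetric divergence of feature j over all class pairs.

    Assumption: each class is modelled as a 1-D Gaussian for this feature.
    """
    stats = {}
    for c in set(y):
        vals = [row[j] for row, label in zip(X, y) if label == c]
        mean = sum(vals) / len(vals)
        var = sum((t - mean) ** 2 for t in vals) / len(vals) + 1e-12
        stats[c] = (mean, var)
    pairs = list(combinations(sorted(stats), 2))
    total = 0.0
    for a, b in pairs:
        (m1, v1), (m2, v2) = stats[a], stats[b]
        total += 0.5 * (v1 / v2 + v2 / v1 - 2) \
            + 0.5 * (m1 - m2) ** 2 * (1 / v1 + 1 / v2)
    return total / len(pairs)


def make_toy_data(n_per_class, rng):
    """Hypothetical data: 2 class-dependent features followed by 3 pure-noise features."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(6.0 * label, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
            X.append(informative + noise)
            y.append(label)
    return X, y


# Hypothetical attribute groups: name -> column indices of the feature vector.
GROUPS = {"informative": [0, 1], "noise": [2, 3, 4]}

rng = random.Random(0)
train_X, train_y = make_toy_data(40, rng)
test_X, test_y = make_toy_data(30, rng)

# Approach 1: evaluate each attribute group on its own.
group_acc = {
    name: accuracy(train_X, train_y, test_X, test_y, cols)
    for name, cols in GROUPS.items()
}
best_group = max(group_acc, key=group_acc.get)

# Approach 2: rank individual features by divergence, highest first.
ranking = sorted(
    range(5), key=lambda j: divergence(train_X, train_y, j), reverse=True
)

print(group_acc, best_group, ranking[:2])
```

In this sketch, dimension reduction amounts to keeping either the columns of the best-performing group or the top-ranked entries of `ranking`, and then rerunning the classifier on those columns only.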
Details
Primary Language
English
Journal Section
Research Article
Authors
Özlem Polat
*
0000-0002-9395-4465
Türkiye
Publication Date
April 15, 2019
Submission Date
February 28, 2018
Acceptance Date
January 13, 2019
Published in Issue
Year 2019 Volume: 3 Number: 1
