This paper aims to determine which protein features or attributes are the
most significant for classifying proteins according to their folds. Proteins
in the database used in this study are represented by six feature groups,
called attributes, and by a 125-dimensional feature vector. Representing
proteins with such high-dimensional vectors increases the computational load
of the classification process and lengthens its run time. This study offers
dimension reduction as a remedy for this problem. Hence, with two different
approaches, the features and attributes that yield high classification
performance are identified. In the first approach, each of the six attributes
is tested separately to determine which gives the highest performance. In the
second approach, the most significant of the 125 features are determined using
the Divergence Analysis method. A classical classifier, KNN (K-nearest
neighbor), and two artificial neural network models, GAL (Grow and Learn) and
SOM (Self-Organizing Map), are used as classifiers, and classification
performance is analyzed on the reduced-dimension datasets.
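
The two approaches can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact form of the Divergence Analysis used, so the code assumes the common symmetric divergence between per-class univariate Gaussians, summed over all class pairs. The toy data, the attribute names `attr_A` and `attr_B`, and the function names are all hypothetical, and only KNN is shown (GAL and SOM are omitted).

```python
import math
from collections import Counter, defaultdict

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, cols, k=3):
    """KNN accuracy using only the feature columns listed in cols."""
    pick = lambda row: [row[c] for c in cols]
    reduced_train = [pick(r) for r in train_X]
    hits = sum(knn_predict(reduced_train, train_y, pick(r), k) == y
               for r, y in zip(test_X, test_y))
    return hits / len(test_y)

def divergence_scores(X, y, eps=1e-9):
    """Per-feature divergence summed over class pairs (ASSUMED Gaussian form)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n_feat = len(X[0])
    stats = {}
    for label, rows in by_class.items():
        means = [sum(r[f] for r in rows) / len(rows) for f in range(n_feat)]
        # eps keeps the variance strictly positive for constant features
        variances = [sum((r[f] - means[f]) ** 2 for r in rows) / len(rows) + eps
                     for f in range(n_feat)]
        stats[label] = (means, variances)
    labels = sorted(stats)
    scores = [0.0] * n_feat
    for a in range(len(labels)):
        for b in range(a + 1, len(labels)):
            (m1, v1), (m2, v2) = stats[labels[a]], stats[labels[b]]
            for f in range(n_feat):
                scores[f] += (0.5 * (v1[f] / v2[f] + v2[f] / v1[f] - 2)
                              + 0.5 * (m1[f] - m2[f]) ** 2 * (1 / v1[f] + 1 / v2[f]))
    return scores

# Hypothetical toy data: 3 features instead of 125, 2 fold classes.
train_X = [[0.10, 5.0, 1.0], [0.20, 1.0, 1.1], [0.00, 3.0, 0.9], [0.15, 4.0, 1.0],
           [1.00, 4.5, 1.0], [1.10, 1.5, 0.9], [0.90, 3.2, 1.1], [1.05, 2.0, 1.0]]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b"]
test_X = [[0.05, 4.8, 1.0], [0.95, 4.9, 1.0], [0.12, 1.2, 1.0], [1.08, 1.1, 0.95]]
test_y = ["a", "b", "a", "b"]

# Approach 1: test each attribute (group of feature columns) separately.
attributes = {"attr_A": [0], "attr_B": [1, 2]}  # hypothetical grouping
per_attribute = {name: accuracy(train_X, train_y, test_X, test_y, cols)
                 for name, cols in attributes.items()}

# Approach 2: rank individual features by divergence and keep the top ones.
scores = divergence_scores(train_X, train_y)
ranked = sorted(range(len(scores)), key=lambda f: -scores[f])
top_accuracy = accuracy(train_X, train_y, test_X, test_y, ranked[:1])

print(per_attribute, ranked, top_accuracy)
```

On real data the same pattern would apply with the six attribute groups mapped to their column indices in the 125-dimensional vector, and with the number of retained features chosen by comparing accuracy across several cut-offs.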
Primary Language | English |
---|---|
Journal Section | Research Articles |
Publication Date | April 15, 2019 |
Submission Date | February 28, 2018 |
Acceptance Date | January 13, 2019 |
Published in Issue | Year 2019 Volume: 3 Issue: 1 |