This paper aims to determine which protein features or attributes are the
most significant for classifying proteins according to their folds. Each protein
in the database used in this study is represented by a 125-dimensional feature
vector organized into six feature groups, called attributes. Representing
proteins with such high-dimensional vectors increases the computational load of
classification and lengthens its running time. Dimension reduction is proposed
to address this problem, and two approaches are used to identify the features
and attributes that yield high classification performance. In the first
approach, each of the six attributes is tested separately to determine which
one gives the highest performance. In the second approach, the most significant
of the 125 features are selected using the Divergence Analysis method. A
classical classifier, KNN (K-nearest neighbor), and two artificial neural
network models, GAL (Grow and Learn) and SOM (Self-Organizing Map), are used as
classifiers, and classification performance is analyzed on the
reduced-dimension datasets.
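
The second approach can be illustrated with a minimal sketch. The abstract does not state how the divergence is computed, so the code below assumes one common formulation: the symmetric divergence between two univariate Gaussians, averaged over all class pairs and evaluated one feature at a time. The function names (`feature_divergence`, `rank_features`, `knn_predict`, `select`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import fmean, pvariance


def feature_divergence(values_by_class, eps=1e-9):
    """Mean pairwise symmetric Gaussian divergence of one feature.

    Assumption: each class is modelled as a 1-D Gaussian for this feature.
    """
    stats = [(fmean(v), pvariance(v) + eps) for v in values_by_class.values()]
    pairs = list(combinations(stats, 2))
    total = 0.0
    for (m1, v1), (m2, v2) in pairs:
        total += 0.5 * (v1 / v2 + v2 / v1 - 2.0)
        total += 0.5 * (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2)
    return total / len(pairs)


def rank_features(X, y):
    """Return feature indices sorted from most to least divergent."""
    scores = []
    for f in range(len(X[0])):
        by_class = {}
        for row, label in zip(X, y):
            by_class.setdefault(label, []).append(row[f])
        scores.append(feature_divergence(by_class))
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)


def select(X, cols):
    """Keep only the given feature columns."""
    return [[row[c] for c in cols] for row in X]


def knn_predict(X_train, y_train, x, k=3):
    """Plain K-nearest-neighbor vote using Euclidean distance."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data (illustrative only): feature 0 separates the classes, the rest do not.
X = [[0.0, 5.0, 1.0], [0.2, 1.0, 1.2], [0.1, 3.0, 0.9],
     [2.0, 4.0, 1.1], [2.2, 2.0, 1.0], [2.1, 3.5, 1.3]]
y = ["a", "a", "a", "b", "b", "b"]

ranking = rank_features(X, y)
top = ranking[:1]                      # reduced dimension: keep the best feature
X_reduced = select(X, top)
pred_b = knn_predict(X_reduced, y, [1.9], k=3)
pred_a = knn_predict(X_reduced, y, [0.15], k=3)
```

The first approach fits the same sketch: instead of ranking single features, `select` would be called once per attribute with that attribute's column indices, and the classifier would be evaluated on each resulting subset.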
Keywords: divergence analysis, fold recognition, attributes, neural networks, protein fold classification
Primary Language | English |
---|---|
Section | Research Articles |
Authors | |
Publication Date | April 15, 2019 |
Submission Date | February 28, 2018 |
Acceptance Date | January 13, 2019 |
Published in Issue | Year 2019 Volume: 3 Issue: 1 |