Because cancerous tissues are heterogeneous, each cancer type has several subclasses. Unless these subclasses are identified, treatment cannot be targeted accurately. With the advent of microarray technology and data science, machine learning methods that classify cancer subclasses from the microarray gene expression data of cancerous tissues have become increasingly popular. However, because each gene contributes one feature, the curse of dimensionality arises. In the present study, microarray gene expression data of various cancer types were mapped to reduced-dimensional spaces by means of three metric learning methods: LMNN (Large Margin Nearest Neighbor), ITML (Information-Theoretic Metric Learning) and NCA (Neighborhood Components Analysis). In the resulting spaces, instances of the same class lie closer together while instances of different classes lie farther apart, which differs from what conventional dimensionality reduction methods, such as PCA, do. To verify this, low-dimensional embeddings produced by the t-SNE method were examined. Additionally, to show that classification algorithms perform better in these new spaces, instance-based classifiers such as k-NN, the nearest mean classifier and LVQ were built, and their performance was observed to improve by up to 30% compared with their performance in the original space.
Keywords: Cancer Classification, Metric Learning, Microarray Gene Expressions, Instance-based Classification
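
The idea in the abstract, that a linear map into a lower-dimensional space can help an instance-based classifier, can be illustrated with a minimal sketch. It is not the study's implementation: the two-feature "expression" values are invented toy data, and the matrix `L` is hand-picked for illustration, whereas in practice it would be learned by a method such as LMNN, ITML or NCA. All function names here are hypothetical. The sketch compares a nearest mean classifier in the original space with the same classifier after applying `L`.

```python
import math

def transform(L, x):
    """Apply the linear map L (a list of rows) to the feature vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def class_means(X, y):
    """Return a dict mapping each class label to the mean of its instances."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def nearest_mean_predict(means, x):
    """Assign x to the class whose mean is closest."""
    return min(means, key=lambda lab: euclidean(means[lab], x))

def accuracy(X_train, y_train, X_test, y_test, L=None):
    """Nearest mean accuracy, optionally after mapping all data through L."""
    if L is not None:
        X_train = [transform(L, x) for x in X_train]
        X_test = [transform(L, x) for x in X_test]
    means = class_means(X_train, y_train)
    hits = sum(nearest_mean_predict(means, x) == t
               for x, t in zip(X_test, y_test))
    return hits / len(y_test)

# Invented toy data: feature 0 separates the classes, feature 1 is
# large-variance noise that happens to differ between the training classes.
X_train = [[1.0, 10.0], [1.2, 6.0], [3.0, -9.0], [2.8, -5.0]]
y_train = ["A", "A", "B", "B"]
X_test = [[1.1, -6.0], [2.9, 7.0], [1.0, 9.0], [3.0, -8.0]]
y_test = ["A", "B", "A", "B"]

# Hand-picked 1x2 map (an assumption, not a learned metric): it projects
# the data to one dimension and strongly down-weights the noisy feature.
L = [[1.0, 0.01]]

print(accuracy(X_train, y_train, X_test, y_test))        # 0.5
print(accuracy(X_train, y_train, X_test, y_test, L=L))   # 1.0
```

The Euclidean distance between `transform(L, x)` and `transform(L, z)` is the kind of Mahalanobis-type distance that the three metric learning methods parameterize. A learned `L` plays the role of the hand-picked one here, and k-NN or LVQ could replace the nearest mean classifier in the same way.
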
Primary Language | Turkish |
---|---|
Subjects | Engineering |
Journal Section | Articles |
Authors | |
Publication Date | October 31, 2021 |
Published in Issue | Year 2021 |