DNA sequence classification is an important challenge in genomic studies because the DNA oxidation signals of the Adenine, Cytosine, Guanine, and Thymine bases behave in a non-linear and chaotic manner. To identify the genotype of samples derived from biological sources accurately, Machine Learning (ML) methods are commonly preferred over expert-based methods because they can handle such complex-structured biological sequences. Reducing the dimension without discarding information that matters for classification is an important task in ML applications. This study presents a new feature extraction method to detect two sub-types of hepatitis from nucleic acid trace files. The proposed method combines the discrete wavelet transform (DWT) with entropy. The DWT decomposes the base signals up to three levels, with the aim of capturing the information hidden in both the spatial and frequency domains. To summarize DNA trace files of different lengths, multi-scale permutation entropy (MPE) measures are then computed from the approximation and detail coefficients of the signals stored in the sub-bands. Different feature sets are extracted with the proposed method from real data covering 200 hepatitis DNA trace files and then fed to a simple memory-based learning classifier, k-NN. The classification performance of the proposed feature extraction method is compared against a method based on MPE features without wavelet decomposition. The results indicate that, in classifying hepatitis DNA trace files, the average accuracy reaches nearly 99% with feature sets based on the proposed method, even when only 30% of the samples are used for training.
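The feature extraction pipeline described in the abstract (three-level DWT followed by MPE on each sub-band) can be illustrated with a minimal sketch. The abstract does not state the wavelet family, the embedding order, the embedding delay, or the set of scales, so the Haar wavelet, order 3, delay 1, and scales 1 to 3 used here are assumptions chosen for brevity, and all function names are hypothetical. This is not the authors' implementation.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:                      # pad odd-length input with its last sample
        signal.append(signal[-1])
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def wavedec(signal, levels=3):
    """Multi-level decomposition: returns [A_levels, D_levels, ..., D_1]."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return [approx] + details[::-1]

def permutation_entropy(x, order=3):
    """Normalized permutation entropy in [0, 1], embedding delay 1 (assumed)."""
    n = len(x) - order + 1
    if n < 1:                                # series too short for one ordinal pattern
        return 0.0
    counts = {}
    for i in range(n):
        window = x[i:i + order]
        # Ordinal pattern: indices of the window sorted by value (ties keep order).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] = counts.get(pattern, 0) + 1
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(math.factorial(order))

def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    return [sum(x[i:i + scale]) / scale for i in range(0, len(x) - scale + 1, scale)]

def mpe(x, scales=(1, 2, 3), order=3):
    """Multi-scale permutation entropy: one entropy value per scale."""
    return [permutation_entropy(coarse_grain(x, s), order) for s in scales]

def extract_features(signal, levels=3, scales=(1, 2, 3), order=3):
    """Concatenate the MPE values of every sub-band into one feature vector."""
    features = []
    for band in wavedec(signal, levels):
        features.extend(mpe(band, scales, order))
    return features
```

Because each sub-band is reduced to one entropy value per scale, the feature vector has a fixed length ((levels + 1) × number of scales, here 12) regardless of the length of the input signal, which is what lets trace files of different lengths be compared by a distance-based classifier such as k-NN. In this sketch the function would be applied to each of the four base signals separately and the results concatenated; that arrangement is also an assumption.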
Discrete Wavelet Decomposition, DNA Capillary Electrophoresis Signal, Feature Extraction, Multiscale Permutation Entropy, Nucleic Acid Sequencing
Primary Language | English |
---|---|
Subjects | Engineering |
Journal Section | Research Articles |
Authors | |
Publication Date | October 9, 2022 |
Submission Date | October 24, 2020 |
Published in Issue | Year 2022 Volume: 40 Issue: 3 |
IMPORTANT NOTE: JOURNAL SUBMISSION LINK https://eds.yildiz.edu.tr/sigma/