Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) causes COVID-19, which has become a pandemic and threatens public health. The emergence of SARS-CoV-2 variants poses a significant challenge for determining the risk of infection, developing vaccines and antiviral agents, monitoring changes, and assessing the evolution of the virus. In this study, we propose a method for identifying SARS-CoV-2 variants in Turkey. To achieve this goal, nucleotide occurrences are computed from whole genome sequences, which consist of the four nucleotides A, C, T, and G. Thus, each genome sequence of approximately 30,000 bp is represented by only four integers. After feature extraction, four classification methods (support vector machines, k-nearest neighbors, neural network, and decision tree) are employed to identify SARS-CoV-2 variants. Experiments are conducted on a dataset of 1403 genome sequences from Turkey belonging to the SARS-CoV-2 variants B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), and B.1.617 (Delta). The experimental results show that the k-nearest neighbor (KNN) classifier achieves, on average, an accuracy of 0.94, a precision of 0.81, a recall of 0.80, and an F-score of 0.80.
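
The feature extraction and KNN classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `nucleotide_counts` and `knn_predict`, the choice of `k=3`, the Euclidean distance, and the short toy sequences with placeholder labels are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
import math

# Feature order follows the abstract: A, C, T, G.
NUCLEOTIDES = ("A", "C", "T", "G")


def nucleotide_counts(sequence):
    """Map a genome sequence to four integers: the counts of A, C, T, and G.

    Any other character (for example the ambiguity code N) is ignored.
    """
    counts = Counter(sequence.upper())
    return [counts[n] for n in NUCLEOTIDES]


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k nearest training vectors.

    Assumption: Euclidean distance and k=3; the abstract states neither.
    """
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: real inputs would be whole genomes of about 30,000 bp
# labeled with variant names such as B.1.1.7 or B.1.617.
toy_sequences = ["AAAACCGT", "AAAACGGT", "ACGTTTTT", "ACGGTTTT"]
toy_labels = ["variant_x", "variant_x", "variant_y", "variant_y"]

train_features = [nucleotide_counts(s) for s in toy_sequences]
prediction = knn_predict(train_features, toy_labels, nucleotide_counts("AAAACCTT"))
print(prediction)
```

Because every genome is reduced to four integers, the distance computation is cheap regardless of sequence length. The trade-off is that all positional information is discarded.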
Primary Language | English |
---|---|
Subjects | Engineering |
Journal Section | Research Article |
Authors | |
Publication Date | June 30, 2022 |
Submission Date | April 8, 2022 |
Acceptance Date | April 25, 2022 |
Published in Issue | Year 2022 Volume: 6 Issue: 1 |