Research Article

Evaluation of Differences of Fast and High Accuracy Base Calling Models of Guppy on Variant Calling Using Low Coverage WGS Data

Volume: 6, Issue: 3, 20 December 2023


Abstract

Long-read sequencing technologies such as Oxford Nanopore Technologies (ONT) have enabled researchers to sequence long reads quickly and cost-effectively. ONT sequencing uses nanopores integrated into semiconductor surfaces and reads the genomic material from changes in the current across the surface as each nucleotide passes through the nanopore. By default, ONT sequencers write their output in FAST5 format. The first, and one of the most important, steps of ONT data analysis is converting FAST5 files into FASTQ files with "base caller" tools, which generally use pre-trained deep learning models to translate the electrical signals into reads. Guppy, the most commonly used base caller, provides two main model types: fast and high accuracy. Although the computation time differs significantly between these two models, their effect on variant calling has not been fully characterized. This study aims to evaluate how the choice of model affects variant calling performance. To compare the variant calling results obtained with each Guppy model, 15 low-coverage long-read sequencing datasets generated from different flow cells of NA12878 (gold-standard data) were used. The results indicated that the differences between the fast and high accuracy models in the number of output FASTQ files, read counts and average read lengths were not statistically significant, whereas the pass/fail ratios of the base-called datasets were significantly higher with the high accuracy models. This difference in pass/fail ratios in turn produced significant differences in the numbers of called Single Nucleotide Polymorphisms (SNPs) and insertions and deletions (InDels). Interestingly, the true positive rates of SNPs did not differ significantly, which shows that using fast models for SNP calling does not statistically affect the true positive rate.
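The pass/fail split and the paired fast-versus-high-accuracy comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes a read passes when its mean Phred quality meets a threshold (the `MIN_QSCORE` value of 9 is hypothetical, since the cut-off Guppy applies depends on its configuration), it averages per-base qualities in error-probability space (also an assumption about how the mean is taken), and it defines the pass/fail ratio as passed reads divided by failed reads. The abstract does not name the statistical test, so an exact paired sign-flip permutation test over matched flow cells stands in as one reasonable choice. All function names are illustrative.

```python
import itertools
import math
from typing import Sequence

# Hypothetical pass threshold; Guppy's actual cut-off depends on its config.
MIN_QSCORE = 9.0


def mean_qscore(quality: str) -> float:
    """Mean Phred score of a Phred+33 FASTQ quality string.

    Assumption: per-base scores are averaged as error probabilities and the
    mean probability is converted back to a Phred value.
    """
    probs = [10 ** (-(ord(c) - 33) / 10) for c in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def pass_fail_ratio(qualities: Sequence[str], min_q: float = MIN_QSCORE) -> float:
    """Number of passed reads divided by number of failed reads."""
    passed = sum(1 for q in qualities if mean_qscore(q) >= min_q)
    failed = len(qualities) - passed
    return passed / failed if failed else math.inf


def paired_sign_flip_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided paired permutation test on the mean of a[i] - b[i].

    Under the null hypothesis each paired difference (e.g. one flow cell
    base called with both models) is equally likely to have either sign,
    so all 2**n sign patterns are enumerated.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    for signs in itertools.product((1, -1), repeat=n):
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / 2 ** n
```

With 15 paired flow cells the exact test enumerates 2**15 = 32,768 sign patterns, which runs quickly; a paired t-test or a Wilcoxon signed-rank test would be the usual parametric and rank-based alternatives.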
The primary observation of this study is that using fast models does not decrease the true positive rate but does reduce the number of called variants, owing to the altered pass/fail ratios. Fast models are, however, not advised for InDel calling, since both the number of InDels and their true positive rates are significantly lower with fast models. To the best of our knowledge, this is the first study to evaluate the effect of the different base calling models of Guppy, one of the most common and ONT-supported base callers, on variant calling.
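Counting SNPs and InDels and scoring them against the NA12878 gold standard can likewise be sketched. The sketch assumes that "true positive rate" means the fraction of called variants of a given type that also appear in the truth set, and it matches variants exactly on (CHROM, POS, REF, ALT); both are assumptions, and real benchmarks normally normalize variant representations first with dedicated tools such as hap.py or RTG vcfeval. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# (CHROM, POS, REF, ALT) for a single alternate allele.
Variant = Tuple[str, int, str, str]


def variant_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as SNP, insertion, deletion or other."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # equal-length multi-base changes (MNPs)


def parse_vcf(lines: Iterable[str]) -> Set[Variant]:
    """Collect variants from VCF text lines, splitting multi-allelic ALT fields."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information and header lines
        chrom, pos, _vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            if allele in (".", "*"):
                continue  # no-call or spanning-deletion placeholder
            variants.add((chrom, int(pos), ref.upper(), allele.upper()))
    return variants


def count_by_type(calls: Set[Variant]) -> Dict[str, int]:
    """Number of called variants per type (SNP, insertion, deletion, other)."""
    return dict(Counter(variant_type(ref, alt) for _, _, ref, alt in calls))


def true_positive_rate(calls: Set[Variant], truth: Set[Variant], vtype: str) -> float:
    """Share of called variants of `vtype` that are present in the truth set."""
    typed = {v for v in calls if variant_type(v[2], v[3]) == vtype}
    if not typed:
        return 0.0
    return len(typed & truth) / len(typed)
```

Running these functions on the calls from the fast and the high accuracy model of the same flow cell gives the per-type counts and rates that a paired test can then compare across flow cells.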

Keywords

Supporting Institution

TUBITAK

Project Number

20AG005

Acknowledgments

We also thank Dr. Pınar Pir for advice on statistical testing.


Details

Primary Language

English

Subjects

Structural Biology, Engineering

Section

Research Article

Early View Date

1 December 2023

Publication Date

20 December 2023

Submission Date

1 June 2023

Acceptance Date

9 July 2023

Issue

Year 2023, Volume: 6, Issue: 3

Cite As

APA
Karakurt, H. U., Pekcan, H. A., Kahraman, A., Jihad, M., Akgün, B., Oksuz, C., & Onay, B. (2023). Evaluation of Differences of Fast and High Accuracy Base Calling Models of Guppy on Variant Calling Using Low Coverage WGS Data. International Journal of Life Sciences and Biotechnology, 6(3), 276-287. https://doi.org/10.38001/ijlsb.1308355
