Research Article

Assessment of Mutation Susceptibility in DNA Sequences with Word Vectors

Volume: 3, Issue: 1, 20 March 2020

Abstract

With the advent of natural language processing (NLP) techniques empowered by deep learning approaches, more detailed relationships between words have been unraveled. Word2Vec is quite robust in discovering contextual and semantic relationships. The genome, being a long text, is amenable to similar studies that may unravel as-yet-undiscovered relationships between DNA k-mers. Dna2vec applies the Word2Vec approach to the whole genome so that DNA k-mers are represented as vectors, and cosine similarity queries on these vectors reveal unusual relationships between DNA k-mers. In this study, we examined DNA sequence-based prediction of mutation susceptibility. First, we generated word vectors for the human and mouse genomes via dna2vec. Separately, we retrieved the coordinates of common and all mutations from dbSNP. For each coordinate, we extracted the 8-nucleotide k-mers (8-mers) intersecting the mutation and aggregated the results so that the number of mutations for each 8-mer was tabulated. These counts were then combined with the dna2vec cosine similarity data. Our results showed that, for a given k-mer, the k-mers with the highest cosine similarity coincide with the k-mers with the highest mutation counts. In other words, the neighbor with the highest cosine similarity to a k-mer was also the top neighbor by mutation count. The overlap between dna2vec similarity and mutation counts was 80% for human and 70% for mouse. In conclusion, dna2vec and other word embedding approaches can be used to reveal the mutation or variation characteristics of genomes without sequencing or experimental data, solely from the genome sequence itself. This might pave the way for understanding the underlying mechanisms and dynamics of mutations in genomes.
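The k-mer counting and neighbor comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: all function names are hypothetical, the toy vectors stand in for real dna2vec embeddings, and positions are taken as 0-based (1-based coordinates, such as VCF positions, would need to be shifted by one). Because the abstract does not say how neighbors or the overlap percentage were defined, the sketch assumes the candidate neighbors of a query k-mer are vocabulary k-mers exactly one substitution away, and scores agreement as the fraction of queries whose top cosine neighbor is also the top candidate by mutation count.

```python
import math
from collections import Counter


def kmers_overlapping(seq, pos, k=8):
    """Return every k-mer of `seq` whose window covers the 0-based position `pos`."""
    first = max(0, pos - k + 1)       # leftmost window start that still covers pos
    last = min(pos, len(seq) - k)     # rightmost start that still fits inside seq
    return [seq[i:i + k] for i in range(first, last + 1)]


def tabulate_mutation_counts(seq, positions, k=8):
    """Count how many mutation positions fall inside each k-mer occurrence."""
    counts = Counter()
    for pos in positions:
        counts.update(kmers_overlapping(seq, pos, k))
    return counts


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hamming1_neighbors(kmer, vocab):
    """Hypothetical candidate set: vocabulary k-mers exactly one substitution away."""
    return [w for w in vocab
            if len(w) == len(kmer) and sum(a != b for a, b in zip(w, kmer)) == 1]


def top_neighbor_agreement(vectors, counts):
    """Fraction of query k-mers whose nearest candidate by cosine similarity is
    also the candidate with the highest mutation count (ties: first in vocab order)."""
    agree = total = 0
    for query, qvec in vectors.items():
        cands = hamming1_neighbors(query, vectors)
        if not cands:
            continue  # no candidates to compare for this query
        by_cosine = max(cands, key=lambda w: cosine_similarity(qvec, vectors[w]))
        by_count = max(cands, key=lambda w: counts.get(w, 0))
        total += 1
        if by_cosine == by_count:
            agree += 1
    return agree / total if total else 0.0


if __name__ == "__main__":
    # Toy data only: k=3 and 2-D vectors stand in for 8-mers and dna2vec embeddings.
    counts = tabulate_mutation_counts("ACGTACGTAC", [4, 5], k=3)
    print(counts["TAC"])  # 2
    toy_vectors = {"AAA": [1.0, 0.0], "AAC": [1.0, 0.1],
                   "AAG": [0.0, 1.0], "CCC": [1.0, 1.0]}
    score = top_neighbor_agreement(toy_vectors, {"AAC": 5, "AAG": 1})
    print(round(score, 3))  # 0.667
```

In practice the toy dictionary would be replaced by the vectors exported by dna2vec and the counts by the dbSNP-derived 8-mer table; top-1 agreement over single-substitution neighbors is only one possible reading of the reported 80% and 70% overlaps.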

Details

Primary Language

English

Subjects

Artificial Intelligence

Section

Research Article

Publication Date

20 March 2020

Submission Date

14 January 2020

Acceptance Date

7 February 2020

Issue

Year 2020, Volume: 3, Issue: 1

Cite

APA
Yılmaz, A. (2020). Assessment of Mutation Susceptibility in DNA Sequences with Word Vectors. Journal of Intelligent Systems: Theory and Applications, 3(1), 1-6. https://doi.org/10.38016/jista.674910
AMA
1. Yılmaz A. Assessment of Mutation Susceptibility in DNA Sequences with Word Vectors. jista. 2020;3(1):1-6. doi:10.38016/jista.674910
Chicago
Yılmaz, Alper. 2020. “Assessment of Mutation Susceptibility in DNA Sequences with Word Vectors”. Journal of Intelligent Systems: Theory and Applications 3 (1): 1-6. https://doi.org/10.38016/jista.674910.
EndNote
Yılmaz A (01 March 2020) Assessment of Mutation Susceptibility in DNA Sequences with Word Vectors. Journal of Intelligent Systems: Theory and Applications 3 1 1–6.
IEEE
[1] A. Yılmaz, “Assessment of Mutation Susceptibility in DNA Sequences with Word Vectors”, jista, vol. 3, no. 1, pp. 1–6, Mar. 2020, doi: 10.38016/jista.674910.
ISNAD
Yılmaz, Alper. “Assessment of Mutation Susceptibility in DNA Sequences with Word Vectors”. Journal of Intelligent Systems: Theory and Applications 3/1 (01 March 2020): 1-6. https://doi.org/10.38016/jista.674910.
JAMA
1. Yılmaz A. Assessment of Mutation Susceptibility in DNA Sequences with Word Vectors. jista. 2020;3:1–6.
MLA
Yılmaz, Alper. “Assessment of Mutation Susceptibility in DNA Sequences with Word Vectors”. Journal of Intelligent Systems: Theory and Applications, vol. 3, no. 1, March 2020, pp. 1-6, doi:10.38016/jista.674910.
Vancouver
1. Alper Yılmaz. Assessment of Mutation Susceptibility in DNA Sequences with Word Vectors. jista. 01 March 2020;3(1):1-6. doi:10.38016/jista.674910
