Research Article

A Framework for Query Optimization Algorithms for Biological Data

Year 2019, Volume: 5 Issue: 2, 76 - 79, 31.07.2019
https://doi.org/10.22399/ijcesen.508889

Abstract

Recently, the size of biological databases has increased significantly,
along with the number of users and the rate of queries. As a result,
some databases have reached terabyte size, while the need to access
them as quickly as possible keeps growing. Computer scientists can
help by organizing the data and queries in a way that lets biologists
quickly search existing information. In this paper, a query model for
DNA and protein sequence datasets is proposed. This method of handling
queries can effectively and rapidly retrieve all similar proteins/DNA
from a large database. A theoretical and conceptual framework is
derived using query techniques from different applications. The
results show that the query optimization algorithms reduce query
processing time in comparison with the normal query searching
algorithm.


Details

Primary Language English
Subjects Engineering
Journal Section Research Articles
Authors

Khalid Mohammad Jaber 0000-0002-8458-401X

Nesreen A. Hamad

Fatima M. Quıam

Publication Date July 31, 2019
Submission Date January 6, 2019
Acceptance Date June 10, 2019
Published in Issue Year 2019 Volume: 5 Issue: 2

Cite

APA Jaber, K. M., Hamad, N. A., & Quıam, F. M. (2019). A Framework for Query Optimization Algorithms for Biological Data. International Journal of Computational and Experimental Science and Engineering, 5(2), 76-79. https://doi.org/10.22399/ijcesen.508889