A Framework for Query Optimization Algorithms for Biological Data

Khalid Mohammad Jaber; Nesreen A. Hamad; Fatima M. Quıam

doi:10.22399/ijcesen.508889

Research Article

Year 2019, 76-79, 31.07.2019

Abstract

Recently, the size of biological databases has increased significantly,
along with the number of users and the rate of queries. Some databases
have reached terabyte size, while the need to access them as quickly as
possible keeps growing. Computer scientists can help by organizing the
data and the queries in a way that allows biologists to search existing
information quickly. In this paper, a query model for DNA and protein
sequence datasets is proposed. This method of handling queries can
effectively and rapidly retrieve all similar proteins/DNA from a large
database. The proposed theoretical and conceptual framework is derived
using query techniques from different applications. The results show
that the query optimization algorithms reduce query processing time in
comparison with the normal query searching algorithm.

Keywords

Query Optimization, Searching Algorithms, Bioinformatics, Parallel Computing

Details

Primary Language	English
Subjects	Engineering
Journal Section	Research Articles
Authors	Khalid Mohammad Jaber (ORCID: 0000-0002-8458-401X), Nesreen A. Hamad, Fatima M. Quıam
Publication Date	July 31, 2019
Submission Date	January 6, 2019
Acceptance Date	June 10, 2019
Published in Issue	Year 2019, Volume 5, Issue 2

Cite

APA	Jaber, K. M., Hamad, N. A., & Quıam, F. M. (2019). A Framework for Query Optimization Algorithms for Biological Data. International Journal of Computational and Experimental Science and Engineering, 5(2), 76-79. https://doi.org/10.22399/ijcesen.508889