Research Article

Big data Analysis in Plant Science and Machine Learning Tool Applications in Genomics and Proteomics

Year 2018, Volume: 4 Issue: 2, 23 - 31, 02.07.2018
https://doi.org/10.22399/ijcesen.414984

Abstract

The expansion of data in plant biology and the drastically increasing data volume in this field compel scientists to analyze data by means of smart computer systems, since manually analyzing huge amounts of data is cumbersome and even impossible. Proteomics is the comparative study of proteins on a wide scale. Nowadays, proteomics analysis is considered one of the most important methods in genomics and in gene expression studies. Large amounts of data pose big challenges in plant biology. Biological communities either need to make their data compatible with parallel computing and the associated data management infrastructure, or are looking for novel analytical approaches to extract information from large amounts of data. Machine learning provides promising analytical and computational solutions for large, heterogeneous, unstructured datasets, especially for proteomics data. In particular, this paper gives a conceptual review of machine learning and its applicable methods, anticipating how machine learning combined with big data technology can serve as an interface to facilitate basic research and biotechnology in plant sciences.


Details

Primary Language English
Subjects Engineering
Journal Section Research Articles
Authors

Jalil Nourmohammadi Khiarak 0000-0002-1928-9081

Rana Valizadeh-kamran

Ahmad Heydariyan

Najmeh Damghani

Publication Date July 2, 2018
Submission Date April 13, 2018
Acceptance Date May 9, 2018
Published in Issue Year 2018 Volume: 4 Issue: 2

Cite

APA Nourmohammadi Khiarak, J., Valizadeh-kamran, R., Heydariyan, A., Damghani, N. (2018). Big data Analysis in Plant Science and Machine Learning Tool Applications in Genomics and Proteomics. International Journal of Computational and Experimental Science and Engineering, 4(2), 23-31. https://doi.org/10.22399/ijcesen.414984