Research Article

BEMO: A Parsimonious Big Data Mining Methodology

Year 2016, pp. 113-123, 01.07.2016
https://doi.org/10.5824/1309-1581.2016.3.007.x

Abstract

The Problem: Standardized processes are often followed to conduct data mining projects systematically. However, while current models provide good descriptions, they need updating to address current Big Data challenges. Current data mining methods do not meet all business requirements; in addition, they are difficult to remember and do not cover all requisite steps. Given these limitations, use of traditional data mining process methods is fading in favor of independent data mining processes.

What Was Done: BEMO (Business Opportunity, Exploration, Modeling, and Operationalization) is a standard, parsimonious process developed for conducting data mining projects in a reusable and repeatable fashion in a Big Data environment. The model is vendor, technology, and industry agnostic, and it is applied to a practical project example.

Why this Work is Important: This manuscript provides a reusable, simplified model for data mining that can be applied to a variety of applications through a formalized and detailed process template. Given new technologies, Big Data, and other developments, a new data mining methodology is required to meet these needs adequately. A parsimonious Big Data mining model also favors simpler models over complex ones, because simpler models can generalize to new problems more efficiently.
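The parsimony argument above (prefer a simpler model when it explains the data about as well as a complex one) can be illustrated with an information criterion that penalizes extra parameters. The sketch below is an assumption for illustration only, not part of BEMO as described in this record: it uses the Akaike Information Criterion (AIC) for least-squares models, and the `Candidate` type, parameter counts, and RSS values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A fitted least-squares model summarized by its size and fit (hypothetical inputs)."""
    name: str
    k: int      # number of estimated coefficients
    rss: float  # residual sum of squares on the n training rows


def aic(n: int, k: int, rss: float) -> float:
    """Gaussian least-squares AIC, dropping constants shared by all candidates."""
    if n <= 0 or rss <= 0:
        raise ValueError("n and rss must be positive")
    # Fit term rewards low RSS; 2k penalizes every extra parameter.
    return n * math.log(rss / n) + 2 * k


def most_parsimonious(candidates: list[Candidate], n: int) -> Candidate:
    """Return the candidate with the lowest AIC; ties go to the model with fewer parameters."""
    return min(candidates, key=lambda c: (aic(n, c.k, c.rss), c.k))


if __name__ == "__main__":
    n = 100
    # Hypothetical fits of increasing complexity on the same 100 rows.
    models = [
        Candidate("intercept only", k=1, rss=200.0),
        Candidate("linear", k=2, rss=50.0),
        Candidate("cubic", k=4, rss=49.0),
    ]
    for m in models:
        print(f"{m.name:15s} AIC = {aic(n, m.k, m.rss):8.2f}")
    print("selected:", most_parsimonious(models, n).name)  # selected: linear
```

In this toy case the cubic model has a slightly lower RSS than the linear one, but its two extra parameters cost more than the small improvement in fit, so the simpler linear model is selected. Other criteria (for example BIC or held-out validation error) could be substituted under the same idea.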

References

  • Bensusan, H. (2014). God doesn’t always shave with Occam’s razor. School of Cognitive and Computing Sciences.
  • Brannick, M. (2016). Regression basics. University of South Florida. Retrieved from http://faculty.cas.usf.edu/mbrannick/regression/regbas.html
  • Georges, J., & Anderson, C. (2014). Advanced predictive modeling using SAS Enterprise Miner 13.1. SAS Institute.
  • Laerd Statistics. (2013). Linear regression analysis using Stata. Lund Research. Retrieved from https://statistics.laerd.com/stata-tutorials/linear-regression-using-stata.php
  • Laerd Statistics. (2013). Types of variable. Lund Research. Retrieved from https://statistics.laerd.com/statistical-guides/types-of-variable.php
  • Masnick, M. (2012). Why Netflix never implemented the algorithm that won the Netflix $1 million challenge. Techdirt. Retrieved from https://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge.shtml
  • Netflix. (2009). Netflix Prize. Retrieved from http://www.netflixprize.com/
  • Netflix. (2016). Netflix Media Center. Retrieved from https://media.netflix.com/en/about-netflix
  • Piatetsky, G. (2014). CRISP-DM, still the top methodology for analytics, data mining, or data science projects. KDnuggets. Retrieved from http://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html
  • Prevosto, V., & Marotta, P. (2014). Does Big Data need bigger data quality and data management? Verisk Review.
  • Ravenna, A., Truxillo, C., & Wells, C. (2015). SAS Visual Statistics. SAS Institute.
  • Sharda, R., Delen, D., & Turban, E. (2014). Business intelligence: A managerial perspective on analytics. Pearson, Upper Saddle River, NJ.
  • Shmueli, G., Patel, N., & Bruce, P. (2010). Data mining for business intelligence. Wiley, Hoboken, NJ.
  • Truxillo, C., & Wells, C. (2014). Advanced business analytics. SAS Institute.
  • Vandekerckhove, J., & Matzke, D. (2014). Model comparison and the principle of parsimony. CIDLab. Retrieved from http://www.cidlab.com/prints/vandekerckhove2014model.pdf
  • Wirth, R., & Hipp, J. (2001). CRISP-DM: Towards a standard process model for data mining. DaimlerChrysler Research & Technology, University of Tübingen.
  • Woodside, J. M. (2014). Big Data veracity in healthcare. The 2014 International Conference on Advances in Big Data Analytics.

There are 17 citations in total.

Details

Primary Language: English
Journal Section: Research Article
Authors: Joseph M. Woodside
Publication Date: July 1, 2016
Submission Date: July 1, 2016
Published in Issue: Year 2016

Cite

APA Woodside, J. M. (2016). BEMO: A Parsimonious Big Data Mining Methodology. AJIT-E: Academic Journal of Information Technology, 7(24), 113-123. https://doi.org/10.5824/1309-1581.2016.3.007.x