Year 2018, 189 - 204, 29.12.2018
Ayla Saylı, Ceyda Akbulut, Kemal Kosuta
References
- [1] Zikopoulos, P.C., Eaton, C., deRoos, D., Deutsch, T., Lapis, G., Understanding Big Data, McGrawHill, New York, 2012.
- [2] Witten, Ian H., et al., Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2016.
- [3] Friedman, J., Hastie, T., and Tibshirani, R., The Elements of Statistical Learning, Vol. 1. Springer series in statistics, New York, 2001.
- [4] Weidner CI, Lin Q, Koch CM, Eisele L, Beier F, Ziegler P, Bauerschlag DO, Jöckel KH, Erbel R, Mühleisen TW, Zenke M, Brümmendorf TH, Wagner W., “Aging of Blood Can Be Tracked by DNA Methylation Changes at Just Three CpG Sites”, Genome Biol 15.2 (2014):1–11.
- [5] James G., Witten D., Hastie T., Tibshirani R., An Introduction to Statistical Learning, Springer, New York, ISBN 978-1-4614-7137-0, 2015.
- [6] Putin E, Mamoshina P, Aliper A, Korzinkin M, Moskalev A., “Deep Biomarkers of Human Aging: Application of Deep Neural Networks to Biomarker Development”, Aging 8.5 (2016):1–13.
- [7] Hox, J. J., Moerbeek, M., and van de Schoot, R., Multilevel Analysis: Techniques and Applications, Routledge, 2017.
- [8] Hu, Rui, et al., "A Short-term Power Load Forecasting Model based on the Generalized Regression Neural Network with Decreasing Step Fruit Fly Optimization Algorithm", Neurocomputing, 221 (2017): 24-31.
- [9] De Witte, K. and López-Torres, L., "Efficiency in Education: a Review of Literature and a Way Forward", Journal of the Operational Research Society, 68.4 (2017): 339-363.
- [10] Gunasekaran M. and Lopez D., "Health Data Analytics Using Scalable Logistic Regression with Stochastic Gradient Descent", International Journal of Advanced Intelligence Paradigms, 10.1-2 (2018): 118-132.
- [11] Markus H., et al., "Economic Development Matters: A Meta‐Regression Analysis on the Relation between Environmental Management and Financial Performance", Journal of Industrial Ecology, 22.4 (2018): 720-744.
- [12] https://sites.google.com/site/frankverbo/data-and-software/data-set-on-the-european-car-market.
- [13] Aggarwal, C. C., An introduction to outlier analysis. In Outlier analysis, New York NY: Springer, (2013): 1-40.
- [14] Ilango, V., Subramanian, R., & Vasudevan, V., “A five step procedure for outlier analysis in data mining”, European Journal of Scientific Research, 75(3) (2012): 327-339.
Multiple Regression Analysis System in Machine Learning and Estimating Effects of Data Transformation & Min-Max Normalization
Abstract
Machine learning is a prominent topic in data analysis, and practitioners in the field are called “data scientists”, a job title that is now in high demand in computing. This study has two aims. The first is to implement a multiple regression analysis system, developed on the Ubuntu operating system with the Anaconda platform and Python 3, that constructs a model for each attribute so that its values can be estimated and future decisions can be made with less risk by drawing on the past experience hidden in accumulated data. The second is to determine the effects of data transformation and min-max normalization applied during data preparation, before the models are built. After implementing the system, we test it by determining the best estimation model for each attribute of the vehicles sold in five European countries between 1970 and 1999. We construct six versions of the original dataset and use them to build regression models for further estimation. Finally, we compute the R-squared value of each constructed model and compare the models according to these values. The computational results are promising: the system can be used efficiently, and the effects of data transformation and min-max normalization are significant for some attributes.
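The workflow summarized in the abstract (prepare several versions of a dataset, fit a multiple regression model to each, compare them by R-squared) can be illustrated with a minimal Python 3 sketch. It is not the authors' system: the data are synthetic, the column meanings (horsepower, weight, price) and all function names are assumptions made for illustration, and only three dataset versions are built rather than the six used in the study. Min-max normalization maps each column x to (x - min) / (max - min), and R-squared is computed as 1 - SS_res / SS_tot.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale each column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0, no division by zero
    return (X - col_min) / col_range


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(X, coef):
    """Apply fitted coefficients to a predictor matrix."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (assumption): two predictors and one target.
rng = np.random.default_rng(0)
X_raw = rng.uniform([50.0, 600.0], [200.0, 1800.0], size=(200, 2))  # "horsepower", "weight"
y = 2.0 + 0.05 * X_raw[:, 0] + 0.004 * X_raw[:, 1] + rng.normal(0.0, 0.5, 200)  # "price"

# Three illustrative dataset versions (the study itself uses six).
normalized = min_max_normalize(X_raw)
versions = {
    "original": X_raw,
    "min-max normalized": normalized,
    "log-transformed": np.log1p(X_raw),
}

# Fit one model per version and record its in-sample R-squared.
scores = {}
for name, X in versions.items():
    coef = fit_linear_regression(X, y)
    scores[name] = r_squared(y, predict(X, coef))
    print(f"{name}: R^2 = {scores[name]:.4f}")
```

In this plain least-squares setting with an intercept, rescaling the predictors alone leaves the fitted values, and therefore R-squared, unchanged, so any difference between versions in the sketch comes from the nonlinear transformation. That is a property of this simplified example and says nothing about the results reported in the study, which may rely on other preparation steps and modelling choices.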