Research Article
Year 2023, Volume: 3 Issue: 1, 19 - 25, 30.06.2023


Genetic programming-based automated machine learning approach to solve regression problems

Abstract

Automated machine learning aims to optimize machine learning pipelines automatically, given a dataset, a task type, and a target variable. This research analyzes the use of genetic programming for automated feature engineering in regression problems. It introduces a methodology that performs feature selection and constructs new features from the original feature set by combining and selecting features in the leaf nodes of the genetic programming tree. A multiple feature generation technique is proposed, in which three different feature sets are tested with linear regression, a Random Forest regressor, and a Gradient Boosting regressor. The proposed approach is applied to an industrial process dataset in which the target variable is an indicator of process performance. The experimental results show that the method can reduce the cardinality of the original feature set while maintaining the performance of the learning models. Moreover, they show that the newly constructed feature is better able to discriminate the target variable.
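
To make the idea of constructing features in a genetic programming tree more concrete, the following is a minimal sketch and not the author's implementation. It assumes a small arithmetic function set, a mutation-only evolution loop with elitism, and the absolute Pearson correlation with the target as the fitness measure; none of these choices are stated in the abstract. All names (random_tree, evolve, and so on) and the synthetic data are hypothetical. Leaves of each tree are indices of original features, so the set of leaves of the best tree acts as a selected feature subset, while the tree itself defines a newly constructed feature.

```python
import math
import operator
import random

# Binary operators allowed at internal nodes (an assumed, minimal function set).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def random_tree(n_features, depth, rng):
    """Grow a random expression tree; leaves are original feature indices."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    op = rng.choice(sorted(OPS))
    return (op,
            random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))

def evaluate(tree, row):
    """Compute the constructed feature value for one data row."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def leaves(tree):
    """Return the set of original feature indices used by the tree."""
    if isinstance(tree, int):
        return {tree}
    return leaves(tree[1]) | leaves(tree[2])

def fitness(tree, X, y):
    """Absolute Pearson correlation between constructed feature and target
    (an assumed fitness; the source does not specify one)."""
    z = [evaluate(tree, row) for row in X]
    mz, my = sum(z) / len(z), sum(y) / len(y)
    cov = sum((a - mz) * (b - my) for a, b in zip(z, y))
    vz = sum((a - mz) ** 2 for a in z)
    vy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(vz * vy)
    return abs(cov) / denom if denom > 0 else 0.0

def mutate(tree, n_features, rng):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_features, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_features, rng), right)
    return (op, left, mutate(right, n_features, rng))

def evolve(X, y, generations=30, pop_size=40, seed=0):
    """Mutation-only evolution that always keeps the better half."""
    rng = random.Random(seed)
    n_features = len(X[0])
    # Seed the population with every original feature as a single-leaf tree,
    # then fill the rest with random trees.
    pop = list(range(n_features))
    pop += [random_tree(n_features, 3, rng) for _ in range(pop_size - len(pop))]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, X, y), reverse=True)
        parents = pop[: pop_size // 2]
        children = [mutate(rng.choice(parents), n_features, rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda t: fitness(t, X, y))

# Synthetic regression data (hypothetical): the target depends on x0*x1 and x3.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(5)] for _ in range(200)]
y = [r[0] * r[1] + 0.5 * r[3] + data_rng.gauss(0, 0.05) for r in X]

best = evolve(X, y)
print("constructed feature:", best)
print("original features used:", sorted(leaves(best)))
print("fitness:", round(fitness(best, X, y), 3))
```

Because the initial population contains every original feature as a single-leaf tree and the better half always survives, the constructed feature returned by this sketch is never worse than the best original feature under the chosen fitness. In a fuller pipeline, the constructed feature and the selected subset would then be passed to the regression models for comparison.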

Supporting Institution

Basque Government

Project Number

ELKARTEK 2021

Thanks

This work was supported by the Basque Government through the BISUM project under Grant ELKARTEK 2021.


Details

Primary Language: English
Subjects: Artificial Intelligence
Journal Section: Research Articles
Authors: Maialen Murua (ORCID 0000-0001-7922-6771)
Project Number: ELKARTEK 2021
Publication Date: June 30, 2023
Acceptance Date: February 20, 2023
Published in Issue: Year 2023, Volume 3, Issue 1

Cite

Vancouver Murua M. Genetic programming-based automated machine learning approach to solve regression problems. Computers and Informatics. 2023;3(1):19-25.

Computers and Informatics is licensed under CC BY-NC 4.0