Research Article

Genetic programming-based automated machine learning approach to solve regression problems

Volume: 3 Number: 1 June 30, 2023
Abstract

Automated machine learning aims to optimize machine learning pipelines automatically, given a dataset, a task type and a target variable. This research analyzes the use of genetic programming to perform automated feature engineering in regression problems. It introduces a methodology that performs feature selection and constructs new features from the original feature set by combining and selecting features in the leaf nodes of the genetic programming tree. A multiple feature generation technique is proposed, in which three different feature sets are tested with a linear regression model, a Random Forest regressor and a Gradient Boosting regressor. The proposed approach is applied to an industrial process dataset whose target variable is an indicator of process performance. The experimental results show that the method can reduce the cardinality of the original feature set while maintaining the performance of the learning models. Moreover, they show that the newly constructed feature can better discriminate the target variable.
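As a rough illustration of the idea in the abstract, the sketch below evolves expression trees whose leaf nodes are indices of original features, so that a single tree both constructs a new feature and selects the subset of original features it uses. It is not the paper's implementation: the operator set, the correlation-based fitness, the mutation-only evolution loop, the synthetic data and all function names are assumptions made for this example, and the downstream regressors named in the abstract are left out.

```python
import math
import random

# Assumed operator set for internal nodes (not taken from the paper).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def random_tree(n_features, depth, rng):
    """Grow a random tree; a leaf is the index of an original feature."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))

def evaluate(tree, row):
    """Compute the constructed feature value for one data row."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def leaves(tree):
    """Original features used by the tree, i.e. the selected subset."""
    if isinstance(tree, int):
        return {tree}
    return leaves(tree[1]) | leaves(tree[2])

def fitness(tree, X, y):
    """Assumed fitness: absolute Pearson correlation with the target."""
    f = [evaluate(tree, row) for row in X]
    mf, my = sum(f) / len(f), sum(y) / len(y)
    cov = sum((a - mf) * (b - my) for a, b in zip(f, y))
    vf = sum((a - mf) ** 2 for a in f)
    vy = sum((b - my) ** 2 for b in y)
    if vf == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vf * vy)

def mutate(tree, n_features, rng):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_features, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_features, rng), right)
    return (op, left, mutate(right, n_features, rng))

def evolve(X, y, pop_size=30, generations=20, seed=0):
    """Keep the better half each generation and refill with mutants."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [random_tree(n, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, X, y), reverse=True)
        elite = pop[: pop_size // 2]
        mutants = [mutate(rng.choice(elite), n, rng)
                   for _ in range(pop_size - len(elite))]
        pop = elite + mutants
    return max(pop, key=lambda t: fitness(t, X, y))

# Synthetic data (hypothetical): only features 0, 1 and 3 drive the target.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(5)] for _ in range(200)]
y = [row[0] * row[1] + 0.5 * row[3] for row in X]

best = evolve(X, y)
selected = sorted(leaves(best))
print("constructed feature:", best)
print("original features used:", selected)
print("fitness:", round(fitness(best, X, y), 3))
```

In this sketch, the set returned by `leaves` plays the role of the reduced feature set, while `evaluate` yields the constructed feature that would then be passed to a regression model.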

Supporting Institution

Basque Government

Project Number

ELKARTEK 2021

Thanks

This work was supported by the BISUM project under Grant ELKARTEK 2021 through the Basque Government.

Details

Primary Language

English

Subjects

Artificial Intelligence

Journal Section

Research Article

Publication Date

June 30, 2023

Submission Date

February 14, 2023

Acceptance Date

February 20, 2023

Published in Issue

Year 2023 Volume: 3 Number: 1

Vancouver
1. Maialen Murua. Genetic programming-based automated machine learning approach to solve regression problems. Computers and Informatics [Internet]. 2023 Jun. 1;3(1):19-25. Available from: https://izlik.org/JA26ER59UA

Computers and Informatics is licensed under CC BY-NC 4.0