Research Article
Year 2023, Volume: 3 Issue: 1, 19 - 25, 30.06.2023

Genetic programming-based automated machine learning approach to solve regression problems

Abstract

Automated machine learning aims to optimize machine learning pipelines automatically, given a dataset, a task type and a target variable. This research analyzes the use of genetic programming to perform automated feature engineering in regression problems. It introduces a methodology that performs feature selection and constructs new features from the original feature set by combining and selecting features in the leaf nodes of the genetic programming tree. A multiple feature generation technique is proposed, in which three different feature sets are tested with linear regression, a Random Forest regressor and a Gradient Boosting regressor. The proposed approach is applied to an industrial process dataset whose target variable is an indicator of the performance of the process. The experimental results show that the method reduces the cardinality of the original feature set while maintaining the performance of the learning models. They also show that the newly constructed feature discriminates the target variable better than the original features.
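The idea of building a new feature from the leaf nodes of a genetic programming tree can be illustrated with a minimal sketch. It is not the method of the paper: the operator set (`add`, `sub`, `mul`), the fitness measure (absolute Pearson correlation with the target), the mutation-only evolution loop and the synthetic data are all assumptions chosen to keep the example short and free of external dependencies. The features that appear in the leaves of the best tree play the role of the selected subset, and the value of the tree is the constructed feature.

```python
import math
import operator
import random

# Hypothetical primitive set; the abstract does not list the operators used.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def random_tree(n_features, depth, rng):
    """Grow a random expression tree; leaves are indices of original features."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    op = rng.choice(sorted(OPS))
    return (op,
            random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))

def evaluate(tree, row):
    """Compute the constructed feature value for one data row."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def used_features(tree):
    """Feature indices found in the leaves: the implicitly selected subset."""
    if isinstance(tree, int):
        return {tree}
    _, left, right = tree
    return used_features(left) | used_features(right)

def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((v - my) ** 2 for v in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def fitness(tree, X, y):
    # Assumed fitness: how strongly the constructed feature tracks the target.
    return abs_correlation([evaluate(tree, row) for row in X], y)

def mutate(tree, n_features, rng, depth=2):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_features, depth, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_features, rng, depth), right)
    return (op, left, mutate(right, n_features, rng, depth))

def evolve(X, y, generations=30, pop_size=40, seed=0):
    """Truncation selection with elitism; offspring come from mutation only."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pop = [random_tree(n_features, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, X, y), reverse=True)
        survivors = pop[: pop_size // 2]
        children = [mutate(rng.choice(survivors), n_features, rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda t: fitness(t, X, y))

# Synthetic data (an assumption): the target depends on x0 * x1 only,
# while x2 and x3 are irrelevant.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
y = [row[0] * row[1] + 0.01 * data_rng.gauss(0, 1) for row in X]

best = evolve(X, y)
print("best tree:", best)
print("selected features:", sorted(used_features(best)))
print("fitness:", round(fitness(best, X, y), 3))
```

In a fuller pipeline, the constructed feature and the selected subset would then be passed to the regressors named in the abstract, and the fitness would more plausibly be a cross-validated error of those models than a simple correlation.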

Supporting Institution

Basque Government

Project Number

ELKARTEK 2021

Acknowledgments

This work was supported by the Basque Government through the BISUM project under Grant ELKARTEK 2021.

Details

Primary Language English
Subjects Artificial Intelligence
Section Research Articles
Authors

Maialen Murua 0000-0001-7922-6771

Project Number ELKARTEK 2021
Publication Date June 30, 2023
Acceptance Date February 20, 2023
Published in Issue Year 2023 Volume: 3 Issue: 1

Cite

Vancouver Murua M. Genetic programming-based automated machine learning approach to solve regression problems. Computers and Informatics. 2023;3(1):19-25.