TY  - JOUR
T1  - Comprehensive Benchmarking Analysis for Evaluating Effectiveness of Transfer Learning-based Feature Engineering in AutoML
AU  - Sırt, Merve
AU  - Eyüpoğlu, Can
PY  - 2025
DA  - February
Y2  - 2024
DO  - 10.52876/jcs.1604889
JF  - The Journal of Cognitive Systems
JO  - JCS
PB  - İstanbul Technical University
WT  - DergiPark
SN  - 2548-0650
SP  - 30
EP  - 37
VL  - 9
IS  - 2
LA  - en
AB  - This study conducts a comprehensive benchmarking analysis to evaluate the effectiveness of transfer learning-based feature engineering in Automated Machine Learning (AutoML) systems. The research compares traditional manual feature engineering, standard AutoML approaches, and transfer learning-enhanced AutoML across diverse data modalities, including images, text, and tabular data. Experimental evaluations were carried out using CIFAR-10, IMDB Reviews, and Adult Census Income datasets, focusing on assessing each approach in terms of model performance, training time, and resource utilization. The findings reveal that transfer learning-enhanced AutoML significantly reduces training time by up to 45% while improving model accuracy by up to 20%, particularly for image and text datasets. Furthermore, scenarios with high feature reuse rates demonstrated memory utilization improvements of up to 30%. These results underscore the substantial advantages of integrating transfer learning into AutoML systems for optimizing feature engineering processes.
KW  - AutoML
KW  - Transfer Learning
KW  - Feature Engineering
KW  - Machine Learning Optimization
CR  - [1] X. He, K. Zhao, X. Chu. “AutoML: A Survey of the State-of-the-Art.” Knowledge-Based Systems 212 (January 2021): 106622. https://doi.org/10.1016/j.knosys.2020.106622.
CR  - [2] K. Chauhan et al., "Automated Machine Learning: The New Wave of Machine Learning," 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, 2020, pp. 205-212, doi: 10.1109/ICIMIA48430.2020.9074859.
CR  - [3] T. Nagarajah and G. Poravi, "A Review on Automated Machine Learning (AutoML) Systems," 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), Bombay, India, 2019, pp. 1-6, doi: 10.1109/I2CT45611.2019.9033810.
CR  - [4] M. Baratchi, C. Wang, S. Limmer, J. N. van Rijn, H. Hoos, T. Bäck and M. Olhofer. 2024. “Automated Machine Learning: Past, Present and Future.” Artificial Intelligence Review 57 (5). https://doi.org/10.1007/s10462-024-10726-1.
CR  - [5] V. K. Harikrishnan, M. Vijarania, and A. Gambhir. “Diabetic Retinopathy Identification Using AutoML.” Computational Intelligence and Its Applications in Healthcare, 2020, 175–88. https://doi.org/10.1016/b978-0-12-820604-1.00012-1.
CR  - [6] D. Salinas and N. Erickson. 2023. “TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications.” arXiv preprint arXiv:2311.02971.
CR  - [7] P. Malakar, P. Balaprakash, V. Vishwanath, V. Morozov and K. Kumaran, "Benchmarking Machine Learning Methods for Performance Modeling of Scientific Applications," 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), Dallas, TX, USA, 2018, pp. 33-44, doi: 10.1109/PMBS.2018.8641686.
CR  - [8] D.V. Anand, Q. Xu, J. Wee. Topological feature engineering for machine learning based halide perovskite materials design. npj Comput Mater 8, 203 (2022). https://doi.org/10.1038/s41524-022-00883-8
CR  - [9] M. Zöller, H. F. Huber (2021). Benchmark and survey of automated machine learning frameworks. https://arxiv.org/abs/1904.12054
CR  - [10] Y. Abouelnaga, O. S. Ali, H. Rady, M. Moustafa. (2016). CIFAR-10: KNN-based ensemble of classifiers. https://arxiv.org/abs/1611.04905
CR  - [11] Z. Yan, J. Zhou, W. Wong. (2021). Near Lossless Transfer Learning for Spiking Neural Networks. AAAI Conference on Artificial Intelligence.
CR  - [12] R. Istrate, F. Scheidegger, G. Mariani, D. Nikolopoulos, C. Bekas, A. Cristiano Innocenza Malossi. Tapas: Train-less accuracy predictor for architecture search. In Proceedings of the AAAI conference on artificial intelligence, pp. 3927–3934, 2019.
CR  - [13] Y. You and J. Demmel, "Runtime Data Layout Scheduling for Machine Learning Dataset," 2017 46th International Conference on Parallel Processing (ICPP), Bristol, UK, 2017, pp. 452-461, doi: 10.1109/ICPP.2017.54.
CR  - [14] P. M. Radiuk. Impact of training set batch size on the performance of convolutional neural networks for diverse datasets. (2017)
CR  - [15] M. S. Başarslan, F. Kayaalp, “Sentiment analysis with ensemble and machine learning methods in multi-domain datasets”, TUJE, vol. 7, no. 2, pp. 141–148, 2023, doi: 10.31127/tuje.1079698.
CR  - [16] Weblink-1: https://archive.ics.uci.edu/dataset/691/cifar+10
CR  - [17] Weblink-2: https://www.tensorflow.org/datasets/catalog/imdb_reviews
CR  - [18] Weblink-3: https://archive.ics.uci.edu/dataset/2/adult
UR  - https://doi.org/10.52876/jcs.1604889
L1  - https://dergipark.org.tr/en/download/article-file/4456103
ER  -