Effect on model performance of regularization methods

Cafer Budak; Vasfiye Mençik; Mehmet Emin Asker

doi:10.24012/dumf.1051352

Abstract

Artificial neural networks with large numbers of parameters are powerful machine learning systems, but overfitting is a serious problem in such networks. Maximizing model accuracy and minimizing loss are important for reducing in-class differences while maintaining sensitivity to them. In this study, the effect on overfitting of Dropout, AlphaDropout, GaussianDropout, Batch normalization, Layer normalization, Activity normalization, and L1 and L2 regularization was investigated for different model architectures on the Wine dataset, together with the change in the loss function when these methods are combined. Combinations that performed well were then examined on different datasets using the same model. Binary cross-entropy loss was used as the performance metric. According to the results, the combination of Layer and Activity regularization showed better training and testing performance than the other combinations.
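
As a minimal sketch of how L1 and L2 penalties can enter a binary cross-entropy objective, the plain-Python example below adds weighted penalty terms to the loss. The function names, penalty strengths, and toy values are illustrative assumptions and are not taken from the study.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def l1_penalty(weights, lam):
    """L1 (lasso) penalty: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (ridge) penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(y_true, y_pred, weights, l1=0.0, l2=0.0):
    """Data loss plus optional L1 and L2 penalties on the weights."""
    return (
        binary_cross_entropy(y_true, y_pred)
        + l1_penalty(weights, l1)
        + l2_penalty(weights, l2)
    )


# Hypothetical toy values, chosen only for illustration.
y_true = [1, 0, 1, 1]
y_pred = [0.9, 0.2, 0.7, 0.6]
weights = [0.5, -1.5, 2.0]

print("unregularized:", regularized_loss(y_true, y_pred, weights))
print("with L1:      ", regularized_loss(y_true, y_pred, weights, l1=0.01))
print("with L2:      ", regularized_loss(y_true, y_pred, weights, l2=0.01))
```

Both penalties raise the training loss for the same predictions, so the optimizer is pushed toward smaller weights. L1 penalizes absolute values and tends to drive weights to exactly zero, while L2 penalizes squares and shrinks large weights more strongly.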

Details

Primary Language

English

Subjects

-

Section

Research Article

Authors

Cafer Budak*
0000-0002-8470-4579
Türkiye

Vasfiye Mençik
0000-0002-3769-0071
Türkiye

Mehmet Emin Asker
0000-0003-4585-4168
Türkiye

Publication Date

31 December 2021

Submission Date

16 November 2021

Acceptance Date

-

Published in Issue

Year 2021 Volume: 12 Issue: 5

DOI

https://doi.org/10.24012/dumf.1051352

IZ

https://izlik.org/JA82HS27BW

Cite

IEEE

[1] C. Budak, V. Mençik, and M. E. Asker, “Effect on model performance of regularization methods”, DÜMF MD, vol. 12, no. 5, pp. 757–765, Dec. 2021, doi: 10.24012/dumf.1051352.

Cited By

Microwave-assisted hydrodistillation of patchouli oil using canola/sunflower traps: SHAP-interpretable ML optimization

Food and Humanity

https://doi.org/10.1016/j.foohum.2026.100999
