Research Article

Effect on model performance of regularization methods

Volume: 12 Number: 5, December 31, 2021

Abstract

Artificial neural networks with numerous parameters are tremendously powerful machine learning systems. Nonetheless, overfitting is a crucial problem in such networks. Maximizing model accuracy and minimizing loss are important for reducing in-class differences while remaining sensitive to those differences. In this study, the effect of overfitting on different model architectures trained on the Wine dataset was investigated using Dropout, AlphaDropout, GaussianDropout, batch normalization, layer normalization, activity regularization, and L1 and L2 regularization, together with the change in the loss function produced by combinations of these methods. Combinations that performed well were then examined on different datasets using the same model. The binary cross-entropy loss function was used as the performance measurement metric. According to the results, the combination of layer normalization and activity regularization showed better training and testing performance than the other combinations.
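The page does not reproduce the authors' code or architecture. The following is only a minimal sketch of the kind of Keras setup the abstract describes, assuming TensorFlow/Keras and the scikit-learn Wine dataset; the layer sizes, dropout rate, and penalty strengths are illustrative assumptions, not the study's values. It combines layer normalization and activity regularization (the pairing the abstract reports as best) with L1/L2 weight penalties and Dropout, trained with binary cross-entropy.

    import tensorflow as tf
    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Wine has three classes; binary cross-entropy implies a two-class target,
    # so this sketch restricts the data to classes 0 and 1 (an assumption).
    X, y = load_wine(return_X_y=True)
    X, y = X[y < 2], y[y < 2]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(X_train.shape[1],)),
        # Dense layer with an L2 weight penalty (L2 regularization)
        tf.keras.layers.Dense(64, activation="relu",
                              kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
        tf.keras.layers.LayerNormalization(),             # layer normalization
        tf.keras.layers.Dropout(0.3),                     # standard dropout
        # Dense layer with an L1 weight penalty (L1 regularization)
        tf.keras.layers.Dense(32, activation="relu",
                              kernel_regularizer=tf.keras.regularizers.l1(1e-5)),
        tf.keras.layers.ActivityRegularization(l2=1e-4),  # activity regularization
        tf.keras.layers.Dense(1, activation="sigmoid"),   # binary output
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",             # loss used in the study
                  metrics=["accuracy"])
    model.fit(X_train, y_train, validation_data=(X_test, y_test),
              epochs=50, batch_size=16, verbose=0)
    print(model.evaluate(X_test, y_test, verbose=0))      # [test loss, test accuracy]

Swapping tf.keras.layers.Dropout for AlphaDropout or GaussianDropout, or LayerNormalization for BatchNormalization, yields the other method combinations the abstract compares.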

Keywords

Details

Primary Language

English

Subjects

-

Journal Section

Research Article

Publication Date

December 31, 2021

Submission Date

November 16, 2021

Acceptance Date

-

Published in Issue

Year 2021 Volume: 12 Number: 5

IEEE
[1] C. Budak, V. Mençik, and M. E. Asker, “Effect on model performance of regularization methods”, DUJE, vol. 12, no. 5, pp. 757–765, Dec. 2021, doi: 10.24012/dumf.1051352.
