The Effect of Si-CL Loss Function and Different Optimization Algorithms in Improving CNN Performance
Abstract
Convolutional Neural Networks (CNNs) used for image classification often have complex architectures, large input images, time-consuming training processes, and a large number of layers and hyperparameters. Improving the accuracy of CNNs is therefore a challenging process that requires time, resources, and specialized knowledge. In this study, to improve the performance of CNN models, experiments were conducted on the MNIST, EMNIST, and Fashion-MNIST datasets using different optimization algorithms and a loss function from the literature, Si-CL. The findings show in detail how the choice of loss function and optimization algorithm affects model performance. Of the optimization algorithms examined (SGDM, Adam, RMSProp, AdaMax, AdaDelta, and AdaGrad), Adam performed best in terms of both training accuracy and test accuracy. SGDM was particularly effective with larger batch sizes and low learning rates, but required longer training times than Adam. The Si-CL loss function outperformed the traditional cross-entropy loss: the model trained with Si-CL achieved higher training and test accuracy, a shorter training time, and a lower loss value, allowing it to learn faster and more efficiently.
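As a hedged illustration of how the optimizers compared above differ, the sketch below implements their standard per-parameter update rules in plain Python and runs each on a small, ill-conditioned quadratic. It is not the study's code: the helper names (`make_optimizer`, `loss`, `grad`, `train`), the toy objective, and all hyperparameter defaults are assumptions chosen for illustration. The Si-CL loss is not reproduced because its definition is not given in this abstract.

```python
import math


def make_optimizer(name, lr=0.01, mu=0.9, rho=0.9, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return a step(w, g, t) -> new_w function for the named optimizer.

    Hypothetical helper for illustration; the defaults are common textbook
    choices, not hyperparameters reported in the study.
    """
    state = {}  # per-optimizer running statistics, created lazily

    def zeros(n):
        return [0.0] * n

    def step(w, g, t):
        n = len(w)
        if name == "sgdm":
            # Heavy-ball momentum: v <- mu*v + g, w <- w - lr*v
            v = state.setdefault("v", zeros(n))
            for i in range(n):
                v[i] = mu * v[i] + g[i]
            return [w[i] - lr * v[i] for i in range(n)]
        if name == "adagrad":
            # Accumulate all squared gradients; the step size shrinks over time
            G = state.setdefault("G", zeros(n))
            for i in range(n):
                G[i] += g[i] ** 2
            return [w[i] - lr * g[i] / (math.sqrt(G[i]) + eps) for i in range(n)]
        if name == "rmsprop":
            # Exponential moving average of squared gradients
            s = state.setdefault("s", zeros(n))
            for i in range(n):
                s[i] = rho * s[i] + (1 - rho) * g[i] ** 2
            return [w[i] - lr * g[i] / (math.sqrt(s[i]) + eps) for i in range(n)]
        if name == "adadelta":
            # No global learning rate: step = RMS(past updates) / RMS(gradients) * g
            eg = state.setdefault("eg", zeros(n))
            ex = state.setdefault("ex", zeros(n))
            out = []
            for i in range(n):
                eg[i] = rho * eg[i] + (1 - rho) * g[i] ** 2
                dx = -math.sqrt(ex[i] + eps) / math.sqrt(eg[i] + eps) * g[i]
                ex[i] = rho * ex[i] + (1 - rho) * dx ** 2
                out.append(w[i] + dx)
            return out
        if name in ("adam", "adamax"):
            m = state.setdefault("m", zeros(n))
            u = state.setdefault("u", zeros(n))
            out = []
            for i in range(n):
                m[i] = beta1 * m[i] + (1 - beta1) * g[i]
                m_hat = m[i] / (1 - beta1 ** t)  # bias-corrected first moment
                if name == "adam":
                    u[i] = beta2 * u[i] + (1 - beta2) * g[i] ** 2
                    v_hat = u[i] / (1 - beta2 ** t)  # bias-corrected second moment
                    out.append(w[i] - lr * m_hat / (math.sqrt(v_hat) + eps))
                else:
                    # AdaMax: infinity-norm second moment; eps added for safety (assumption)
                    u[i] = max(beta2 * u[i], abs(g[i]))
                    out.append(w[i] - lr * m_hat / (u[i] + eps))
            return out
        raise ValueError(f"unknown optimizer: {name}")

    return step


def loss(w):
    # Toy objective (assumption): f(w) = 0.5 * (w0^2 + 10 * w1^2)
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def grad(w):
    return [w[0], 10.0 * w[1]]


def train(name, steps=200, **hyper):
    step = make_optimizer(name, **hyper)
    w = [1.0, 1.0]
    for t in range(1, steps + 1):
        w = step(w, grad(w), t)
    return loss(w)


if __name__ == "__main__":
    for name in ("sgdm", "adam", "rmsprop", "adamax", "adadelta", "adagrad"):
        print(f"{name:>8}: final loss {train(name):.6f}")
```

The sketch only contrasts the update rules; it says nothing about the accuracy or training-time results reported for MNIST, EMNIST, and Fashion-MNIST, which depend on the CNN architecture, batch size, and learning-rate settings used in the study.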
Details
Primary Language: English
Subjects: Deep Learning, Neural Networks
Journal Section: Research Article
Publication Date: March 31, 2026
Submission Date: December 12, 2025
Acceptance Date: March 2, 2026
Published in Issue: Year 2026, Volume 13, Number 1