Research Article

The Effect of Si-CL Loss Function and Different Optimization Algorithms in Improving CNN Performance

Year 2026, Volume: 13, Issue: 1, 269-305, 31.03.2026
https://doi.org/10.54287/gujsa.1840916
https://izlik.org/JA49WR65MM

Abstract

Convolutional Neural Networks (CNNs) used for image classification often have complex architectures involving large images, time-costly training processes, and a large number of layers and hyperparameters. Improving CNN accuracy is therefore a challenging process that requires time, resources, and specialized knowledge. In this study, to improve the performance of CNN models, experiments were conducted on the MNIST, EMNIST, and Fashion-MNIST datasets using different optimization algorithms and a loss function (Si-CL) from the literature. The findings reveal in detail the effects of loss functions and optimization algorithms on model performance. Among the SGDM, Adam, RMSProp, AdaMax, AdaDelta, and AdaGrad optimization algorithms examined in the experiments, Adam performed best in terms of both training and test accuracy. The SGDM algorithm was particularly effective at larger batch sizes and low learning rates, but required longer training times than Adam. The Si-CL loss function outperformed the traditional cross-entropy loss: the model trained with Si-CL achieved higher training and test accuracy, a shorter training time, and a lower loss value, allowing it to learn faster and more efficiently.
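
A minimal sketch of the kind of comparison the abstract describes is given below: a small CNN trained on MNIST under each of the six optimizers named above, written in PyTorch. This is an illustration under stated assumptions, not the authors' code. In particular, the Si-CL loss is not defined in the abstract, so standard cross-entropy stands in as a placeholder that an Si-CL implementation would replace; the SmallCNN architecture, learning rate, batch size, and epoch count are likewise illustrative choices, not the paper's settings.

    # Illustrative optimizer comparison on MNIST (PyTorch). Not the authors' code;
    # cross-entropy is a placeholder for the Si-CL loss, which is not defined here.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    class SmallCNN(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            # Two conv/pool stages: 28x28 -> 14x14 -> 7x7 feature maps
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 7 * 7, num_classes)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    def make_optimizer(name, params, lr=1e-3):
        # The six optimizer families named in the abstract; hyperparameters
        # here are PyTorch defaults, not the paper's tuned settings.
        table = {
            "SGDM": lambda: torch.optim.SGD(params, lr=lr, momentum=0.9),
            "Adam": lambda: torch.optim.Adam(params, lr=lr),
            "RMSProp": lambda: torch.optim.RMSprop(params, lr=lr),
            "AdaMax": lambda: torch.optim.Adamax(params, lr=lr),
            "AdaDelta": lambda: torch.optim.Adadelta(params, lr=lr),
            "AdaGrad": lambda: torch.optim.Adagrad(params, lr=lr),
        }
        return table[name]()

    def train_one(name, loader, device, epochs=1):
        model = SmallCNN().to(device)
        opt = make_optimizer(name, model.parameters())
        loss_fn = nn.CrossEntropyLoss()  # placeholder for Si-CL
        model.train()
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()
                opt.step()
        return model

    if __name__ == "__main__":
        device = "cuda" if torch.cuda.is_available() else "cpu"
        data = datasets.MNIST("data", train=True, download=True,
                              transform=transforms.ToTensor())
        loader = DataLoader(data, batch_size=128, shuffle=True)
        for name in ["SGDM", "Adam", "RMSProp", "AdaMax", "AdaDelta", "AdaGrad"]:
            train_one(name, loader, device)

Varying one factor at a time in this loop (optimizer, loss function, batch size, learning rate) mirrors the style of ablation the abstract reports, e.g. re-running SGDM at several batch sizes and learning rates.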

References

  • Agarap, A. F. (2018). Deep learning using rectified linear units (ReLU). arXiv:1803.08375.
  • Akhtar, M., Tanveer, M., & Arshad, M. (2025). HawkEye: A robust loss function for regression with bounded, smooth, and insensitive zone characteristics. Applied Soft Computing, 113118. https://doi.org/10.1016/j.asoc.2025.113118
  • Arsenault, R., Poulin, A., Côté, P., & Brissette, F. (2014). Comparison of stochastic optimization algorithms in hydrological model calibration. Journal of Hydrologic Engineering, 19(7), 1374-1384. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000938
  • Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade, 437-478, Springer, Berlin, Heidelberg.
  • Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT'2010, 177-186, Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-7908-2604-3_16
  • Chen, X., Yu, R., Ullah, S., Wu, D., Li, Z., Li, Q., Qi, H., Liu, J., Liu, M., & Zhang, Y. (2022). A novel loss function of deep learning in wind speed forecasting. Energy, 238, 121808. https://doi.org/10.1016/j.energy.2021.121808
  • Cohen, G., Afshar, S., Tapson, J., & van Schaik, A. (2017, May). EMNIST: Extending MNIST to handwritten letters. In: 2017 International Joint Conference on Neural Networks (IJCNN) (pp. 2921-2926). IEEE.
  • Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul), 2121-2159.
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  • Gulzar, Y., Hamid, Y., Soomro, A. B., Alwan, A. A., & Journaux, L. (2020). A convolution neural network-based seed classification system. Symmetry, 12(12), 2018. https://doi.org/10.3390/sym12122018
  • Hallen, R. (2017). A study of gradient-based algorithms. arXiv:1704.06101.
  • Hinton, G., Srivastava, N., & Swersky, K. (2012). Neural networks for machine learning [Coursera lecture notes]. Coursera.
  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097-1105.
  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
  • Liu, M., Yao, D., Liu, Z., Guo, J., & Chen, J. (2023). An improved Adam optimization algorithm combining adaptive coefficients and composite gradients based on randomized block coordinate descent. Computational Intelligence and Neuroscience, 2023(1), 4765891. https://doi.org/10.1155/2023/4765891
  • Lin, C., Abe, S., Zheng, S., Li, X., & Chun, P. J. (2025). A structure-oriented loss function for automated semantic segmentation of bridge point clouds. Computer-Aided Civil and Infrastructure Engineering, 40(6), 801-816. https://doi.org/10.1111/mice.13422
  • Madhivanan, V., & Mathivanan, P. (2022, March). An evaluation on the performance of privacy preserving split neural networks using EMNIST dataset. In: International Conference on Deep Sciences for Computing and Communications (pp. 332-344). Cham: Springer Nature Switzerland.
  • Mokhtar, M. A., Fathy, M., Dahab, Y. A., & Sayed, E. A. (2025). A dual enhanced stochastic gradient descent method with dynamic momentum and step size adaptation for improved optimization performance. Scientific Reports, 15(1), 40389. https://doi.org/10.1038/s41598-025-24689-y
  • Nouri, A., & Seyyedsalehi, S. A. (2023). Eigen value based loss function for training attractors in iterated autoencoders. Neural Networks, 161, 575-588. https://doi.org/10.1016/j.neunet.2023.02.003
  • Ozkan, Y., & Erdogmus, P. (2024). Evaluation of classification performance of new layered convolutional neural network architecture on offline handwritten signature images. Symmetry, 16(6), 649. https://doi.org/10.3390/sym16060649
  • Reddi, S. J., Kale, S., & Kumar, S. (2018). On the convergence of Adam and beyond. arXiv:1904.09237.
  • Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400-407.
  • Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv:1609.04747.
  • Salimans, T., & Kingma, D. P. (2016). Weight normalization: A simple reparameterization to accelerate training of deep neural networks. Advances in Neural Information Processing Systems, 29.
  • Shalev-Shwartz, S., Shamir, O., & Shammah, S. (2017). Failures of gradient-based deep learning. International Conference on Machine Learning, 3067-3075.
  • Shen, F., Shen, C., Zhou, X., Yang, Y., & Shen, H. T. (2016). Face image classification by pooling raw features. Pattern Recognition, 54, 94-103. https://doi.org/10.1016/j.patcog.2016.01.010
  • Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). On the importance of initialization and momentum in deep learning. Proceedings of the 30th International Conference on Machine Learning, 1139-1147, PMLR.
  • Szymański, J., Skarżyński, K., Szutenberg, B., Ratkowska, K., & Drywa, S. (2026). The dataset for extending EMNIST evaluation. Scientific Data, 13(1), 73. https://doi.org/10.1038/s41597-025-06291-z
  • Tieleman, T., & Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
  • Wu, H. (2018). CNN-based recognition of handwritten digits in MNIST database. Research School of Computer Science, The Australian National University, Canberra.
  • Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747.
  • Yassine, E. L., Wang, S. L., Taresh, M. M., & Ali, T. A. A. (2024). Enhancing breast cancer diagnosis using deep learning and gradient multi-verse optimizer: a robust biomedical data analysis approach. PeerJ Computer Science, 10, e2578. https://doi.org/10.7717/peerj-cs.2578
  • Yazan, E., & Talu, M. F. (2017). Comparison of the stochastic gradient descent based optimization techniques. 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), 1-5. https://doi.org/10.1109/IDAP.2017.8090299
  • Yu, S., He, B., & Fang, L. (2025). Multi-step short-term forecasting of photovoltaic power utilizing TimesNet with enhanced feature extraction and a novel loss function. Applied Energy, 388, 125645. https://doi.org/10.1016/j.apenergy.2025.125645
  • Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. arXiv:1212.5701.
  • Zhang, C., Tu, C., Wang, Z., Cao, W., & Cao, W. (2026). ResNet18-ThunderSVM: Hybrid intelligence for handwritten digit recognition by fusing deep spatial features and high-performance classification. Scientific Reports, 16(1). https://doi.org/10.1038/s41598-026-38258-4
  • Zhao, J., & Jiao, L. (2019). Sparse deep tensor extreme learning machine for pattern classification. IEEE Access, 7, 119181-119191. https://doi.org/10.1109/ACCESS.2019.2924647
  • Zheng, Y., Iwana, B. K., & Uchida, S. (2018). Discovering class-wise trends of max-pooling in subspace. 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 98-103. https://doi.org/10.1109/ICFHR-2018.2018.00026

Details

Primary Language English
Subjects Deep Learning, Neural Networks
Journal Section Research Article
Authors

Yasin Özkan 0000-0002-2029-0856

Pakize Erdoğmuş 0000-0003-2172-5767

Submission Date December 12, 2025
Acceptance Date March 2, 2026
Publication Date March 31, 2026
DOI https://doi.org/10.54287/gujsa.1840916
IZ https://izlik.org/JA49WR65MM
Published in Issue Year 2026 Volume: 13 Issue: 1

Cite

APA Özkan, Y., & Erdoğmuş, P. (2026). The Effect of Si-CL Loss Function and Different Optimization Algorithms in Improving CNN Performance. Gazi University Journal of Science Part A: Engineering and Innovation, 13(1), 269-305. https://doi.org/10.54287/gujsa.1840916