Research Article

The Effect of Si-CL Loss Function and Different Optimization Algorithms in Improving CNN Performance

Year 2026, Volume: 13, Issue: 1, 269-305, 31.03.2026
https://doi.org/10.54287/gujsa.1840916
https://izlik.org/JA49WR65MM

Abstract

Convolutional Neural Networks (CNNs) used for image classification often have complex architectures involving large images, time-costly training processes, and a large number of layers and hyperparameters. Improving CNN accuracy is therefore a challenging process that requires time, resources, and specialized knowledge. In this study, to improve the performance of CNN models, experiments were conducted on the MNIST, EMNIST, and Fashion-MNIST datasets using different optimization algorithms and a loss function (Si-CL) from the literature. The findings reveal in detail the effects of loss functions and optimization algorithms on model performance. Among the SGDM, Adam, RMSProp, AdaMax, AdaDelta, and AdaGrad optimization algorithms examined in the experiments, Adam performed best in terms of both training and test accuracy. The SGDM algorithm was particularly effective at larger batch sizes and low learning rates, but required longer training times than Adam. The Si-CL loss function outperformed the traditional cross-entropy loss: the model trained with Si-CL achieved higher training and test accuracy, a shorter training time, and a lower loss value, allowing it to learn faster and more efficiently.
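
A minimal sketch of the kind of comparison the abstract describes is given below: a small CNN trained on MNIST under each of the six optimizers named above, written in PyTorch. This is an illustration under stated assumptions, not the authors' code. In particular, the Si-CL loss is not defined in the abstract, so standard cross-entropy stands in as a placeholder that an Si-CL implementation would replace; the SmallCNN architecture, learning rate, batch size, and epoch count are likewise illustrative choices, not the paper's settings.

    # Illustrative optimizer comparison on MNIST (PyTorch). Not the authors' code;
    # cross-entropy is a placeholder for the Si-CL loss, which is not defined here.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    class SmallCNN(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            # Two conv/pool stages: 28x28 -> 14x14 -> 7x7 feature maps
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 7 * 7, num_classes)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    def make_optimizer(name, params, lr=1e-3):
        # The six optimizer families named in the abstract; hyperparameters
        # here are PyTorch defaults, not the paper's tuned settings.
        table = {
            "SGDM": lambda: torch.optim.SGD(params, lr=lr, momentum=0.9),
            "Adam": lambda: torch.optim.Adam(params, lr=lr),
            "RMSProp": lambda: torch.optim.RMSprop(params, lr=lr),
            "AdaMax": lambda: torch.optim.Adamax(params, lr=lr),
            "AdaDelta": lambda: torch.optim.Adadelta(params, lr=lr),
            "AdaGrad": lambda: torch.optim.Adagrad(params, lr=lr),
        }
        return table[name]()

    def train_one(name, loader, device, epochs=1):
        model = SmallCNN().to(device)
        opt = make_optimizer(name, model.parameters())
        loss_fn = nn.CrossEntropyLoss()  # placeholder for Si-CL
        model.train()
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()
                opt.step()
        return model

    if __name__ == "__main__":
        device = "cuda" if torch.cuda.is_available() else "cpu"
        data = datasets.MNIST("data", train=True, download=True,
                              transform=transforms.ToTensor())
        loader = DataLoader(data, batch_size=128, shuffle=True)
        for name in ["SGDM", "Adam", "RMSProp", "AdaMax", "AdaDelta", "AdaGrad"]:
            train_one(name, loader, device)

Varying one factor at a time in this loop (optimizer, loss function, batch size, learning rate) mirrors the style of ablation the abstract reports, e.g. re-running SGDM at several batch sizes and learning rates.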

References

  • Agarap, A. F. (2018). Deep learning using rectified linear units (ReLU). arXiv:1803.08375.
  • Akhtar, M., Tanveer, M., & Arshad, M. (2025). HawkEye: A robust loss function for regression with bounded, smooth, and insensitive zone characteristics. Applied Soft Computing, 113118. https://doi.org/10.1016/j.asoc.2025.113118
  • Arsenault, R., Poulin, A., Côté, P., & Brissette, F. (2014). Comparison of stochastic optimization algorithms in hydrological model calibration. Journal of Hydrologic Engineering, 19(7), 1374-1384. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000938
  • Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade, 437-478, Springer, Berlin, Heidelberg.
  • Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT'2010, 177-186, Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-7908-2604-3_16
  • Chen, X., Yu, R., Ullah, S., Wu, D., Li, Z., Li, Q., Qi, H., Liu, J., Liu, M., & Zhang, Y. (2022). A novel loss function of deep learning in wind speed forecasting. Energy, 238, 121808. https://doi.org/10.1016/j.energy.2021.121808
  • Cohen, G., Afshar, S., Tapson, J., & van Schaik, A. (2017, May). EMNIST: Extending MNIST to handwritten letters. In: 2017 International Joint Conference on Neural Networks (IJCNN) (pp. 2921-2926). IEEE.
  • Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul), 2121-2159.
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  • Gulzar, Y., Hamid, Y., Soomro, A. B., Alwan, A. A., & Journaux, L. (2020). A convolution neural network-based seed classification system. Symmetry, 12(12), 2018. https://doi.org/10.3390/sym12122018
  • Hallen, R. (2017). A study of gradient-based algorithms. arXiv:1704.06101.
  • Hinton, G., Srivastava, N., & Swersky, K. (2012). Neural networks for machine learning [Coursera lecture notes]. Coursera.
  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097-1105.
  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
  • Liu, M., Yao, D., Liu, Z., Guo, J., & Chen, J. (2023). An improved Adam optimization algorithm combining adaptive coefficients and composite gradients based on randomized block coordinate descent. Computational Intelligence and Neuroscience, 2023(1), 4765891. https://doi.org/10.1155/2023/4765891
  • Lin, C., Abe, S., Zheng, S., Li, X., & Chun, P. J. (2025). A structure-oriented loss function for automated semantic segmentation of bridge point clouds. Computer-Aided Civil and Infrastructure Engineering, 40(6), 801-816. https://doi.org/10.1111/mice.13422
  • Madhivanan, V., & Mathivanan, P. (2022, March). An evaluation on the performance of privacy preserving split neural networks using EMNIST dataset. In: International Conference on Deep Sciences for Computing and Communications (pp. 332-344). Cham: Springer Nature Switzerland.
  • Mokhtar, M. A., Fathy, M., Dahab, Y. A., & Sayed, E. A. (2025). A dual enhanced stochastic gradient descent method with dynamic momentum and step size adaptation for improved optimization performance. Scientific Reports, 15(1), 40389. https://doi.org/10.1038/s41598-025-24689-y
  • Nouri, A., & Seyyedsalehi, S. A. (2023). Eigen value based loss function for training attractors in iterated autoencoders. Neural Networks, 161, 575-588. https://doi.org/10.1016/j.neunet.2023.02.003
  • Ozkan, Y., & Erdogmus, P. (2024). Evaluation of classification performance of new layered convolutional neural network architecture on offline handwritten signature images. Symmetry, 16(6), 649. https://doi.org/10.3390/sym16060649
  • Reddi, S. J., Kale, S., & Kumar, S. (2018). On the convergence of Adam and beyond. arXiv:1904.09237.
  • Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400-407.
  • Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv:1609.04747.
  • Salimans, T., & Kingma, D. P. (2016). Weight normalization: A simple reparameterization to accelerate training of deep neural networks. Advances in Neural Information Processing Systems, 29.
  • Shalev-Shwartz, S., Shamir, O., & Shammah, S. (2017). Failures of gradient-based deep learning. International Conference on Machine Learning, 3067-3075.
  • Shen, F., Shen, C., Zhou, X., Yang, Y., & Shen, H. T. (2016). Face image classification by pooling raw features. Pattern Recognition, 54, 94-103. https://doi.org/10.1016/j.patcog.2016.01.010
  • Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). On the importance of initialization and momentum in deep learning. Proceedings of the 30th International Conference on Machine Learning, 1139-1147, PMLR.
  • Szymański, J., Skarżyński, K., Szutenberg, B., Ratkowska, K., & Drywa, S. (2026). The dataset for extending EMNIST evaluation. Scientific Data, 13(1), 73. https://doi.org/10.1038/s41597-025-06291-z
  • Tieleman, T., & Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
  • Wu, H. (2018). CNN-based recognition of handwritten digits in MNIST database. Research School of Computer Science, The Australian National University, Canberra.
  • Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747.
  • Yassine, E. L., Wang, S. L., Taresh, M. M., & Ali, T. A. A. (2024). Enhancing breast cancer diagnosis using deep learning and gradient multi-verse optimizer: a robust biomedical data analysis approach. PeerJ Computer Science, 10, e2578. https://doi.org/10.7717/peerj-cs.2578
  • Yazan, E., & Talu, M. F. (2017). Comparison of the stochastic gradient descent based optimization techniques. 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), 1-5. https://doi.org/10.1109/IDAP.2017.8090299
  • Yu, S., He, B., & Fang, L. (2025). Multi-step short-term forecasting of photovoltaic power utilizing TimesNet with enhanced feature extraction and a novel loss function. Applied Energy, 388, 125645. https://doi.org/10.1016/j.apenergy.2025.125645
  • Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. arXiv:1212.5701.
  • Zhang, C., Tu, C., Wang, Z., Cao, W., & Cao, W. (2026). ResNet18-ThunderSVM: Hybrid intelligence for handwritten digit recognition by fusing deep spatial features and high-performance classification. Scientific Reports, 16(1). https://doi.org/10.1038/s41598-026-38258-4
  • Zhao, J., & Jiao, L. (2019). Sparse deep tensor extreme learning machine for pattern classification. IEEE Access, 7, 119181-119191. https://doi.org/10.1109/ACCESS.2019.2924647
  • Zheng, Y., Iwana, B. K., & Uchida, S. (2018). Discovering class-wise trends of max-pooling in subspace. 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 98-103. https://doi.org/10.1109/ICFHR-2018.2018.00026

Details

Primary Language English
Subjects Deep Learning, Neural Networks
Journal Section Research Article
Authors

Yasin Özkan 0000-0002-2029-0856

Pakize Erdoğmuş 0000-0003-2172-5767

Submission Date December 12, 2025
Acceptance Date March 2, 2026
Publication Date March 31, 2026
DOI https://doi.org/10.54287/gujsa.1840916
IZ https://izlik.org/JA49WR65MM
Published in Issue Year 2026 Volume: 13 Issue: 1

Cite

APA Özkan, Y., & Erdoğmuş, P. (2026). The Effect of Si-CL Loss Function and Different Optimization Algorithms in Improving CNN Performance. Gazi University Journal of Science Part A: Engineering and Innovation, 13(1), 269-305. https://doi.org/10.54287/gujsa.1840916