Research Article

Rethinking Differentiability: A Comparative Study of Smooth vs Non-Smooth Activation Functions in CNNs

Volume 14, Number 1, January 21, 2026

Abstract

This study re-examines the concepts of differentiability and mathematical continuity in activation functions and experimentally investigates the impact of these properties on the performance of convolutional neural networks (CNNs). Although smooth, everywhere-differentiable activation functions such as Swish and Mish have become prevalent in recent years, their contribution to learning performance remains ambiguous, particularly in shallow architectures. A controlled comparative study was conducted on the CIFAR-10 dataset with five common activation functions: ReLU, Leaky ReLU, Softplus, Swish, and Mish. The same CNN architecture was trained three times with each function under identical training settings, and classification accuracy and training stability were analyzed in tandem. The findings indicate that ReLU, which is not differentiable at zero, achieved the highest average accuracy, while Leaky ReLU exhibited more stable learning behavior with reduced variance. Swish and Mish, which are differentiable and smooth, behaved consistently throughout training but did not show the anticipated superiority in accuracy. Softplus performed worst, owing to its tendency to saturate. These results suggest that, however appealing mathematical differentiability and continuity are in theory, they do not confer a direct advantage on CNN performance in practice; the effectiveness of an activation function is shaped predominantly by the architecture and the learning dynamics. The study therefore argues for prioritizing empirical evaluation over mathematical assumptions when selecting activation functions.
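For context, the five functions compared above have the following standard definitions; these are the forms commonly used in the literature (including the Leaky ReLU slope parameter α, typically around 0.01) and are not quoted from the paper itself:

ReLU(x) = max(0, x): continuous everywhere, but not differentiable at x = 0 (left derivative 0, right derivative 1)
Leaky ReLU(x) = max(αx, x), 0 < α < 1: likewise non-smooth at x = 0
Softplus(x) = ln(1 + e^x): smooth, with gradient σ(x) that decays toward 0 for strongly negative inputs
Swish(x) = x · σ(x), where σ(x) = 1/(1 + e^(−x)): smooth everywhere
Mish(x) = x · tanh(ln(1 + e^x)): smooth everywhere

In practice, deep-learning frameworks assign an arbitrary subgradient (usually 0) to ReLU at x = 0, which is one reason its non-differentiability at a single point rarely disrupts gradient-based training.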

Keywords

Supporting Institution

This research received no external funding.

Ethical Statement

This study does not involve human or animal participants. All procedures followed scientific and ethical principles, and all referenced studies are appropriately cited.

Thanks

All parts of the study were conducted by the Author.

Details

Primary Language

English

Subjects

Deep Learning, Neural Networks

Journal Section

Research Article

Publication Date

January 21, 2026

Submission Date

May 23, 2025

Acceptance Date

December 10, 2025

Published in Issue

Year 2026, Volume 14, Number 1

APA
Delibaş, E. (2026). Rethinking Differentiability: A Comparative Study of Smooth vs Non-Smooth Activation Functions in CNNs. Duzce University Journal of Science and Technology, 14(1), 189-198. https://doi.org/10.29130/dubited.1704852