Rethinking Differentiability: A Comparative Study of Smooth vs Non-Smooth Activation Functions in CNNs

Emre Delibaş

doi:10.29130/dubited.1704852

Research Article

Year 2026, Volume: 14, Issue: 1, pp. 189–198, 21.01.2026

Abstract

This study re-evaluates the roles of differentiability and smoothness in activation functions and experimentally investigates how these properties affect the performance of convolutional neural networks (CNNs). Although smooth, differentiable activation functions such as Swish and Mish have become prevalent in recent years, the contribution of these properties to learning performance remains unclear, particularly in shallow architectures. A controlled comparative study was therefore conducted on the CIFAR-10 dataset, evaluating five common activation functions: ReLU, Leaky ReLU, Softplus, Swish, and Mish. For each function, the same CNN architecture was trained three times under identical training settings, and classification accuracy and training stability were analyzed jointly. ReLU, which is not differentiable at zero, achieved the highest mean accuracy. Leaky ReLU, in contrast, showed more stable learning, with lower variance across runs. Swish and Mish, which are smooth and differentiable everywhere, behaved consistently throughout training but did not show the anticipated accuracy advantage. Softplus performed worst, which is attributed to its tendency to saturate. These findings suggest that, although differentiability and smoothness are theoretically appealing, they do not translate directly into better CNN performance in practice; the effectiveness of an activation function appears to be shaped predominantly by the architecture and the learning dynamics. The study therefore argues for selecting activation functions on the basis of empirical evaluation rather than mathematical assumptions.
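To make the differentiability distinction concrete, the following is a minimal Python sketch (standard library only, not code from the study) of the five activation functions, together with a finite-difference check of their left and right slopes at zero. The Leaky ReLU negative slope of 0.01 and the Swish parameter β = 1 are assumed defaults, since the exact settings used in the paper are not given here; `one_sided_slopes` and `ACTIVATIONS` are hypothetical helpers introduced only for illustration.

```python
import math

# Illustrative sketch, not the study's code. Assumptions: Leaky ReLU uses a
# negative slope of 0.01 and Swish uses beta = 1 (i.e. SiLU).
LEAKY_SLOPE = 0.01


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x: float) -> float:
    return max(0.0, x)


def leaky_relu(x: float, slope: float = LEAKY_SLOPE) -> float:
    return x if x >= 0.0 else slope * x


def softplus(x: float) -> float:
    # log(1 + e^x) rewritten as max(x, 0) + log1p(e^-|x|) for stability.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swish(x: float) -> float:
    # Swish with the assumed beta = 1: x * sigmoid(x).
    return x * sigmoid(x)


def mish(x: float) -> float:
    # Mish: x * tanh(softplus(x)).
    return x * math.tanh(softplus(x))


def one_sided_slopes(f, x0: float = 0.0, h: float = 1e-6):
    """Hypothetical helper: (left, right) finite-difference slopes of f at x0."""
    left = (f(x0) - f(x0 - h)) / h
    right = (f(x0 + h) - f(x0)) / h
    return left, right


# Hypothetical registry of the five activations compared in the study.
ACTIVATIONS = {
    "ReLU": relu,
    "Leaky ReLU": leaky_relu,
    "Softplus": softplus,
    "Swish": swish,
    "Mish": mish,
}

if __name__ == "__main__":
    for name, f in ACTIVATIONS.items():
        left, right = one_sided_slopes(f)
        # A kink (non-differentiable point) shows up as a gap between the slopes.
        has_kink = abs(left - right) > 1e-3
        print(f"{name:<10} left={left:.4f} right={right:.4f} kink={has_kink}")
```

Under these assumptions, ReLU (left slope 0, right slope 1) and Leaky ReLU (0.01 and 1) show a gap between the one-sided slopes, i.e. a kink at zero, whereas Softplus, Swish, and Mish have matching one-sided slopes of 0.5, 0.5, and 0.6 respectively (exactly σ(0), σ(0), and tanh(ln 2)). Deep learning frameworks typically sidestep the kink by assigning ReLU a fixed derivative (commonly 0) at exactly zero, so it does not block gradient-based training.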

Keywords

Activation Functions, Differentiability, Convolutional Neural Networks, Empirical Evaluation, Performance Analysis

Ethical Statement

This study does not involve human or animal participants. All procedures followed scientific and ethical principles, and all referenced studies are appropriately cited.

Supporting Institution

This research received no external funding.

Thanks

All parts of the study were carried out by the author.

References

Al Wafi, A. Z., & Nugroho, A. (2024). A comparative study of modern activation functions on multi-label CNNs to predict genres based on movie posters. Jurnal Ilmiah Teknik Elektro Komputer Dan Informatika, 10(3), 608–624. https://doi.org/10.26555/jiteki.v10i3.29540
Atoum, I. A. (2023). Adaptive rectified linear unit (ARELU) for classification problems to solve dying problem in deep learning. International Journal of Advanced Computer Science and Applications, 14(2), 97–102. https://doi.org/10.14569/ijacsa.2023.0140212
Chanana, G. (2025). Performance analysis of activation functions in molecular property prediction using Message Passing Graph Neural Networks. Chemical Physics, 591, Article 112591. https://doi.org/10.1016/j.chemphys.2024.112591
Dugas, C., Bengio, Y., Bélisle, F., Nadeau, C., & Garcia, R. (2000). Incorporating second-order functional knowledge for better option pricing. In Advances in Neural Information Processing Systems (Vol. 13).
Feng, H. S., & Yang, C. H. (2023). PolyLU: A simple and robust polynomial-based linear unit activation function for deep learning. IEEE Access, 11, 101347–101358. https://doi.org/10.1109/access.2023.3315308
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (pp. 315–323).
Hannibal, S., Jentzen, A., & Thang, D. M. (2024). Non-convergence to global minimizers in data driven supervised deep learning: Adam and stochastic gradient descent optimization provably fail to converge to global minimizers in the training of deep neural networks with ReLU activation. arXiv. https://doi.org/10.48550/arXiv.2410.10533
Hong, D., Chen, D., Zhang, Y., Zhou, H., Xie, L., Ju, J., & Tang, J. (2024). Efficient adversarial attack based on moment estimation and lookahead gradient. Electronics, 13(13), Article 2464. https://doi.org/10.3390/electronics13132464
Islamov, R., Ajroldi, N., Orvieto, A., & Lucchi, A. (2024). Loss landscape characterization of neural networks without over-parametrization. arXiv. https://arxiv.org/abs/2410.12455v3
Jagtap, A. D., Kawaguchi, K., & Karniadakis, G. E. (2020). Adaptive activation functions accelerate convergence in deep and physics-informed neural networks. Journal of Computational Physics, 404, Article 109136. https://doi.org/10.1016/j.jcp.2019.109136
Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning (Vol. 28).
Misra, D. (2019). Mish: A self regularized non-monotonic activation function. arXiv. https://arxiv.org/abs/1908.08681v3
Nag, S., Bhattacharyya, M., Mukherjee, A., & Kundu, R. (2023). Serf: Towards better training of deep neural networks using log-Softplus ERror activation Function. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 5313–5322). IEEE. https://doi.org/10.1109/wacv56688.2023.00529
Picchiotti, N., & Gori, M. (2022). Clustering-based interpretation of deep ReLU network. In Lecture Notes in Computer Science (Vol. 13196, pp. 403–412). https://doi.org/10.1007/978-3-031-08421-8_28
Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Swish: A self-gated activation function. arXiv. https://arxiv.org/abs/1710.05941
Sun, K., Yu, J., Zhang, L., & Dong, Z. (2020). A convolutional neural network model based on improved Softplus activation function. In Advances in Intelligent Systems and Computing (Vol. 1017, pp. 1326–1335). Springer. https://doi.org/10.1007/978-3-030-25128-4_164
Wang, X., Qin, Y., Wang, Y., Xiang, S., & Chen, H. (2019). ReLTanh: An activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis. Neurocomputing, 363, 88–98. https://doi.org/10.1016/j.neucom.2019.07.017
Xiu, H. H. (2020). Research on activation function in deep convolutional neural network. In Proceedings of the 2020 Conference on Artificial Intelligence and Healthcare (pp. 19–24). https://doi.org/10.1145/3433996.3434001

There are 18 citations in total.

Details

Primary Language	English
Subjects	Deep Learning, Neural Networks
Journal Section	Research Article
Authors	Emre Delibaş (ORCID: 0000-0001-7564-5020)
Submission Date	May 23, 2025
Acceptance Date	December 10, 2025
Publication Date	January 21, 2026
Published in Issue	Year 2026 Volume: 14 Issue: 1

Cite

APA	Delibaş, E. (2026). Rethinking Differentiability: A Comparative Study of Smooth vs Non-Smooth Activation Functions in CNNs. Duzce University Journal of Science and Technology, 14(1), 189-198. https://doi.org/10.29130/dubited.1704852
