Rethinking Differentiability: A Comparative Study of Smooth vs Non-Smooth Activation Functions in CNNs
Abstract
Keywords
Activation Functions, Differentiability, Convolutional Neural Networks, Empirical Evaluation, Performance Analysis