TY - JOUR
T1 - Tam Bağlı İleri Beslemeli Sinir Ağlarda Softplus Aktivasyon Fonksiyonu İçin Yeni Bir Yol Ağırlık Başlatma Düzeni
TT - A New Weight Initialization Scheme for Softplus Activation Function in Fully Connected FeedForward Neural Networks
AU - Dönmez, Emrah
AU - Turhan, Mehmet Murat
AU - Talu, Muhammed Fatih
PY - 2025
DA - October
Y2 - 2025
DO - 10.46387/bjesr.1717780
JF - Mühendislik Bilimleri ve Araştırmaları Dergisi
JO - BJESR
PB - Bandırma Onyedi Eylül Üniversitesi
WT - DergiPark
SN - 2687-4415
SP - 193
EP - 202
VL - 7
IS - 2
LA - tr
AB - Derin sinir ağlarının eğitilebilirliği, özellikle katman sayısı arttığında, gradyan sönümlenmesi veya gradyan patlaması gibi sorunlar nedeniyle zorlaşmaktadır. Bu çalışmada, Softplus aktivasyon fonksiyonu kullanılan tam bağlı ileri beslemeli sinir ağları için, merkezi limit teoremine dayalı iki yeni yol ağırlık başlatma düzeni önerilmiştir. İlk düzen yalnızca ileri yönde sinyal istatistiklerini, ikincisi ise hem ileri hem de geri yönde istatistikleri korumayı hedeflemektedir. CIFAR-10 ve CIFAR-100 veri kümeleri üzerinde yapılan deneylerde, derin mimarilerde yalnızca ileri yöndeki istatistik korumanın yeterli olmadığı, iki yönlü korumanın ise ağın eğitilebilirliğini anlamlı şekilde artırdığı gözlemlenmiştir. Özellikle 25 gizli katmanlı ağlarda, yalnızca iki yönlü koruma sağlayan başlatma düzeniyle başarılı eğitim gerçekleştirilebilmiştir. Elde edilen sonuçlar, aktivasyon fonksiyonu dinamiklerine uygun başlatma stratejilerinin, derin sinir ağlarının eğitilebilmesinde belirleyici rol oynadığını göstermektedir.
KW - Aktivasyon Fonksiyonu
KW - Derin Sinir Ağları
KW - Softplus
KW - Yol Ağırlık Başlatma
N2 - The trainability of deep neural networks becomes challenging as the number of layers increases, primarily due to issues such as vanishing or exploding gradients. In this study, two new weight initialization schemes based on the central limit theorem are proposed for fully connected feedforward neural networks using the Softplus activation function. The first scheme aims to preserve signal statistics only in the forward direction, while the second aims to preserve them in both forward and backward directions. Experiments conducted on the CIFAR-10 and CIFAR-100 datasets demonstrate that preserving only forward signal statistics is insufficient in deep architectures, whereas preserving statistics in both directions significantly improves trainability. Particularly in architectures with 25 hidden layers, successful training was achieved only with the bidirectional initialization scheme. The results reveal that initialization strategies compatible with the dynamics of the activation function play a critical role in enabling the effective training of deep neural networks.
CR - Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015, doi: 10.1038/nature14539.
CR - A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, F. Pereira, C. J. Burges, L. Bottou, and K. Q. Weinberger, Eds., Curran Associates, Inc., 2012. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
CR - J. Kaur and W. Singh, “A systematic review of object detection from images using deep learning,” Multimed. Tools Appl., vol. 83, no. 4, pp. 12253–12338, 2024.
CR - D. Guo et al., “DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025.
CR - Y. Bengio, “Learning Deep Architectures for AI,” Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009.
CR - F. Chollet, Deep Learning with Python, 2nd ed. New York: Manning Publications, 2021.
CR - Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Trans. Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
CR - G. Philipp, D. Song, and J. G. Carbonell, “The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions,” arXiv preprint arXiv:1712.05577, 2018.
CR - K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Jun. 2016, pp. 770–778.
CR - S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
CR - S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning (ICML’15), JMLR.org, 2015, pp. 448–456.
CR - J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.
CR - K. He, X. Zhang, S. Ren, and J. Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
CR - V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML’10), Madison, WI, USA: Omnipress, 2010, pp. 807–814.
CR - D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs),” in 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico, May 2–4, 2016, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2016. [Online]. Available: http://arxiv.org/abs/1511.07289
CR - X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Y. W. Teh and M. Titterington, Eds., Proceedings of Machine Learning Research, vol. 9. Chia Laguna Resort, Sardinia, Italy: PMLR, Jan. 2010, pp. 249–256. [Online]. Available: https://proceedings.mlr.press/v9/glorot10a.html
CR - Y. Li, C. Fan, Y. Li, Q. Wu, and Y. Ming, “Improving deep neural network with Multiple Parametric Exponential Linear Units,” Neurocomputing, vol. 301, 2018.
CR - S. K. Kumar, “On weight initialization in deep neural networks,” arXiv preprint arXiv:1704.08863, 2017.
CR - S. R. Dubey, S. K. Singh, and B. B. Chaudhuri, “Activation functions in deep learning: A comprehensive survey and benchmark,” Neurocomputing, vol. 503, pp. 92–108, Sep. 2022.
CR - M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, “Multilayer feedforward networks with a nonpolynomial activation function can approximate any function,” Neural Networks, vol. 6, no. 6, pp. 861–867, 1993.
CR - P. F. Verhulst, “Recherches mathématiques sur la loi d’accroissement de la population,” Nouveaux Mémoires de l’Académie Royale des Sciences et Belles-Lettres de Bruxelles, vol. 18, pp. 1–41, 1845.
CR - F.-C. Chen, “Back-propagation neural networks for nonlinear self-tuning adaptive control,” IEEE Control Syst. Mag., vol. 10, no. 3, pp. 44–48, 1990.
CR - C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, “Incorporating Second-Order Functional Knowledge for Better Option Pricing,” in Advances in Neural Information Processing Systems, T. Leen, T. Dietterich, and V. Tresp, Eds., MIT Press, 2000. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2000/file/44968aece94f667e4095002d140b5896-Paper.pdf
CR - Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
CR - Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient BackProp,” in Neural Networks: Tricks of the Trade, 2nd ed., G. Montavon, G. B. Orr, and K.-R. Müller, Eds., Berlin, Heidelberg: Springer, 2012, pp. 9–48.
CR - G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, “Self-normalizing neural networks,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 972–981.
CR - G. Zhang and H. Li, “Effectiveness of scaled exponentially-regularized linear units (SERLUs),” arXiv preprint arXiv:1807.10117, 2018.
CR - N. L. Johnson, “Systems of Frequency Curves Generated by Methods of Translation,” Biometrika, vol. 36, no. 1/2, pp. 149–176, Jun. 1949.
CR - A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” Tech. Rep., University of Toronto, 2009.
UR - https://doi.org/10.46387/bjesr.1717780
L1 - https://dergipark.org.tr/tr/download/article-file/4950837
ER -
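
The abstracts above describe, but do not specify, the two initialization schemes: both scale weights via a central limit theorem argument so that layer statistics are preserved either in the forward pass alone (scheme 1) or in both the forward and backward passes (scheme 2). The Python sketch below is a minimal illustration of that distinction using standard fan-based variance scaling in the spirit of Glorot and Bengio (2010); the paper's actual Softplus-specific constants are not given in this record, the `gain` parameter is a hypothetical stand-in for them, and both function names are invented for illustration.

import numpy as np

def init_forward_only(fan_in, fan_out, gain=1.0, seed=None):
    # Forward-only scheme (illustrative): choose Var[W] = gain / fan_in so
    # that, by the CLT, each pre-activation (a sum over fan_in terms) keeps
    # roughly the same variance as the layer's inputs.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(gain / fan_in), size=(fan_out, fan_in))

def init_bidirectional(fan_in, fan_out, gain=1.0, seed=None):
    # Bidirectional scheme (illustrative): Var[W] = 2 * gain / (fan_in + fan_out),
    # the usual compromise that also keeps the backpropagated gradient
    # variance (a sum over fan_out terms) roughly constant across layers.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(2.0 * gain / (fan_in + fan_out)),
                      size=(fan_out, fan_in))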