Research Article

A New Weight Initialization Scheme for Softplus Activation Function in Fully Connected FeedForward Neural Networks

Year 2025, Volume: 7 Issue: 2, 193 - 202
https://doi.org/10.46387/bjesr.1717780

Abstract

The trainability of deep neural networks becomes challenging as the number of layers increases, primarily due to issues such as vanishing or exploding gradients. In this study, two new weight initialization schemes based on the central limit theorem are proposed for fully connected feedforward neural networks using the Softplus activation function. The first scheme aims to preserve signal statistics only in the forward direction, while the second aims to preserve them in both forward and backward directions. Experiments conducted on the CIFAR-10 and CIFAR-100 datasets demonstrate that preserving only forward signal statistics is insufficient in deep architectures, whereas preserving statistics in both directions significantly improves trainability. Particularly in architectures with 25 hidden layers, successful training was achieved only with the bidirectional initialization scheme. The results reveal that initialization strategies compatible with the dynamics of the activation function play a critical role in enabling the effective training of deep neural networks.
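
The abstract contrasts an initialization that preserves forward signal statistics with one that also preserves backward (gradient) statistics. The paper's Softplus-specific derivation and constants are not reproduced on this page, so the sketch below is only an illustrative stand-in under stated assumptions: it uses generic fan-in and averaged fan-in/fan-out variance scaling in the spirit of He- and Glorot-style initializers, and the `gain` parameter, the `init_layer` and `forward_stats` helpers, and all numeric choices are hypothetical rather than taken from the paper.

```python
import numpy as np

def init_layer(fan_in, fan_out, mode="both", gain=1.0, rng=None):
    """Variance-scaled Gaussian initialization for one fully connected layer.

    Illustrative sketch only: 'forward' scales by fan_in (forward statistics),
    'both' averages fan_in and fan_out (forward and backward statistics,
    Glorot-style). The `gain` factor standing in for a Softplus-specific
    correction is a hypothetical placeholder, not the paper's constant.
    """
    rng = np.random.default_rng() if rng is None else rng
    if mode == "forward":
        std = gain / np.sqrt(fan_in)
    elif mode == "both":
        std = gain * np.sqrt(2.0 / (fan_in + fan_out))
    else:
        raise ValueError("mode must be 'forward' or 'both'")
    W = rng.normal(0.0, std, size=(fan_out, fan_in))
    b = np.zeros(fan_out)
    return W, b

def softplus(x):
    # Numerically stable Softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

def forward_stats(depth=25, width=256, mode="both", gain=1.0, seed=0):
    """Push random data through a deep Softplus stack and record how the
    per-layer activation standard deviation evolves under each scheme."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(1024, width))
    stds = []
    for _ in range(depth):
        W, b = init_layer(width, width, mode=mode, gain=gain, rng=rng)
        x = softplus(x @ W.T + b)
        stds.append(float(x.std()))
    return stds
```

Comparing `forward_stats(depth=25, mode="forward")` with `mode="both"` on random inputs gives an informal way to watch the per-layer activation scale drift in a 25-hidden-layer stack; it does not reproduce the paper's scheme or its CIFAR-10/CIFAR-100 experiments.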

References

  • Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015, doi: 10.1038/nature14539.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, F. Pereira, C. J. Burges, L. Bottou, and K. Q. Weinberger, Eds., Curran Associates, Inc., 2012. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
  • J. Kaur and W. Singh, “A systematic review of object detection from images using deep learning,” Multimed. Tools Appl., vol. 83, no. 4, pp. 12253–12338, 2024.
  • D. Guo et al., “DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025.
  • Y. Bengio, “Learning Deep Architectures for AI,” Found. Trends® Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009.
  • F. Chollet, Deep Learning with Python (Second Edition). New York: Manning Publications, 2021.
  • Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Trans. Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
  • G. Philipp, D. Song, and J. G. Carbonell, “The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions,” arXiv preprint arXiv:1712.05577v4, 2018.
  • K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Jun. 2016, pp. 770–778.
  • S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
  • S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, in ICML’15. JMLR.org, 2015, pp. 448–456.
  • J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.
  • K. He, X. Zhang, S. Ren, and J. Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
  • V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, in ICML’10. Madison, WI, USA: Omnipress, 2010, pp. 807–814.
  • D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs),” in 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2016. [Online]. Available: http://arxiv.org/abs/1511.07289
  • X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Y. W. Teh and M. Titterington, Eds., in Proceedings of Machine Learning Research, vol. 9. Chia Laguna Resort, Sardinia, Italy: PMLR, Jan. 2010, pp. 249–256. [Online]. Available: https://proceedings.mlr.press/v9/glorot10a.html
  • Y. Li, C. Fan, Y. Li, Q. Wu, and Y. Ming, “Improving deep neural network with Multiple Parametric Exponential Linear Units,” Neurocomputing, vol. 301, 2018.
  • S. K. Kumar, “On weight initialization in deep neural networks,” arXiv preprint arXiv:1704.08863v2, 2017.
  • S. R. Dubey, S. K. Singh, and B. B. Chaudhuri, “Activation functions in deep learning: A comprehensive survey and benchmark,” Neurocomputing, vol. 503, pp. 92–108, Sep. 2022.
  • M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, “Multilayer feedforward networks with a nonpolynomial activation function can approximate any function,” Neural Networks, vol. 6, no. 6, pp. 861–867, 1993.
  • P. F. Verhulst, “Recherches mathématiques sur la loi d’accroissement de la population,” Mémoires l’Académie R. des Sci. B.-lett. Bruxelles, vol. 18, pp. 1–41, 1845.
  • F.-C. Chen, “Back-propagation neural networks for nonlinear self-tuning adaptive control,” IEEE Control Syst. Mag., vol. 10, no. 3, pp. 44–48, 1990.
  • C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, “Incorporating Second-Order Functional Knowledge for Better Option Pricing,” in Advances in Neural Information Processing Systems, T. Leen, T. Dietterich, and V. Tresp, Eds., MIT Press, 2000. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2000/file/44968aece94f667e4095002d140b5896-Paper.pdf
  • Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient BackProp,” in Neural Networks: Tricks of the Trade: Second Edition, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 9–48.
  • G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, “Self-normalizing neural networks,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, in NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 972–981.
  • G. Zhang and H. Li, “Effectiveness of scaled exponentially-regularized linear units (SERLUs),” arXiv preprint arXiv:1807.10117, Jun. 2018.
  • N. L. Johnson, “Systems of Frequency Curves Generated by Methods of Translation,” Biometrika, vol. 36, no. 1/2, p. 149, Jun. 1949.
  • A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” 2009.

Details

Primary Language: Turkish
Subjects: Machine Learning (Other)
Section: Research Articles
Authors

Mehmet Murat Turhan 0000-0003-4497-9102

Emrah Dönmez 0000-0003-3345-8344

Muhammed Fatih Talu 0000-0003-1166-8404

Early View Date: October 19, 2025
Publication Date: October 22, 2025
Submission Date: June 11, 2025
Acceptance Date: July 6, 2025
Published Issue: Year 2025, Volume: 7 Issue: 2

Cite

APA Turhan, M. M., Dönmez, E., & Talu, M. F. (2025). Tam Bağlı İleri Beslemeli Sinir Ağlarda Softplus Aktivasyon Fonksiyonu İçin Yeni Bir Yol Ağırlık Başlatma Düzeni. Mühendislik Bilimleri ve Araştırmaları Dergisi, 7(2), 193-202. https://doi.org/10.46387/bjesr.1717780
AMA Turhan MM, Dönmez E, Talu MF. Tam Bağlı İleri Beslemeli Sinir Ağlarda Softplus Aktivasyon Fonksiyonu İçin Yeni Bir Yol Ağırlık Başlatma Düzeni. Müh.Bil.ve Araş.Dergisi. Ekim 2025;7(2):193-202. doi:10.46387/bjesr.1717780
Chicago Turhan, Mehmet Murat, Emrah Dönmez, ve Muhammed Fatih Talu. “Tam Bağlı İleri Beslemeli Sinir Ağlarda Softplus Aktivasyon Fonksiyonu İçin Yeni Bir Yol Ağırlık Başlatma Düzeni”. Mühendislik Bilimleri ve Araştırmaları Dergisi 7, sy. 2 (Ekim 2025): 193-202. https://doi.org/10.46387/bjesr.1717780.
EndNote Turhan MM, Dönmez E, Talu MF (01 Ekim 2025) Tam Bağlı İleri Beslemeli Sinir Ağlarda Softplus Aktivasyon Fonksiyonu İçin Yeni Bir Yol Ağırlık Başlatma Düzeni. Mühendislik Bilimleri ve Araştırmaları Dergisi 7 2 193–202.
IEEE M. M. Turhan, E. Dönmez, ve M. F. Talu, “Tam Bağlı İleri Beslemeli Sinir Ağlarda Softplus Aktivasyon Fonksiyonu İçin Yeni Bir Yol Ağırlık Başlatma Düzeni”, Müh.Bil.ve Araş.Dergisi, c. 7, sy. 2, ss. 193–202, 2025, doi: 10.46387/bjesr.1717780.
ISNAD Turhan, Mehmet Murat vd. “Tam Bağlı İleri Beslemeli Sinir Ağlarda Softplus Aktivasyon Fonksiyonu İçin Yeni Bir Yol Ağırlık Başlatma Düzeni”. Mühendislik Bilimleri ve Araştırmaları Dergisi 7/2 (Ekim 2025), 193-202. https://doi.org/10.46387/bjesr.1717780.
JAMA Turhan MM, Dönmez E, Talu MF. Tam Bağlı İleri Beslemeli Sinir Ağlarda Softplus Aktivasyon Fonksiyonu İçin Yeni Bir Yol Ağırlık Başlatma Düzeni. Müh.Bil.ve Araş.Dergisi. 2025;7:193–202.
MLA Turhan, Mehmet Murat vd. “Tam Bağlı İleri Beslemeli Sinir Ağlarda Softplus Aktivasyon Fonksiyonu İçin Yeni Bir Yol Ağırlık Başlatma Düzeni”. Mühendislik Bilimleri ve Araştırmaları Dergisi, c. 7, sy. 2, 2025, ss. 193-202, doi:10.46387/bjesr.1717780.
Vancouver Turhan MM, Dönmez E, Talu MF. Tam Bağlı İleri Beslemeli Sinir Ağlarda Softplus Aktivasyon Fonksiyonu İçin Yeni Bir Yol Ağırlık Başlatma Düzeni. Müh.Bil.ve Araş.Dergisi. 2025;7(2):193-202.