Derin Sinir Ağları için Hiperparametre Metodlarının ve Kitlerinin İncelenmesi

Sara Altun; Muhammed Fatih Talu

doi:10.24012/dumf.767700

Investigation of Hyperparametry Methods and Kits for Deep Neural Networks

Abstract

Automated machine learning (AutoML) and deep neural networks have many hyperparameters. The recent growth of interest in complex, computationally expensive machine learning models has revived research on hyperparameter optimization (HPO). HPO dates back many years, and its popularity has grown with deep learning networks. This article reviews the most important topics in HPO. First, the basic hyperparameters related to model training and structure are introduced, and their importance and the methods for choosing their value ranges are discussed. The article then focuses on optimization algorithms and their applicability, covering their efficiency and accuracy, especially for deep learning networks. The study also examines the HPO kits that are important for HPO and preferred by researchers. The examined kits are compared in terms of their state-of-the-art search algorithms, their feasibility with major deep learning tools, and their extensibility with new user-designed modules. The article concludes with the problems that arise when HPO is applied to deep learning algorithms, a comparison of the optimization algorithms, and prominent approaches to model evaluation under limited computational resources.

Details

Primary Language

Turkish

Journal Section

Review

Authors

Sara Altun *
0000-0003-2877-7105
Türkiye

Muhammed Fatih Talu
Türkiye

Publication Date

March 30, 2021

Submission Date

July 10, 2020

Acceptance Date

December 2, 2020

Published in Issue

Year 2021 Volume: 12 Number: 2

DOI

https://doi.org/10.24012/dumf.767700

IZ

https://izlik.org/JA65GY74WK

Cite

IEEE

[1] S. Altun and M. F. Talu, “Derin Sinir Ağları için Hiperparametre Metodlarının ve Kitlerinin İncelenmesi”, DUJE, vol. 12, no. 2, pp. 187–199, Mar. 2021, doi: 10.24012/dumf.767700.

Cited By

Hyperparameter optimization of pre-trained convolutional neural networks using adolescent identity search algorithm

Neural Computing and Applications

https://doi.org/10.1007/s00521-023-09121-8

Evaluation of YOLOv8 Model Series with HOP for Object Detection in Complex Agriculture Domains

International Journal of Pure and Applied Sciences

https://doi.org/10.29132/ijpas.1448068