Review

Investigation of Hyperparameter Methods and Kits for Deep Neural Networks

Year 2021, Volume: 12, Issue: 2, 187–199, 30.03.2021
https://doi.org/10.24012/dumf.767700

Abstract

Automated machine learning (AutoML) systems and deep neural networks have many hyperparameters. The recent surge of interest in complex and computationally expensive machine learning models has revived research on hyperparameter optimization (HPO). HPO itself dates back many years, and its popularity has grown with deep learning networks. This article reviews the most important topics related to HPO. First, the basic hyperparameters related to model training and model structure are introduced, and their importance and methods for choosing their value ranges are discussed. Next, the focus shifts to optimization algorithms and their applicability, covering their efficiency and accuracy, particularly for deep learning networks. The study also examines the HPO toolkits that matter for HPO in practice and are preferred by researchers, comparing the state-of-the-art search algorithms they offer, their compatibility with major deep learning frameworks, and their extensibility with new user-designed modules. The review concludes with the problems that arise when HPO is applied to deep learning algorithms, a comparison of the optimization algorithms, and prominent approaches for model evaluation under limited computational resources.
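The search-based HPO methods surveyed here are easiest to picture with a small example. The sketch below is not taken from the article; it is a minimal illustration of plain random search in the spirit of Bergstra and Bengio [36], in which hyperparameters are sampled from user-defined ranges and the best-scoring configuration is kept. The train_and_evaluate objective is a hypothetical stand-in for training a network with the sampled hyperparameters and returning its validation accuracy.

```python
import math
import random

random.seed(0)

def train_and_evaluate(learning_rate, hidden_units):
    # Hypothetical objective standing in for "train a network with these
    # hyperparameters and return its validation accuracy". The synthetic
    # surface below peaks near learning_rate ~ 1e-2 and 128 hidden units.
    return (math.exp(-(math.log10(learning_rate) + 2.0) ** 2)
            * math.exp(-((hidden_units - 128) / 128.0) ** 2))

best_score, best_config = -math.inf, None
for trial in range(50):
    # Sample the learning rate on a log scale and the layer width from a discrete set.
    config = {
        "learning_rate": 10 ** random.uniform(-5, -1),
        "hidden_units": random.choice([32, 64, 128, 256, 512]),
    }
    score = train_and_evaluate(**config)
    if score > best_score:
        best_score, best_config = score, config

print(f"Best configuration: {best_config} (score {best_score:.3f})")
```

Grid search, Bayesian optimization, Hyperband/BOHB, and population-based training differ mainly in how the next configuration is proposed and how partially trained models are discarded; the evaluation loop above stays the same.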

References

  • [1] Sculley, D., Snoek, J., Wiltschko, A., & Rahimi, A. (2018). Winner’s curse? On Pace, Progress, and Empirical Rigor. In International Conference on Learning Representations Workshop Track, published online: iclr.cc
  • [2] King, R. D., Feng, C., & Sutherland, A. (1995). Statlog: comparison of classification algorithms on large real-world problems. Applied Artificial Intelligence an International Journal, 9(3), 289-333.
  • [3] Kohavi, R., & John, G. H. (1995). Automatic parameter selection by minimizing estimated error. In Machine Learning Proceedings 1995 (pp. 304-312). Morgan Kaufmann.
  • [4] Michie, D., Spiegelhalter, D. J., & Taylor, C. C. (1994). Machine learning. Neural and Statistical Classification, 13(1994), 1-298.
  • [5] Ripley, B. D. (1993). Statistical aspects of neural networks. Networks and chaos—statistical and probabilistic aspects, 50, 40-123.
  • [6] Rodriguez, J. (2018). Understanding Hyperparameters Optimization in Deep Learning Models: Concepts and Tools.
  • [7] Tan, M., & Le, Q. V. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946.
  • [8] Ma, N., Zhang, X., Zheng, H. T., & Sun, J. (2018). Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European conference on computer vision (ECCV) (pp. 116-131).
  • [9] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510-4520).
  • [10] Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., & Le, Q. V. (2019). Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2820-2828).
  • [11] Ng, A. (2017). Improving deep neural networks: Hyperparameter tuning, regularization and optimization. deeplearning.ai on Coursera.
  • [12] Robbins, H., & Monro, S. (1951). A stochastic approximation method. The annals of mathematical statistics, 400-407.
  • [13] Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12(7).
  • [14] Hinton, G., Srivastava, N., & Swersky, K. (2012). Neural networks for machine learning. Coursera, video lectures, 264(1).
  • [15] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [16] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
  • [17] Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. In Neural networks: Tricks of the trade (pp. 437-478). Springer, Berlin, Heidelberg.
  • [18] Hamby, D. M. (1994). A review of techniques for parameter sensitivity analysis of environmental models. Environmental monitoring and assessment, 32(2), 135-154.
  • [19] Breierova, L., & Choudhari, M. (1996). An introduction to sensitivity analysis. Massachusetts Institute of Technology.
  • [20] https://www.jeremyjordan.me/, accessed 10 May 2020.
  • [21] Lau, S. (2017). Learning rate schedules and adaptive learning rate methods for deep learning. Towards Data Science.
  • [22] Smith, L. N. (2017, March). Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 464-472). IEEE.
  • [23] https://github.com/bckenstler/CLR, accessed 14 May 2020.
  • [24] Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
  • [25] Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural computation, 18(7), 1527-1554.
  • [26] Yu, T., & Zhu, H. (2020). Hyper-Parameter Optimization: A Review of Algorithms and Applications. arXiv preprint arXiv:2003.05689.
  • [27] Heaton, J. (2008). The number of hidden layers. Heaton Research Inc.
  • [28] Perez, L., & Wang, J. (2017). The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621.
  • [29] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929-1958.
  • [30] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
  • [31] Walia, A. S. (2017). Activation functions and it’s types-Which is better?. Towards Data Science, 29.
  • [32] Nair, V., and Hinton, G. E., (2010). Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) pp. 807-814.
  • [33] Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013, February). Maxout networks. In International conference on machine learning (pp. 1319-1327).
  • [34] Ramachandran, P., Zoph, B., and Le, Q. V., (2017). Swish: a Self-gated Activation Function. arXiv preprint arXiv:1710.05941, 7.
  • [35] Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv preprint arXiv:1710.05941.
  • [36] Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1), 281-305.
  • [37] Montgomery, D. C. (2017). Design and analysis of experiments. John Wiley & Sons.
  • [38] Joseph, R. (2018). Grid Search for model tuning.
  • [39] Močkus, J. (1975). On Bayesian methods for seeking the extremum. In Optimization techniques IFIP technical conference (pp. 400-404). Springer, Berlin, Heidelberg.
  • [40] Mockus, J., Tiesis, V., & Zilinskas, A. (1978). The application of Bayesian methods for seeking the extremum. Towards global optimization, 2(117-129), 2.
  • [41] Jones, D. R., Schonlau, M., & Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global optimization, 13(4), 455-492.
  • [42] Yu, T., & Zhu, H. (2020). Hyper-Parameter Optimization: A Review of Algorithms and Applications. arXiv preprint arXiv:2003.05689.
  • [43] Eggensperger, K., Feurer, M., Hutter, F., Bergstra, J., Snoek, J., Hoos, H., & Leyton-Brown, K. (2013, December). Towards an empirical foundation for assessing bayesian optimization of hyperparameters. In NIPS workshop on Bayesian Optimization in Theory and Practice (Vol. 10, p. 3).
  • [44] Rasmussen, C. E. (2003, February). Gaussian processes in machine learning. In Summer School on Machine Learning (pp. 63-71). Springer, Berlin, Heidelberg.
  • [45] Hutter, F., Kotthoff, L., and Vanschoren, J., (2019). Automated Machine Learning. Springer: New York, NY, USA.
  • [46] Feurer, M., & Hutter, F. (2019). Hyperparameter optimization. In Automated Machine Learning (pp. 3-33). Springer, Cham.
  • [47] Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011, January). Sequential model-based optimization for general algorithm configuration. In International conference on learning and intelligent optimization (pp. 507-523). Springer, Berlin, Heidelberg.
  • [48] Li, L., Jamieson, K., Rostamizadeh, A., Gonina, E., Hardt, M., Recht, B., & Talwalkar, A. (2018). Massively parallel hyperparameter tuning. arXiv preprint arXiv:1810.05934.
  • [49] Thornton, C., Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2013, August). Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 847-855).
  • [50] Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., & Hutter, F. (2015). Efficient and robust automated machine learning. In Advances in neural information processing systems (pp. 2962-2970).
  • [51] Bergstra, J. S., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization. In Advances in neural information processing systems (pp. 2546-2554).
  • [52] Sparks, E. R., Talwalkar, A., Haas, D., Franklin, M. J., Jordan, M. I., & Kraska, T. (2015, August). Automating model search for large scale machine learning. In Proceedings of the Sixth ACM Symposium on Cloud Computing (pp. 368-380).
  • [53] Zhang, Y., Bahadori, M. T., Su, H., & Sun, J. (2016, August). FLASH: fast Bayesian optimization for data analytic pipelines. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2065-2074).
  • [54] Loshchilov, I., & Hutter, F. (2016). CMA-ES for hyperparameter optimization of deep neural networks. arXiv preprint arXiv:1604.07269.
  • [55] Swersky, K., Snoek, J., & Adams, R. P. (2014). Freeze-thaw Bayesian optimization. arXiv preprint arXiv:1406.3896.
  • [56] Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., & Sculley, D. (2017, August). Google vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1487-1495).
  • [57] Liaw, R., Liang, E., Nishihara, R., Moritz, P., Gonzalez, J. E., & Stoica, I. (2018). Tune: A research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118.
  • [58] Microsoft., (2018), Neural Network Intelligence. https://github.com/microsoft/nni#nni-released-reminder.
  • [59] Provost, F., Jensen, D., & Oates, T. (1999, August). Efficient progressive sampling. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 23-32).
  • [60] Bissuel, A. (2020). Hyper-parameter Optimization Algorithms: A Short Review. https://www.automl.org/blog_bohb/.
  • [61] Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2017). Hyperband: A novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research, 18(1), 6765-6816.
  • [62] Falkner, S., Klein, A., & Hutter, F. (2018). BOHB: Robust and efficient hyperparameter optimization at scale. arXiv preprint arXiv:1807.01774.
  • [63] Simon, D. (2013). Evolutionary optimization algorithms. John Wiley & Sons.
  • [64] Hansen, N. (2016). The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772.
  • [65] Shiffman, D., Fry, S., and Marsh, Z., (2012). The Nature of Code. (pp. 323-330).
  • [66] Yang, X. S. (2009). Firefly algorithms for multimodal optimization. In Proceedings of Stochastic Algorithms: Foundations and Applications, Lecture Notes in Computer Science, vol. 5792 (pp. 169-178). Springer, Sapporo, Japan.
  • [67] Krishnanand, K.N., Ghose, D. (2005). Detection of Multiple Source Locations Using a Glowworm Metaphor with Applications to Collective Robotics. In IEEE Swarm Intelligence Symposium. (pp. 84-91).
  • [68] Dorigo, M., Maniezzo, V., & Colorni, A. (1991). The Ant System: An autocatalytic optimizing process. Tech. Rep. No. 91-016, Dipartimento di Elettronica, Politecnico di Milano, Italy.
  • [69] Kennedy, J., Eberhart, R. C. (1995). Particle Swarm Optimization. IEEE International Conference on Neural Networks, vol. IV, Piscataway, NJ. (pp. 1942-1948).
  • [70] Jiang, M., Yuan, D., & Cheng, Y. (2009). Improved artificial fish swarm algorithm. In Fifth International Conference on Natural Computation (pp. 281-285).
  • [71] Karaboga, D., & Akay, B. (2009). A survey: Algorithms simulating bee swarm intelligence. Artificial Intelligence Review, 31(1-4), 61-85.
  • [72] Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W. M., Donahue, J., Razavi, A., ... & Fernando, C. (2017). Population based training of neural networks. arXiv preprint arXiv:1711.09846.
  • [73] Li, A., Spyra, O., Perel, S., Dalibard, V., Jaderberg, M., Gu, C., ... & Gupta, P. (2019, July). A generalized framework for population based training. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1791-1799).
  • [74] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).
  • [75] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).


Details

Primary Language: Turkish
Section: Articles
Authors

Sara Altun 0000-0003-2877-7105

Muhammed Fatih Talu

Publication Date: March 30, 2021
Submission Date: July 10, 2020
Published Issue: Year 2021, Volume: 12, Issue: 2

How to Cite

IEEE: S. Altun and M. F. Talu, “Derin Sinir Ağları için Hiperparametre Metodlarının ve Kitlerinin İncelenmesi”, DÜMF MD, vol. 12, no. 2, pp. 187–199, 2021, doi: 10.24012/dumf.767700.
All articles published by DUJE are licensed under the Creative Commons Attribution 4.0 International License. This allows anyone to copy, redistribute, remix, transmit and adapt the work, provided the original work and the source are properly credited.