Setting Reward Function of Sensor Based DDQN Model

Mehmet Gökçay Kabataş; Sevinç İlhan Omurca

doi:10.31590/ejosat.1008702

Research Article

Turkish title: Sensör Tabanlı DDQN Modeline Ödül Fonksiyonu Belirleme

Year 2021, pp. 539-544, 30.11.2021

Abstract

This study aims to determine an appropriate reward function for an agent trained to pass 100 obstacles/objects using Reinforcement Learning (RL) with a Double Deep Q Network (DDQN) model. To train the agent, the environment is split into sub-problems, and several rules and different reward functions are defined for these sub-problems. Training is carried out with a custom-developed mini deep learning library called gNet.
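
For readers unfamiliar with DDQN, the following is a minimal sketch of the Double DQN training target, in which the online network selects the next action and the target network evaluates it. It is not the authors' gNet implementation; the function names, the discount factor, and the list-based Q-value inputs are assumptions made purely for illustration. The reward passed in is whatever the chosen reward function returns, which is where the reward design studied here enters the update.

```python
def ddqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target for a single transition (illustrative sketch).

    next_q_online / next_q_target are lists of Q-values for the next state,
    one entry per action, from the online and target networks respectively.
    gamma is a hypothetical discount factor, not a value from the paper.
    """
    if done:
        # Terminal state: no bootstrapping from the next state.
        return reward
    # The online network selects the greedy next action ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... and the target network supplies the value of that action.
    return reward + gamma * next_q_target[best_action]


def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Standard DQN target, shown only for comparison."""
    if done:
        return reward
    # The same network both selects and evaluates the next action.
    return reward + gamma * max(next_q_target)
```

When the two networks disagree about the best next action, the Double DQN target is no larger than the standard DQN target, which is the mechanism intended to reduce overestimation of action values.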

Keywords

artificial intelligence, deep learning, DDQN, gNet, reward function, sensor-based

Details

Primary Language	English
Subjects	Engineering
Section	Articles
Authors	Mehmet Gökçay Kabataş (ORCID 0000-0002-9628-4890), Sevinç İlhan Omurca (ORCID 0000-0003-1214-9235)
Publication Date	November 30, 2021
Published in Issue	Year 2021

Cite

APA	Kabataş, M. G., & İlhan Omurca, S. (2021). Setting Reward Function of Sensor Based DDQN Model. Avrupa Bilim ve Teknoloji Dergisi, (28), 539-544. https://doi.org/10.31590/ejosat.1008702