Research Article

Improving Sample Efficiency of Reinforcement Learning Control Using Autoencoders

Year 2024, Volume: 1 Issue: 1, 39 - 48, 20.07.2024

Abstract

This study presents a novel approach for improving the sample efficiency of reinforcement learning (RL) control of dynamic systems by utilizing autoencoders. The main objective of this research is to investigate the effectiveness of autoencoders in enhancing the learning process and improving the resulting policies in RL control problems. In the literature, most applications use only the autoencoder's latent space during learning. This can cause loss of information, difficulty in interpreting the latent space, difficulty in handling dynamic environments, and outdated representations. The proposed approach overcomes these problems and enhances sample efficiency by using both the states and their latent representations during learning. The methodology consists of two main steps. First, a denoising-contractive autoencoder is developed and implemented for RL control problems, with a specific focus on its applicability to state representation and feature extraction. Second, a deep reinforcement learning algorithm is trained on the augmented states generated by the autoencoder. The algorithm is compared against a baseline Deep Q-Network (DQN) in the LunarLander environment, where observations from the environment are subject to Gaussian noise.
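The paper's code is not included on this page; the snippet below is only a minimal PyTorch sketch of the idea described in the abstract: a denoising-contractive autoencoder trained on noisy LunarLander observations, whose latent code is concatenated with the raw state to form the augmented input of a DQN-style Q-network. Layer sizes, the noise level, and the contractive penalty weight are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch (assumed hyperparameters), not the authors' implementation.
import torch
import torch.nn as nn

STATE_DIM = 8      # LunarLander observation size
LATENT_DIM = 4     # assumed latent size
NOISE_STD = 0.1    # assumed Gaussian observation-noise level
LAMBDA_C = 1e-4    # assumed contractive penalty weight


class DCAE(nn.Module):
    """Denoising-contractive autoencoder over flat state vectors."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(),
                                     nn.Linear(32, LATENT_DIM))
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 32), nn.ReLU(),
                                     nn.Linear(32, STATE_DIM))

    def loss(self, state):
        # Denoising: corrupt the input, reconstruct the clean state.
        noisy = state + NOISE_STD * torch.randn_like(state)
        recon = self.decoder(self.encoder(noisy))
        recon_loss = nn.functional.mse_loss(recon, state)
        # Contractive penalty: squared Frobenius norm of the encoder Jacobian
        # (computed with autograd for clarity, not speed).
        jac = torch.autograd.functional.jacobian(self.encoder, noisy,
                                                 create_graph=True)
        return recon_loss + LAMBDA_C * jac.pow(2).sum()

    def encode(self, state):
        with torch.no_grad():
            return self.encoder(state)


class AugmentedQNet(nn.Module):
    """Q-network fed with the raw state concatenated with its latent code."""

    def __init__(self, n_actions=4):  # LunarLander has 4 discrete actions
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + LATENT_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))

    def forward(self, state, latent):
        return self.net(torch.cat([state, latent], dim=-1))


if __name__ == "__main__":
    ae, qnet = DCAE(), AugmentedQNet()
    s = torch.randn(32, STATE_DIM)        # a batch of (noisy) observations
    ae_loss = ae.loss(s)                  # autoencoder training signal
    q_values = qnet(s, ae.encode(s))      # Q-values from the augmented state
    print(float(ae_loss), q_values.shape)
```

In this reading of the method, the Q-network always sees the original state alongside the latent code, so no information is discarded even if the autoencoder's representation lags behind the current data distribution.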

References

  • R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
  • V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
  • D. Silver, J. Schrittwieser, K. Simonyan, et al., “Mastering the game of Go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017.
  • G. Dulac-Arnold, D. Mankowitz, and T. Hester, Challenges of real-world reinforcement learning, 2019. arXiv: 1904.12901 [cs.LG].
  • R. S. Sutton, “Dyna, an integrated architecture for learning, planning, and reacting,” ACM SIGART Bulletin, vol. 2, no. 4, pp. 160–163, 1991.
  • K. Chua, R. Calandra, R. McAllister, and S. Levine, “Deep reinforcement learning in a handful of trials using probabilistic dynamics models,” Advances in Neural Information Processing Systems, vol. 31, 2018.
  • R. Agarwal, C. Liang, D. Schuurmans, and M. Norouzi, “Learning to generalize from sparse and underspecified rewards,” in International Conference on Machine Learning, PMLR, 2019, pp. 130–140.
  • T. Nguyen, T. M. Luu, T. Vu, and C. D. Yoo, “Sample-efficient reinforcement learning representation learning with curiosity contrastive forward dynamics model,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2021, pp. 3471–3477.
  • N. Botteghi, M. Poel, and C. Brune, Unsupervised representation learning in deep reinforcement learning: A review, 2022. arXiv: 2208.14226 [cs.LG].
  • L. Kocsis and C. Szepesvári, “Bandit based Monte-Carlo planning,” in European Conference on Machine Learning, Springer, 2006, pp. 282–293.
  • P.-Y. Oudeyer and F. Kaplan, “What is intrinsic motivation? A typology of computational approaches,” Frontiers in Neurorobotics, vol. 1, p. 108, 2007.
  • Y. Burda, H. Edwards, A. Storkey, and O. Klimov, “Exploration by random network distillation,” arXiv preprint arXiv:1810.12894, 2018.
  • A. A. Rusu, N. C. Rabinowitz, G. Desjardins, et al., “Progressive neural networks,” arXiv preprint arXiv:1606.04671, 2016.
  • C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in International Conference on Machine Learning, PMLR, 2017, pp. 1126–1135.
  • A. Nichol, J. Achiam, and J. Schulman, “On first-order meta-learning algorithms,” arXiv preprint arXiv:1803.02999, 2018.
  • T. Hester, M. Vecerik, O. Pietquin, et al., “Deep Q-learning from demonstrations,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018.
  • A. Kumar, A. Zhou, G. Tucker, and S. Levine, “Conservative Q-learning for offline reinforcement learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 1179–1191, 2020.
  • M. Laskin, K. Lee, A. Stooke, L. Pinto, P. Abbeel, and A. Srinivas, “Reinforcement learning with augmented data,” Advances in Neural Information Processing Systems, vol. 33, pp. 19884–19895, 2020.
  • M. Watter, J. Springenberg, J. Boedecker, and M. Riedmiller, “Embed to control: A locally linear latent dynamics model for control from raw images,” Advances in Neural Information Processing Systems, vol. 28, 2015.
  • D. Ha and J. Schmidhuber, “World models,” arXiv preprint arXiv:1803.10122, 2018.
  • V. Mnih, K. Kavukcuoglu, D. Silver, et al., Playing Atari with deep reinforcement learning, 2013. arXiv: 1312.5602 [cs.LG].
  • I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
  • S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, “Contractive auto-encoders: Explicit invariance during feature extraction,” in Proceedings of the 28th International Conference on Machine Learning, 2011, pp. 833–840.
  • OpenAI, OpenAI Gym, https://www.gymlibrary.dev.
  • F.-q. Chen, Y. Wu, G.-d. Zhao, J.-m. Zhang, M. Zhu, and J. Bai, Contractive de-noising auto-encoder, 2014. arXiv: 1305.4076 [cs.LG].
There are 25 references in total.

Details

Primary Language English
Subjects Artificial Intelligence (Other), Control Engineering, Mechatronics and Robotics (Other)
Section Research Articles
Authors

Burak Er 0009-0000-7540-995X

Mustafa Doğan 0000-0001-5215-8887

Publication Date July 20, 2024
Submission Date May 9, 2024
Acceptance Date May 15, 2024
Published Issue Year 2024 Volume: 1 Issue: 1

Cite

IEEE B. Er and M. Doğan, “Improving Sample Efficiency of Reinforcement Learning Control Using Autoencoders”, ITU Computer Science AI and Robotics, vol. 1, no. 1, pp. 39–48, 2024.

ITU Computer Science AI and Robotics