Effective Exploration via Intrinsic Motivation in Reinforcement Learning

Berkay Eren; Alper Demir

doi:10.28979/jarnas.1922504

Abstract

Reinforcement learning agents often struggle in sparse-reward environments, where feedback is limited and appears only after a sequence of correct actions. In partially observable navigation tasks, simple exploration strategies are often insufficient. This study investigates intrinsic motivation mechanisms, focusing on the “Don’t Do What Doesn’t Matter” (DoWhaM) method, which rewards rare but effective actions. To address its limitations in spatial tasks, we propose the Area-aware DoWhaM Adaptation (ADA). This method extends action usefulness with spatial novelty bonuses that encourage the agent to expand its visible area. We evaluate ADA against DoWhaM and a Count-Based baseline in various MiniGrid environments. Results indicate that ADA improves sample efficiency in the early stages of training. In dynamic environments where the layout changes in every episode, ADA significantly outperforms the Count-Based baseline and learns faster than DoWhaM. These findings suggest that combining action usefulness with spatial novelty provides a robust heuristic for exploration in procedurally generated tasks.
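
As a rough illustration of the idea summarized above, the following Python sketch combines a DoWhaM-style action-usefulness bonus with a bonus for newly observed cells. It is a minimal sketch under stated assumptions, not the formulation used in the article: the class name `ActionUsefulnessBonus`, the constant `eta`, the weight `area_weight`, the episodic visit counter, and the additive way the two terms are combined are all hypothetical choices made for this example.

```python
from collections import defaultdict
import math


class ActionUsefulnessBonus:
    """Toy DoWhaM-style bonus plus a hypothetical area-novelty term."""

    def __init__(self, eta=40.0, area_weight=0.1):
        self.eta = eta                      # assumed shaping constant
        self.area_weight = area_weight      # hypothetical weight, not from the article
        self.used = defaultdict(int)        # times each action was taken
        self.effective = defaultdict(int)   # times each action changed the state
        self.seen_cells = set()             # cells observed so far in this episode
        self.visits = defaultdict(int)      # episodic visit counts of next states

    def reset_episode(self):
        # Episodic memory is cleared; action statistics persist across episodes.
        self.seen_cells.clear()
        self.visits.clear()

    def reward(self, action, state, next_state, visible_cells):
        self.used[action] += 1
        new_cells = set(visible_cells) - self.seen_cells
        self.seen_cells |= new_cells
        if next_state == state:
            return 0.0                      # ineffective actions earn nothing
        self.effective[action] += 1
        self.visits[next_state] += 1
        # Actions that rarely change the state score close to 1,
        # actions that always change it score 0.
        ratio = self.effective[action] / self.used[action]
        usefulness = (self.eta ** (1.0 - ratio) - 1.0) / (self.eta - 1.0)
        bonus = usefulness / math.sqrt(self.visits[next_state])
        # Assumed additive combination with the count of newly seen cells.
        return bonus + self.area_weight * len(new_cells)
```

In this sketch, states and cells only need to be hashable (for example, strings or coordinate tuples), so an agent in a grid world could pass a hash of its observation as the state and the set of grid coordinates currently in view as `visible_cells`.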

Ethical Statement

No approval from the Board of Ethics is required.

Details

Primary Language

English

Subjects

Reinforcement Learning

Journal Section

Research Article

Authors

Berkay Eren
0009-0000-3108-5527
Türkiye

Alper Demir*
0000-0003-2646-4850
Türkiye

Publication Date

June 30, 2026

Submission Date

April 3, 2026

Acceptance Date

June 16, 2026

Published in Issue

Year 2026 Volume: 12 Number: 2

DOI

https://doi.org/10.28979/jarnas.1922504

IZ

https://izlik.org/JA33MP32XF

Cite

APA

Eren, B., & Demir, A. (2026). Effective Exploration via Intrinsic Motivation in Reinforcement Learning. Journal of Advanced Research in Natural and Applied Sciences, 12(2), 171-192. https://doi.org/10.28979/jarnas.1922504
