A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments

Mahmut Nedim Alpdemir

doi:10.55525/tjst.1219845

Abstract

Tactical UAV path planning under radar threat using reinforcement learning involves particular challenges, ranging from modeling-related difficulties to the sparse feedback problem. Learning goal-directed behavior with sparse feedback from complex environments is a fundamental challenge for reinforcement learning algorithms. In this paper, we extend our previous work in this area to address this problem setting. We use Hierarchical Reinforcement Learning (HRL) in a novel way, with a meta controller that assigns higher-level goals and a lower-level controller that determines the agent's actions. Our meta controller is based on a regression model trained using a state transition scheme that defines the evolution of goal designation. Our lower-level controller is based on a Deep Q-Network (DQN) and is trained via reinforcement learning iterations. This two-layer framework ensures that an optimal plan for a complex path, organized as multiple goals, is achieved gradually through the piecewise assignment of sub-goals, and thus as the result of a staged, efficient, and rigorous procedure.
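The two-level control scheme summarized above can be illustrated with a small sketch. It is a hypothetical toy example under stated assumptions, not the author's implementation. The 5x5 grid, the sub-goal list, the reward values, and the names `step`, `meta_controller`, `train`, and `rollout` are all illustrative. The meta controller is a hand-coded rule that advances to the next sub-goal once the current one is reached. It stands in for the paper's regression model. The lower-level controller uses tabular Q-learning over (state, sub-goal) pairs in place of a DQN.

```python
import random

# Hypothetical toy setting (not from the paper): a 5x5 grid, a fixed start
# cell, and an ordered list of sub-goals whose last entry is the final goal.
GRID = 5
START = (0, 0)
SUB_GOALS = [(0, 4), (4, 4)]
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left


def step(state, action):
    """Apply a move and clip the resulting cell to the grid boundaries."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), GRID - 1)
    y = min(max(state[1] + dy, 0), GRID - 1)
    return (x, y)


def meta_controller(goal_index, state):
    """Placeholder meta controller: move to the next sub-goal once the current
    one is reached. The paper uses a trained regression model at this level."""
    if state == SUB_GOALS[goal_index] and goal_index + 1 < len(SUB_GOALS):
        return goal_index + 1
    return goal_index


def action_values(q, state, goal):
    """Look up Q-values for a (state, sub-goal) pair; unseen pairs read as 0."""
    return q.get((state, goal), [0.0] * len(ACTIONS))


def greedy(q, state, goal):
    """Pick the action with the highest value (ties go to the lowest index)."""
    values = action_values(q, state, goal)
    return max(range(len(ACTIONS)), key=lambda a: values[a])


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, max_steps=200, seed=0):
    """Train the low-level controller with tabular Q-learning (a DQN stand-in)."""
    rng = random.Random(seed)
    q = {}  # (state, sub-goal) -> list of action values
    for _ in range(episodes):
        state, goal_index = START, 0
        for _ in range(max_steps):
            goal = SUB_GOALS[goal_index]
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                action = greedy(q, state, goal)  # exploit
            next_state = step(state, action)
            reached = next_state == goal
            # Intrinsic reward: +1 for reaching the assigned sub-goal, else a small step cost.
            reward = 1.0 if reached else -0.01
            # Reaching the sub-goal ends the low-level task, so there is no bootstrap term.
            target = reward if reached else reward + gamma * max(action_values(q, next_state, goal))
            values = q.setdefault((state, goal), [0.0] * len(ACTIONS))
            values[action] += alpha * (target - values[action])
            state = next_state
            if reached and goal_index == len(SUB_GOALS) - 1:
                break  # final goal reached
            goal_index = meta_controller(goal_index, state)
    return q


def rollout(q, max_steps=50):
    """Follow the greedy low-level policy while the meta controller assigns sub-goals."""
    state, goal_index, path = START, 0, [START]
    for _ in range(max_steps):
        state = step(state, greedy(q, state, SUB_GOALS[goal_index]))
        path.append(state)
        if state == SUB_GOALS[-1] and goal_index == len(SUB_GOALS) - 1:
            break
        goal_index = meta_controller(goal_index, state)
    return path


if __name__ == "__main__":
    print(rollout(train()))
```

In this sketch, the low-level action values are conditioned on the assigned sub-goal, so a single learner can serve every stage of the path. Each sub-goal also supplies its own reward signal, rather than a single reward at the final goal. This is one way a hierarchy can ease the sparse feedback problem.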

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Authors

Mahmut Nedim Alpdemir
ORCID: 0000-0001-6411-1453
Türkiye

Publication Date

March 29, 2023

Submission Date

December 16, 2022

Acceptance Date

March 21, 2023

Published in Issue

Year 2023 Volume: 18 Number: 1

DOI

https://doi.org/10.55525/tjst.1219845

IZ

https://izlik.org/JA56HS87BB

Cite

APA

Alpdemir, M. N. (2023). A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments. Turkish Journal of Science and Technology, 18(1), 243-259. https://doi.org/10.55525/tjst.1219845

AMA

Alpdemir MN. A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments. TJST. 2023;18(1):243-259. doi:10.55525/tjst.1219845

Chicago

Alpdemir, Mahmut Nedim. 2023. “A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments”. Turkish Journal of Science and Technology 18 (1): 243-59. https://doi.org/10.55525/tjst.1219845.

EndNote

Alpdemir MN (March 1, 2023) A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments. Turkish Journal of Science and Technology 18 1 243–259.

IEEE

[1] M. N. Alpdemir, “A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments”, TJST, vol. 18, no. 1, pp. 243–259, Mar. 2023, doi: 10.55525/tjst.1219845.

ISNAD

Alpdemir, Mahmut Nedim. “A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments”. Turkish Journal of Science and Technology 18/1 (March 1, 2023): 243-259. https://doi.org/10.55525/tjst.1219845.

JAMA

Alpdemir MN. A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments. TJST. 2023;18:243–259.

MLA

Alpdemir, Mahmut Nedim. “A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments”. Turkish Journal of Science and Technology, vol. 18, no. 1, Mar. 2023, pp. 243-59, doi:10.55525/tjst.1219845.

Vancouver

Mahmut Nedim Alpdemir. A Hierarchical Reinforcement Learning Framework for UAV Path Planning in Tactical Environments. TJST. 2023 Mar. 1;18(1):243-59. doi:10.55525/tjst.1219845

Cited By

Generating Function Reallocation to Handle Contingencies in Human–Robot Teaming Missions: The Cases in Lunar Surface Transportation

Applied Sciences

https://doi.org/10.3390/app13137506

A Review of the Application of Hierarchical Reinforcement Learning in the Field of Drones

Artificial Intelligence and Robotics Research

https://doi.org/10.12677/AIRR.2024.131008

Comparative Evaluation of Reinforcement Learning Algorithms for Multi-Agent Unmanned Aerial Vehicle Path Planning in 2D and 3D Environments

Drones

https://doi.org/10.3390/drones9060438