Integrating Proximal Policy Optimization with Physically Realistic Simulation for Robust Autonomous Underwater Vehicle Control

Mahmut Mol; Ahmet Karaarslan

doi:10.54287/gujsa.1813751

Abstract

This study presents the design and implementation of a reinforcement learning (RL)-based framework for controlling an autonomous underwater vehicle (AUV) directly within Unreal Engine (UE). A high-fidelity aquatic environment was built with UE’s native Water System to simulate hydrodynamic forces and buoyancy. Unlike studies that assume continuous control, this research addresses the challenge of stabilizing an AUV subject to severe discrete 'bang-bang' hardware constraints. A parallelized Proximal Policy Optimization (PPO) algorithm was employed to synthesize adaptive control policies. Comparative analysis against tuned Proportional-Integral-Derivative (PID) baselines shows that the RL agent outperforms the classical methods in three control tasks: (1) in longitudinal navigation, the agent learned an emergent "pulsing" strategy, mimicking Pulse-Width Modulation (PWM), to overcome the discrete actuation constraints, reducing steady-state error relative to the Proportional-Derivative (PD) baseline; (2) in vertical depth control, the agent autonomously learned gravity compensation, settling faster than integral-based controllers while avoiding buoyancy-induced stalling; and (3) in heading control, the agent handled the vehicle dynamics better, completing stabilization maneuvers faster than the baseline. The key architectural innovation is the direct integration of UE’s Learning Agents plugin, which eliminates the need for external middleware. This native integration enables real-time synchronization between the simulation physics and the learning process, establishing a high-fidelity platform for developing intelligent underwater control systems.
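
The "pulsing" strategy mentioned in the abstract can be illustrated with a minimal, hand-written sketch. This is not the learned policy and not the authors' implementation: it assumes a thruster that accepts only the three states -1, 0 and +1, and uses a first-order sigma-delta modulator (a technique swapped in here for illustration) to show how a train of discrete pulses can deliver, on average, a fractional thrust command. The function name `pulse_modulate` and the normalized command range [-1, 1] are hypothetical.

```python
def pulse_modulate(commands):
    """Map continuous commands in [-1, 1] to discrete thrust in {-1, 0, +1}.

    First-order sigma-delta scheme (illustrative assumption): the residual
    carries thrust that was requested but not yet delivered, so the running
    average of the pulses tracks the continuous command.
    """
    residual = 0.0
    pulses = []
    for command in commands:
        residual += command
        if residual >= 0.5:
            pulse = 1       # fire forward thruster for this step
        elif residual <= -0.5:
            pulse = -1      # fire reverse thruster for this step
        else:
            pulse = 0       # coast
        residual -= pulse   # account for the thrust actually delivered
        pulses.append(pulse)
    return pulses


if __name__ == "__main__":
    # Request 30% forward thrust for 100 control steps.
    pulses = pulse_modulate([0.3] * 100)
    print("distinct actuator states:", sorted(set(pulses)))
    print("mean delivered thrust:", sum(pulses) / len(pulses))
```

Because the residual stays bounded, the mean of the pulse train converges to the requested command as the number of steps grows, which is the same averaging principle that PWM relies on. In the study itself the pulsing behaviour is reported as emerging from PPO training rather than from an explicit modulator.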

Details

Primary Language

English

Subjects

Autonomous Agents and Multiagent Systems

Journal Section

Research Article

Authors

Mahmut Mol*
0000-0001-9983-861X
Türkiye

Ahmet Karaarslan
0000-0001-6475-4539
Türkiye

Publication Date

March 31, 2026

Submission Date

October 30, 2025

Acceptance Date

March 24, 2026

Published in Issue

Year: 2026, Volume: 13, Number: 1

DOI

https://doi.org/10.54287/gujsa.1813751

IZ

https://izlik.org/JA58JP52YD

Cite

APA

Mol, M., & Karaarslan, A. (2026). Integrating Proximal Policy Optimization with Physically Realistic Simulation for Robust Autonomous Underwater Vehicle Control. Gazi University Journal of Science Part A: Engineering and Innovation, 13(1), 165-199. https://doi.org/10.54287/gujsa.1813751
