Research Article

A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone

Year 2025, Volume: 4 Issue: 3, 580 - 603, 20.10.2025
https://doi.org/10.62520/fujece.1652790

Abstract

The use of drones in the transportation sector, and in cargo delivery in particular, is a challenging and still-limited application area that attracts significant attention. In this study, a drone operating in a simulation environment built with Unreal Engine starts from the center of the map without any external information, not even route information, and delivers cargo fully autonomously. The drone’s mission includes overcoming obstacles, remaining unaffected by weather conditions, locating the cargo vehicle, and delivering the cargo to its intended recipient. Three algorithms were combined with RGB and depth cameras for navigation and cargo transport, yielding six combinations that were compared across a range of variables; each combination was trained for 150,000 steps and evaluated against predetermined metrics. The drone was trained with the reinforcement learning algorithms DQN, PPO, and a hybrid Joint-DQN, with an LSTM network providing memory, and each algorithm was run and evaluated separately with the RGB camera and with the depth camera in the simulation environment. In this system, the drone receives a positive reward as it moves toward the target and a negative reward when it moves away from it; if the drone crashes into an obstacle, the simulation restarts. The results showed that the algorithms first learned to avoid obstacles and then to find the correct path, and that, given sufficient training time, the drone completed its mission successfully. When the models were compared on performance, DQN-RGB was the fastest learner, while the PPO models lagged behind all others. Overall, although the proposed “Joint” layer slows the learning rate, it yields a more stable and efficient model in the long run.
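The reward scheme summarized above (a positive reward for moving toward the target, a negative reward for moving away, and an episode restart on collision) can be sketched in a few lines. The following Python fragment is a minimal illustration only, not the authors' implementation: the constant values and the Euclidean distance measure are assumptions made for the sake of the example.

```python
import math

# Minimal sketch of the reward scheme described in the abstract. All
# constants are assumed for illustration and are not taken from the paper.
STEP_REWARD = 1.0       # reward for a step that closes distance to the target
STEP_PENALTY = -1.0     # penalty for a step that increases that distance
CRASH_PENALTY = -100.0  # assumed penalty when the drone hits an obstacle

def distance(a, b):
    """Euclidean distance between two (x, y, z) positions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def compute_reward(prev_pos, new_pos, target_pos, collided):
    """Return (reward, done) for one simulation step."""
    if collided:
        # A crash ends the episode; the simulation is restarted.
        return CRASH_PENALTY, True
    if distance(new_pos, target_pos) < distance(prev_pos, target_pos):
        return STEP_REWARD, False   # moved toward the target
    return STEP_PENALTY, False      # moved away from (or parallel to) the target
```

In a full setup, a function of this kind would sit inside the step() method of a gym-style wrapper around the Unreal Engine simulator, and each of the six algorithm/camera combinations would then be trained against it for its 150,000-step budget.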

Ethical Statement

This article does not require ethics committee approval. The authors declare no conflict of interest with any person or institution.

References

  • F. Wang, X. Zhu, Z. Zhou, and Y. Tang, “Deep-reinforcement-learning-based UAV autonomous navigation and collision avoidance in unknown environments,” Chin. J. Aeronaut., vol. 37, no. 3, pp. 237–257, 2024.
  • J. Jagannath, A. Jagannath, S. Furman, and T. Gwin, “Deep learning and reinforcement learning for autonomous unmanned aerial systems: Roadmap for theory to deployment,” in Deep Learning for Unmanned Systems, A. Koubaa and A. T. Azar, Eds. Cham, Switzerland: Springer, 2021, pp. 25–82.
  • A. Haque, N.-U.-R. Chowdhury, and M. S. Hossen, “Exploring the benefits of reinforcement learning for autonomous drone navigation and control,” Int. J. Adv. Netw. Appl., vol. 15, no. 1, pp. 5808–5814, 2023.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017.
  • C. Chen, Y. Zhang, Q. Lv, S. Wei, X. Wang, and X. Sun, “RRNet: A hybrid detector for object detection in drone-captured images,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops (ICCVW), Seoul, South Korea, 2019, pp. 100–108.
  • D. Floreano and R. J. Wood, “Science, technology and the future of small autonomous drones,” Nature, vol. 521, pp. 460–466, 2015.
  • D. Abel et al., “Expressing non-Markov reward to a Markov agent,” 2022.
  • P. Ladosz, L. Weng, M. Kim, and H. Oh, “Exploration in deep reinforcement learning: A survey,” Inf. Fusion, vol. 85, pp. 1–22, 2022.
  • A. Ronca, G. P. Licks, and G. D. Giacomo, “Markov abstractions for PAC reinforcement learning in non-Markov decision processes,” arXiv:2205.01053, 2022.
  • S. N. Yasar and E. Karaköse, “Trajectory control of quadcopter in MATLAB simulation environment,” in Proc. Int. Conf. Decis. Aid Sci. Appl. (DASA), Chiangrai, Thailand, 2022, pp. 1127–1131.
  • S. A. H. Mohsan, N. Q. H. Othman, Y. Li, M. H. Alsharif, and M. A. Khan, “Unmanned aerial vehicles (UAVs): Practical aspects, applications, open challenges, security issues, and future trends,” Intell. Serv. Robot., vol. 16, pp. 109–137, 2023.
  • D. Hong, S. Lee, Y. H. Cho, D. Baek, J. Kim, and N. Chang, “Energy-efficient online path planning of multiple drones using reinforcement learning,” IEEE Trans. Veh. Technol., vol. 70, no. 10, pp. 9725–9740, 2021.
  • K. Nagami and M. Schwager, “HJB-RL: Initializing reinforcement learning with optimal control policies applied to autonomous drone racing,” in Proc. Robot.: Sci. Syst., 2021.
  • E. Cetin, C. Barrado, G. Muñoz, M. Macias, and E. Pastor, “Drone navigation and avoidance of obstacles through deep reinforcement learning,” in Proc. IEEE/AIAA Digit. Avion. Syst. Conf. (DASC), San Diego, CA, USA, 2019, pp. 1–7.
  • M. A. B. Abbass and H. S. Kang, “Drone elevation control based on Python-Unity integrated framework for reinforcement learning applications,” Drones, vol. 7, no. 4, p. 225, 2023.
  • D. Gozen and S. Ozer, “Visual object tracking in drone images with deep reinforcement learning,” in Proc. Int. Conf. Pattern Recognit. (ICPR), Milan, Italy, 2021, pp. 10082–10089.
  • D. K. Kim and T. Chen, “Deep neural network for real-time autonomous indoor navigation,” arXiv:1511.04668, 2015.
  • T. Guo, N. Jiang, B. Li, X. Zhu, Y. Wang, and W. Du, “UAV navigation in high dynamic environments: A deep reinforcement learning approach,” Chin. J. Aeronaut., vol. 34, no. 2, pp. 479–489, 2021.
  • E. Karaköse, “Coordination of multi UAV’s equipped with IoT,” in Proc. Int. Conf. Adv. Technol., Antalya, Türkiye, 2018, pp. 169–172.
  • C. Wang, J. Wang, J. Wang, and X. Zhang, “Deep-reinforcement-learning-based autonomous UAV navigation with sparse rewards,” IEEE Internet Things J., vol. 7, no. 7, pp. 6180–6190, 2020.
  • S. Y. Shin, Y. W. Kang, and Y. G. Kim, “Obstacle avoidance drone by deep reinforcement learning and its racing with human pilot,” Appl. Sci., vol. 9, no. 24, p. 5571, 2019.
  • A. K. Tiwari and S. V. Nadimpalli, “Augmented random search for quadcopter control: An alternative to reinforcement learning,” arXiv:1911.12553 [cs.LG], 2019.
  • G. Munoz, C. Barrado, E. Çetin, and E. Salami, “Deep reinforcement learning for drone delivery,” Drones, vol. 3, no. 3, p. 72, 2019.
  • Y. Chen, W. Zheng, Y. Zhao, T. H. Song, and H. Shin, “DW-YOLO: An efficient object detector for drones and self-driving vehicles,” Arab. J. Sci. Eng., vol. 48, pp. 1427–1436, 2023.
  • F. Ahmed, J. C. Mohanta, A. Keshari, and P. S. Yadav, “Recent advances in unmanned aerial vehicles: A review,” Arab. J. Sci. Eng., vol. 47, pp. 7963–7984, 2022.
  • T. Thomas, S. Srinivas, and C. Rajendran, “Collaborative truck multi-drone delivery system considering drone scheduling and en route operations,” Ann. Oper. Res., Jun. 2023.
  • Z. Bi, X. Guo, J. Wang, S. Qin, and G. Liu, “Deep reinforcement learning for truck-drone delivery problem,” Drones, vol. 7, no. 7, p. 445, 2023.
  • A. F. U. Din, I. Mir, F. Gul, M. R. A. Nasar, and L. Abualigah, “Reinforced learning-based robust control design for unmanned aerial vehicle,” Arab. J. Sci. Eng., vol. 48, pp. 1221–1236, 2023.
  • M. Yilmazer, E. Karakose, and M. Karakose, “Multi-package delivery optimization with drone,” in Proc. Int. Conf. Data Anal. Bus. Ind. (ICDABI), Sakheer, Bahrain, 2021, pp. 65–69.
  • E. Karaköse, “A new last mile delivery approach for the hybrid truck multi-drone problem using a genetic algorithm,” Appl. Sci., vol. 14, no. 2, p. 616, 2024.


Details

Primary Language English
Subjects Electrical Engineering (Other)
Journal Section Research Articles
Authors

Ebru Karaköse 0000-0003-1191-6375

Batuhan Bayraktar 0009-0001-2870-3627

Publication Date October 20, 2025
Submission Date March 6, 2025
Acceptance Date August 5, 2025
Published in Issue Year 2025 Volume: 4 Issue: 3

Cite

APA Karaköse, E., & Bayraktar, B. (2025). A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone. Firat University Journal of Experimental and Computational Engineering, 4(3), 580-603. https://doi.org/10.62520/fujece.1652790
AMA Karaköse E, Bayraktar B. A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone. FUJECE. October 2025;4(3):580-603. doi:10.62520/fujece.1652790
Chicago Karaköse, Ebru, and Batuhan Bayraktar. “A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone”. Firat University Journal of Experimental and Computational Engineering 4, no. 3 (October 2025): 580-603. https://doi.org/10.62520/fujece.1652790.
EndNote Karaköse E, Bayraktar B (October 1, 2025) A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone. Firat University Journal of Experimental and Computational Engineering 4 3 580–603.
IEEE E. Karaköse and B. Bayraktar, “A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone”, FUJECE, vol. 4, no. 3, pp. 580–603, 2025, doi: 10.62520/fujece.1652790.
ISNAD Karaköse, Ebru - Bayraktar, Batuhan. “A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone”. Firat University Journal of Experimental and Computational Engineering 4/3 (October 2025), 580-603. https://doi.org/10.62520/fujece.1652790.
JAMA Karaköse E, Bayraktar B. A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone. FUJECE. 2025;4:580–603.
MLA Karaköse, Ebru and Batuhan Bayraktar. “A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone”. Firat University Journal of Experimental and Computational Engineering, vol. 4, no. 3, 2025, pp. 580-603, doi:10.62520/fujece.1652790.
Vancouver Karaköse E, Bayraktar B. A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone. FUJECE. 2025;4(3):580-603.