TY - JOUR
T1 - Indoor Visual Navigation Based on Deep Reinforcement Learning with Neural Ordinary Differential Equations
TT - Nöral Adi Türevsel Denklemler ile Derin Pekiştirmeli Öğrenmeye Dayalı İç Mekan Görsel Navigasyonu
AU - Kalaycı Demir, Güleser
AU - Ağın, Berk
PY - 2025
DA - May
Y2 - 2024
DO - 10.21205/deufmd.2025278008
JF - Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi
JO - DEUFMD
PB - Dokuz Eylül Üniversitesi
WT - DergiPark
SN - 1302-9304
SP - 224
EP - 233
VL - 27
IS - 80
LA - en
AB - Recently, Deep Reinforcement Learning (DRL) has gained attention as a promising approach to the challenging problem of mobile robot navigation. This study proposes a reinforcement learning method utilizing Neural Ordinary Differential Equations (NODEs), which offer advantages in training efficiency and memory usage, and applies it to a model-free, point-to-point navigation task. Through the use of NODEs, we achieved improvements in navigation performance as well as enhancements in resource optimization and adaptation. Extensive simulation studies were conducted using real-world indoor scenes to validate our approach. The results demonstrate the effectiveness of our proposed NODE-based methodology in enhancing navigation performance compared to traditional ResNet and CNN architectures. Furthermore, curriculum learning strategies were integrated into our study to enable the agent to learn through progressively more complex navigation scenarios. The results obtained indicate that this approach facilitates faster and more robust reinforcement learning.
KW - Neural ordinary differential equations
KW - ResNet
KW - deep reinforcement learning
KW - visual navigation
KW - mobile robot
N2 - Son yıllarda, Derin Pekiştirmeli Öğrenme (DPÖ), mobil robot navigasyonunun zorlu sorunlarını çözmek için umut vadeden bir yaklaşım olarak ortaya çıkmıştır. Bu çalışma, etkili eğitim ve bellek avantajı sunan Nöral Adi Türevsel Denklemler (NATD) kullanan pekiştirmeli öğrenme yöntemini önermekte ve modelden bağımsız, noktadan-noktaya navigasyon için uygulamaktadır. NATD kullanımıyla, navigasyon performansında artış ve kaynak optimizasyonu ile adaptasyonda iyileştirme sağlanmıştır. Yaklaşımımızı doğrulamak için, gerçek dünya iç mekan sahneleri kullanılarak kapsamlı simülasyon çalışmaları yapılmıştır. Sonuçlar, önerdiğimiz NATD tabanlı metodolojinin, geleneksel ResNet ve CNN mimarilerine göre navigasyon performansını artırmada etkili olduğunu göstermiştir. Ayrıca, müfredat öğrenme stratejileri çalışmamıza entegre edilmiş ve ajanın aşamalı olarak daha karmaşık navigasyon senaryoları üzerinden öğrenmesi sağlanmıştır. Elde edilen sonuçlar, bu yaklaşım ile daha hızlı ve daha gürbüz pekiştirmeli öğrenmenin gerçeklenebildiğini göstermektedir.
CR - [1] Ferreira, B., Reis, J. 2023. A Systematic Literature Review on the Application of Automation in Logistics, Logistics, Vol. 7, no. 4, p. 80. doi: 10.3390/logistics7040080.
CR - [2] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M. 2013. Playing Atari with Deep Reinforcement Learning. arXiv, http://arxiv.org/abs/1312.5602 (Accessed: Jun. 12, 2024).
CR - [3] Fox, I., Lee, J., Pop-Busui, R., Wiens, J. 2020. Deep Reinforcement Learning for Closed-Loop Blood Glucose Control. arXiv, http://arxiv.org/abs/2009.09051 (Accessed: Jun. 12, 2024).
CR - [4] Yang, S. 2023. Deep Reinforcement Learning for Portfolio Management, Knowledge-Based Systems, Vol. 278. doi: 10.1016/j.knosys.2023.110905.
CR - [5] Liu, X.-Y., Yang, H., Chen, Q., Zhang, R., Yang, L., Xiao, B., Wang, C. 2022. FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance. arXiv, http://arxiv.org/abs/2011.09607 (Accessed: Jun. 12, 2024).
CR - [6] Shi, H., Shi, L., Xu, M., Hwang, K.-S. 2020. End-to-End Navigation Strategy With Deep Reinforcement Learning for Mobile Robots, IEEE Trans. Ind. Inform., Vol. 16, no. 4, pp. 2393-2402. doi: 10.1109/TII.2019.2936167.
CR - [7] Anderson, P., Chang, A., Chaplot, D.S., Dosovitskiy, A., et al. 2018. On Evaluation of Embodied Navigation Agents. arXiv, https://arxiv.org/abs/1807.06757 (Accessed: Sep. 23, 2023).
CR - [8] López, M.E., Bergasa, L.M., Escudero, M.S. 2003. Visually augmented POMDP for indoor robot navigation, in Applied Informatics, pp. 183-187.
CR - [9] Gaya, J., Gonçalves, L., Duarte, A., Zanchetta, B., Drews-Jr, P., Botelho, S. 2016. Vision-based Obstacle Avoidance Using Deep Learning. doi: 10.1109/LARS-SBR.2016.9.
CR - [10] Thrun, S. 2008. Simultaneous localization and mapping. In Robotics and Cognitive Approaches to Spatial Mapping, pp. 13-41, Springer Berlin Heidelberg.
CR - [11] Balakrishnan, K., Chakravarty, P., Shrivastava, S. 2021. An A* Curriculum Approach to Reinforcement Learning for RGBD Indoor Robot Navigation. arXiv, http://arxiv.org/abs/2101.01774 (Accessed: Aug. 07, 2023).
CR - [12] Stentz, A. 1994. Optimal and efficient path planning for partially-known environments, IEEE International Conference on Robotics and Automation, pp. 3310-3317, Vol. 4. doi: 10.1109/ROBOT.1994.351061.
CR - [13] LaValle, S.M. 1998. Rapidly-exploring random trees: A new tool for path planning, Technical Report 98-11, Comput. Sci. Dept., Iowa State Univ.
CR - [14] Gasparetto, A., Boscariol, P., Lanzutti, A., Vidoni, R. 2012. Trajectory planning in robotics, Math. Comput. Sci., Vol. 6, no. 3, pp. 269-279.
CR - [15] Bonin-Font, F., Ortiz, A., Oliver, G. 2008. Visual navigation for mobile robots: A survey, J. Intell. Robot. Syst., Vol. 53, no. 3, pp. 263-296.
CR - [16] Pan, M., Liu, Y., Cao, J., Li, Y., Li, C., Chen, C.-H. 2020. Visual Recognition Based on Deep Learning for Navigation Mark Classification, IEEE Access, Vol. 8, pp. 32767-32775. doi: 10.1109/ACCESS.2020.2973856.
CR - [17] Ayyalasomayajula, R., Arun, A., Wu, C., Sharma, S., Sethi, A.R., Vasisht, D., Bharadia, D. 2020. Deep Learning Based Wireless Localization for Indoor Navigation, 26th Annual International Conference on Mobile Computing and Networking. doi: 10.1145/3372224.3380894.
CR - [18] Ocaña, M., Bergasa, L.M., Sotelo, M., Flores, R. 2005. Indoor robot navigation using a POMDP based on WiFi and ultrasound observations, IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2592-2597.
CR - [19] Parker-Holder, J., Rajan, R., Song, X., Biedenkapp, A., Miao, Y., Eimer, T., Zhang, B., Nguyen, B., et al. 2022. Automated Reinforcement Learning (AutoRL): A Survey and Open Problems, J. Artif. Intell. Res., Vol. 74, pp. 517-568. doi: 10.1613/jair.1.13596.
CR - [20] LeCun, Y., Bengio, Y., Hinton, G. 2015. Deep Learning, Nature, Vol. 521, pp. 436-444. doi: 10.1038/nature14539.
CR - [21] He, K., Zhang, X., Ren, S., Sun, J. 2016. Deep Residual Learning for Image Recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778. doi: 10.1109/CVPR.2016.90.
CR - [22] Krizhevsky, A., Sutskever, I., Hinton, G.E. 2012. ImageNet Classification with Deep Convolutional Neural Networks, in Advances in Neural Information Processing Systems, pp. 1097-1105.
CR - [23] Szegedy, C., et al. 2015. Going deeper with convolutions, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9. doi: 10.1109/CVPR.2015.7298594.
CR - [24] Chen, R.T.Q., Rubanova, Y., Bettencourt, J., Duvenaud, D.K. 2018. Neural Ordinary Differential Equations. Advances in Neural Information Processing Systems, 31.
CR - [25] Ainsworth, S., Lowrey, K., Thickstun, J., Harchaoui, Z., Srinivasa, S. 2021. Faster Policy Learning with Continuous-Time Gradients. In Learning for Dynamics and Control, pp. 1054-1067, PMLR.
CR - [26] Yıldız, Ç., Heinonen, M., Lähdesmäki, H. 2021. Continuous-time model-based reinforcement learning. In International Conference on Machine Learning, pp. 12009-12018, PMLR.
CR - [27] Meleshkova, Z., Ivanov, S.E., Ivanova, L. 2021. Application of Neural ODE with embedded hybrid method for robotic manipulator control, Procedia Comput. Sci., Vol. 193, pp. 314-324. doi: 10.1016/j.procs.2021.10.032.
CR - [28] Du, J., Futoma, J., Doshi-Velez, F. 2020. Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs. Advances in Neural Information Processing Systems, 33, pp. 19805-19816.
CR - [29] Zhao, L., Miao, K., Gatsis, K., Papachristodoulou, A. 2024. NLBAC: A Neural Ordinary Differential Equations-based Framework for Stable and Safe Reinforcement Learning. arXiv, http://arxiv.org/abs/2401.13148 (Accessed: Apr. 29, 2024).
CR - [30] Haarnoja, T., Pong, V., Zhou, A., Dalal, M., Abbeel, P., Levine, S. 2018. Composable Deep Reinforcement Learning for Robotic Manipulation, IEEE International Conference on Robotics and Automation (ICRA), pp. 6244-6251. doi: 10.1109/ICRA.2018.8460756.
CR - [31] Soviany, P., Ionescu, R.T., Rota, P., Sebe, N. 2022. Curriculum Learning: A Survey. International Journal of Computer Vision, Vol. 130, no. 6, pp. 1526-1565. doi: 10.1007/s11263-022-01611-x.
CR - [32] Zhao, R., Sun, X., Tresp, V. 2019. Maximum Entropy-Regularized Multi-Goal Reinforcement Learning, International Conference on Machine Learning, in Proceedings of Machine Learning Research, Vol. 97, PMLR, pp. 7553-7562.
CR - [33] Ziebart, B.D. 2010. Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University.
CR - [34] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O. 2017. Proximal Policy Optimization Algorithms. arXiv, http://arxiv.org/abs/1707.06347.
CR - [35] Xia, F., Zamir, A.R., He, Z., Sax, A., Malik, J., Savarese, S. 2018. Gibson Env: Real-World Perception for Embodied Agents, IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9068-9079. doi: 10.1109/CVPR.2018.00945.
CR - [36] Coumans, E., Bai, Y. 2016. PyBullet, a Python module for physics simulation for games, robotics and machine learning.
UR - https://doi.org/10.21205/deufmd.2025278008
L1 - https://dergipark.org.tr/tr/download/article-file/4034714
ER -