Direct pose estimation from RGB images using 3D objects

Muhammed Ali Dede; Yakup Genç

Research Article

Turkish title: 3 Boyutlu nesneleri kullanarak imgelerden poz kestirimi (Pose estimation from images using 3D objects)

Year 2022, Volume: 28, Issue: 2, 277-285, 30.04.2022

Abstract

We present an algorithm that estimates the orientation and position of an object from a single camera, intended for augmented reality applications. The model, a small convolutional network, recovers the 6-degree-of-freedom position and orientation from a single RGB (red-green-blue) image. It is well suited to mobile devices that lack high compute performance and large memory. The algorithm processes an image in 1 ms and performs close to state-of-the-art algorithms. The model's performance stems from the proposed geometric loss function and the algebraic constraints it uses. A synthetic dataset is also proposed for measuring the performance of models of this kind.
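The abstract does not spell out the geometric loss or the algebraic constraints, so the following is only an illustrative sketch of one common formulation, not the authors' method. It assumes orientation is represented as a quaternion `(w, x, y, z)`, that the algebraic constraint is unit length of that quaternion, and that the loss is the mean distance between 3D model points transformed by the predicted and the ground-truth pose. All function names (`normalize_quaternion`, `rotate`, `geometric_loss`) are hypothetical.

```python
import math


def normalize_quaternion(q):
    """Scale q = (w, x, y, z) to unit length (the constraint assumed in this sketch)."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("a zero quaternion does not define an orientation")
    return tuple(c / norm for c in q)


def rotate(q, p):
    """Rotate the 3D point p by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    px, py, pz = p
    # Rows of the rotation matrix equivalent to the unit quaternion.
    row0 = (1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y))
    row1 = (2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x))
    row2 = (2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y))
    return tuple(r[0] * px + r[1] * py + r[2] * pz for r in (row0, row1, row2))


def transform(q, t, p):
    """Apply the rigid pose (rotation q, translation t) to the point p."""
    rx, ry, rz = rotate(q, p)
    return (rx + t[0], ry + t[1], rz + t[2])


def geometric_loss(q_pred, t_pred, q_true, t_true, model_points):
    """Mean Euclidean distance between model points under the two poses."""
    q_pred = normalize_quaternion(q_pred)
    q_true = normalize_quaternion(q_true)
    total = 0.0
    for p in model_points:
        total += math.dist(transform(q_pred, t_pred, p),
                           transform(q_true, t_true, p))
    return total / len(model_points)


if __name__ == "__main__":
    # Hypothetical model points: three unit vectors along the object axes.
    points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    identity = (1.0, 0.0, 0.0, 0.0)
    # A prediction that is off by one unit along z and has an unnormalized quaternion.
    print(geometric_loss((2.0, 0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                         identity, (0.0, 0.0, 0.0), points))
```

A loss of this shape measures rotation and translation error in the same units (distance in object space), which avoids hand-tuning a weight between the two terms. Whether the paper takes this route is not stated in the abstract.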

Keywords

Augmented reality, Pose estimation, Deep learning
