Monocular Depth Estimation and Detection of Near Objects

Ali Tezcan Sarızeybek; Ali Hakan Isık

doi:10.55974/utbd.1177526

Research Article

Year 2022, Volume 14, Issue 3, pp. 124-131, 31.12.2022

Abstract

A camera produces a 2D image, so the image alone does not tell us how far away an object is. To detect only the objects within a certain distance, the 2D image must be supplemented with depth information. Depth estimation estimates the distance to objects; in effect, it recovers a 3D perception from a 2D image. Several methods exist for this, and the one applied in this experiment estimates depth with a single camera. After the depth map is obtained, the image is filtered to keep only nearby objects, the distant regions are masked out, and the resulting image is passed to an object detection model. The aim is to let low-budget projects detect the obstacles in front of a robot with a single camera instead of a dual-camera or LIDAR setup. Running the two models on the embedded device yielded 8 FPS, and the inference test on the new image, which contains only near objects after depth estimation, gave a loss value of 0.342.
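The near-object filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `keep_near_pixels`, the `max_depth` cut-off, and the convention that larger depth values mean farther away are all assumptions made for illustration. Some monocular models output inverse or relative depth, in which case the comparison would need to be flipped.

```python
import numpy as np


def keep_near_pixels(image, depth, max_depth):
    """Black out every pixel whose estimated depth exceeds max_depth.

    image: H x W x 3 uint8 array (camera frame).
    depth: H x W float array; ASSUMPTION: larger values mean farther away.
    max_depth: hypothetical cut-off, in the same units as `depth`.
    """
    if image.shape[:2] != depth.shape:
        raise ValueError("image and depth map must share height and width")
    near = depth <= max_depth          # boolean H x W mask of near pixels
    filtered = np.zeros_like(image)    # start from an all-black frame
    filtered[near] = image[near]       # copy over only the near pixels
    return filtered, near


# Tiny synthetic example: a 2 x 2 frame with one near and three far pixels.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
depth_map = np.array([[0.5, 3.0], [4.0, 2.5]])
near_only, near_mask = keep_near_pixels(frame, depth_map, max_depth=1.0)
```

In a pipeline like the one in the abstract, `near_only` would then be the frame handed to the object detection model, so that detections can only come from regions within the chosen distance.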

Keywords

Monocular Depth Estimation, Object Detection, Obstacle Avoidance, Image Processing

References

[1] Kusupati, U., Cheng, S., Chen, R., & Su, H. (2020). Normal assisted stereo depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2189-2199).
[2] Hess, J., Beinhofer, M., & Burgard, W. (2014, May). A probabilistic approach to high-confidence cleaning guarantees for low-cost cleaning robots. In 2014 IEEE international conference on robotics and automation (ICRA) (pp. 5600-5605). IEEE.
[3] Wang, Y., Lai, Z., Huang, G., Wang, B. H., Van Der Maaten, L., Campbell, M., & Weinberger, K. Q. (2019, May). Anytime stereo image depth estimation on mobile devices. In 2019 international conference on robotics and automation (ICRA) (pp. 5893-5900). IEEE.
[4] Dutta, S., Das, S. D., Shah, N. A., & Tiwari, A. K. (2021). Stacked Deep Multi-Scale Hierarchical Network for Fast Bokeh Effect Rendering from a Single Image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2398-2407).
[5] Ignatov, A., Malivenko, G., Plowman, D., Shukla, S., & Timofte, R. (2021). Fast and accurate single-image depth estimation on mobile devices, mobile ai 2021 challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2545-2557).
[6] Collis, R. T. H. (1969). Lidar. In Advances in Geophysics (Vol. 13, pp. 113-139). Elsevier.
[7] Hecht, J. (2018). Lidar for self-driving cars. Optics and Photonics News, 29(1), 26-33.
[8] Wróżyński, R., Pyszny, K., & Sojka, M. (2020). Quantitative landscape assessment using LiDAR and rendered 360 panoramic images. Remote Sensing, 12(3), 386.
[9] Ullrich, A., & Pfennigbauer, M. (2016, May). Linear LIDAR versus Geiger-mode LIDAR: impact on data properties and data quality. In Laser Radar Technology and Applications XXI (Vol. 9832, pp. 29-45). SPIE.
[10] Long, X., Liu, L., Li, W., Theobalt, C., & Wang, W. (2021). Multi-view depth estimation using epipolar spatio-temporal networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8258-8267).
[11] Kusupati, U., Cheng, S., Chen, R., & Su, H. (2020). Normal assisted stereo depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2189-2199).
[12] Ding, X., Xu, L., Wang, H., Wang, X., & Lv, G. (2011). Stereo depth estimation under different camera calibration and alignment errors. Applied Optics, 50(10), 1289-1301.
[13] Wang, Y., Lai, Z., Huang, G., Wang, B. H., Van Der Maaten, L., Campbell, M., & Weinberger, K. Q. (2019, May). Anytime stereo image depth estimation on mobile devices. In 2019 international conference on robotics and automation (ICRA) (pp. 5893-5900). IEEE.
[14] Fahmy, A. A., Ismail, O., & Al-Janabi, A. K. (2013). Stereo vision based depth estimation algorithm in uncalibrated rectification. Int J Video Image Process Netw Secur, 13(2), 1-8.
[15] Zhao, C., Sun, Q., Zhang, C., Tang, Y., & Qian, F. (2020). Monocular depth estimation based on deep learning: An overview. Science China Technological Sciences, 63(9), 1612-1627.
[16] Yuan, W., Gu, X., Dai, Z., Zhu, S., & Tan, P. (2022). NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation. arXiv preprint arXiv:2203.01502.
[17] Xue, F., Zhuo, G., Huang, Z., Fu, W., Wu, Z., & Ang, M. H. (2020). Toward hierarchical self-supervised monocular absolute depth estimation for autonomous driving applications. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 2330-2337). IEEE.
[18] Huynh, L., Nguyen-Ha, P., Matas, J., Rahtu, E., & Heikkilä, J. (2020, August). Guiding monocular depth estimation using depth-attention volume. In European Conference on Computer Vision (pp. 581-597). Springer, Cham.
[19] Ramamonjisoa, M., & Lepetit, V. (2019). Sharpnet: Fast and accurate recovery of occluding contours in monocular depth estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (pp. 0-0).
[20] Lee, J. H., & Kim, C. S. (2020, August). Multi-loss rebalancing algorithm for monocular depth estimation. In European Conference on Computer Vision (pp. 785-801). Springer, Cham.
[21] Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., & Koltun, V. (2020). Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE transactions on pattern analysis and machine intelligence.
[22] Miangoleh, S. M. H., Dille, S., Mai, L., Paris, S., & Aksoy, Y. (2021). Boosting monocular depth estimation models to high-resolution via content-adaptive multi-resolution merging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9685-9694).
[23] Li, S., Luo, Y., Zhu, Y., Zhao, X., Li, Y., & Shan, Y. (2021). Enforcing Temporal Consistency in Video Depth Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1145-1154).
[24] Jung, D., Choi, J., Lee, Y., Kim, D., Kim, C., Manocha, D., & Lee, D. (2021). DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 12797-12807).
[25] Kopf, J., Rong, X., & Huang, J. B. (2021). Robust consistent video depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1611-1621).
[26] Chang, J., & Wetzstein, G. (2019). Deep optics for monocular depth estimation and 3d object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 10193-10202).
[27] Gkikas, A., Proestakis, E., Amiridis, V., Kazadzis, S., Di Tomaso, E., Marinou, E., ... & García-Pando, C. P. (2022). Quantification of the dust optical depth across spatiotemporal scales with the MIDAS global dataset (2003–2017). Atmospheric Chemistry and Physics, 22(5), 3553-3578.
[28] Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., & Koltun, V. (2020). Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE transactions on pattern analysis and machine intelligence.
[29] Zhiqiang, W., & Jun, L. (2017, July). A review of object detection based on convolutional neural network. In 2017 36th Chinese control conference (CCC) (pp. 11104-11109). IEEE.
[30] Zhou, X., Gong, W., Fu, W., & Du, F. (2017, May). Application of deep learning in object detection. In 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS) (pp. 631-634). IEEE.
[31] Kemsaram, N., Das, A., & Dubbelman, G. (2019, July). An integrated framework for autonomous driving: object detection, lane detection, and free space detection. In 2019 Third World Conference on Smart Trends in Systems Security and Sustainability (WorldS4) (pp. 260-265). IEEE.
[32] Black, A. W., & Lenzo, K. A. (2001). Flite: a small fast run-time synthesis engine. In 4th ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis.

There are 32 references in total.

Details

Primary Language	English
Section	Research Article
Authors	Ali Tezcan Sarızeybek (ORCID 0000-0001-8949-8332), Ali Hakan Isık (ORCID 0000-0003-3561-9375)
Publication Date	December 31, 2022
Published in Issue	Year 2022

Cite

IEEE	A. T. Sarızeybek and A. H. Isık, “Monocular Depth Estimation and Detection of Near Objects”, UTBD, vol. 14, no. 3, pp. 124–131, 2022, doi: 10.55974/utbd.1177526.

The journal's Turkish abbreviation is "UTBD" and its English abbreviation is "IJTS".
