Research Article

DepthP+P: Metric Accurate Monocular Depth Estimation using Planar and Parallax

Year 2025, Volume: 6 Issue: 2, 20 - 29
https://doi.org/10.55195/jscai.1748633

Abstract

Current self-supervised monocular depth estimation methods are mostly based on estimating a rigid-body transformation representing the camera motion. These methods suffer from the well-known scale ambiguity problem in their predictions. We propose DepthP+P, a method that learns to estimate outputs in metric scale by following the traditional planar parallax paradigm. We first align the two frames using a common ground plane, which removes the effect of the rotation component of the camera motion. With two neural networks, we predict the depth and the camera translation; translation is easier to predict on its own than jointly with rotation. By assuming a known camera height, we can then calculate the induced 2D image motion of a 3D point and use it to reconstruct the target image in a self-supervised monocular setup. We perform experiments on the KITTI driving dataset and show that the planar parallax approach, which only needs to predict the camera translation, can be a metrically accurate alternative to current methods that rely on estimating the full 6DoF camera motion.
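
For context, the plane + parallax decomposition the abstract relies on can be written out explicitly, following the classical formulation of Sawhney and Irani–Anandan cited below. Once the source frame is aligned to the target frame by the homography of the ground plane, the remaining image motion of a pixel no longer depends on the camera rotation. For a plane with unit normal n at distance d_\pi from the camera (the camera height), a 3D point at depth Z and height h above the plane, intrinsics K, and relative pose (R, t), the correspondence between a target pixel \tilde{p} and its source pixel \tilde{p}' is

    \tilde{p}' \simeq K \left( R + \tfrac{1}{d_\pi}\, t\, n^\top \right) K^{-1} \tilde{p} \;+\; \frac{h}{Z\, d_\pi}\, K t

The first term is the plane-induced homography, which absorbs the rotation; the second is the parallax residual, which depends only on the translation t, the depth Z, and the known metric camera height d_\pi. This is why predicting t alone suffices, and why a known camera height pins the outputs to metric scale. The Python sketch below illustrates this warp for a single pixel under the conventions above; the function and argument names are our own illustrative assumptions, not the paper's code.

    import numpy as np

    # Minimal sketch of the plane + parallax warp described in the abstract.
    # Names and sign conventions are illustrative assumptions, not the authors' code.

    def planar_parallax_warp(p, Z, K, R, t, n, d_pi):
        """Map a target pixel to the source image via the ground-plane
        homography plus a translation-only parallax term.

        p    : (2,) pixel coordinates in the target image
        Z    : depth of the pixel's 3D point (predicted by the depth network)
        K    : (3,3) camera intrinsics
        R, t : rotation (3,3) and translation (3,) from target to source camera;
               after ground-plane alignment R is effectively the identity, so
               only t (predicted by the translation network) matters
        n    : (3,) unit normal of the ground plane in the target camera frame,
               with on-plane points satisfying n @ X == d_pi
        d_pi : camera height above the plane (known; this fixes the metric scale)
        """
        K_inv = np.linalg.inv(K)                   # back-projection matrix
        p_h = np.array([p[0], p[1], 1.0])          # homogeneous pixel
        P = Z * (K_inv @ p_h)                      # back-projected 3D point
        h = d_pi - n @ P                           # height of the point above the plane
        H = R + np.outer(t, n) / d_pi              # plane-induced homography
        q = K @ (H @ (K_inv @ p_h)) + (h / (Z * d_pi)) * (K @ t)
        return q[:2] / q[2]                        # perspective division

In the self-supervised setup the abstract describes, this correspondence would be evaluated densely and used to sample the source frame, with the photometric reconstruction error against the target frame supervising both networks.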

References

  • T. Zhou, M. Brown, N. Snavely, and D. G. Lowe, “Unsupervised learning of depth and ego-motion from video,” in CVPR, pp. 1851–1858, 2017.
  • H. S. Sawhney, “3D geometry from planar parallax,” in CVPR, pp. 929–934, 1994.
  • M. Irani and P. Anandan, “Parallax geometry of pairs of points for 3D scene analysis,” in ECCV (B. Buxton and R. Cipolla, eds.), (Berlin, Heidelberg), pp. 17–30, Springer Berlin Heidelberg, 1996.
  • C. Godard, O. Mac Aodha, M. Firman, and G. J. Brostow, “Digging into self-supervised monocular depth estimation,” in ICCV, 2019.
  • M. Irani, P. Anandan, and M. Cohen, “Direct recovery of planar-parallax from multiple frames,” PAMI, vol. 24, no. 11, pp. 1528–1534, 2002.
  • R. Garg, V. K. Bg, G. Carneiro, and I. Reid, “Unsupervised CNN for single view depth estimation: Geometry to the rescue,” in ECCV, pp. 740–756, 2016.
  • C. Godard, O. Mac Aodha, and G. J. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” in CVPR, pp. 270–279, 2017.
  • M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, “Spatial transformer networks,” in NeurIPS, pp. 2017–2025, 2015.
  • H. Zhan, R. Garg, C. Saroj Weerasekera, K. Li, H. Agarwal, and I. Reid, “Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction,” in CVPR, pp. 340–349, 2018.
  • R. Mahjourian, M. Wicke, and A. Angelova, “Unsupervised learning of depth and ego-motion from monocular video using 3D geometric constraints,” in CVPR, pp. 5667–5675, 2018.
  • C. Wang, J. Miguel Buenaposada, R. Zhu, and S. Lucey, “Learning depth from monocular videos using direct methods,” in CVPR, pp. 2022–2030, 2018.
  • Z. Yin and J. Shi, “GeoNet: Unsupervised learning of dense depth, optical flow and camera pose,” in CVPR, pp. 1983–1992, 2018.
  • Y. Zou, Z. Luo, and J.-B. Huang, “DF-Net: Unsupervised joint learning of depth and flow using cross-task consistency,” in ECCV, pp. 36–53, 2018.
  • Y. Chen, C. Schmid, and C. Sminchisescu, “Self-supervised learning with geometric constraints in monocular video: Connecting flow, depth, and camera,” in ICCV, pp. 7063–7072, 2019.
  • C. Luo, Z. Yang, P. Wang, Y. Wang, W. Xu, R. Nevatia, and A. Yuille, “Every pixel counts++: Joint learning of geometry and motion with 3D holistic understanding,” PAMI, vol. 42, no. 10, pp. 2624–2641, 2019.
  • A. Ranjan, V. Jampani, L. Balles, K. Kim, D. Sun, J. Wulff, and M. J. Black, “Competitive collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation,” in CVPR, pp. 12240–12249, 2019.
  • S. Safadoust and F. Güney, “Self-supervised monocular scene decomposition and depth estimation,” in 3DV, pp. 627–636, 2021.
  • V. Guizilini, R. Ambrus, S. Pillai, A. Raventos, and A. Gaidon, “3D packing for self-supervised monocular depth estimation,” in CVPR, 2020.
  • J. Bian, Z. Li, N. Wang, H. Zhan, C. Shen, M.-M. Cheng, and I. Reid, “Unsupervised scale-consistent depth and ego-motion learning from monocular video,” in NeurIPS, pp. 35–45, 2019.
  • T. Roussel, L. V. Eycken, and T. Tuytelaars, “Monocular depth estimation in new environments with absolute scale,” in IROS, pp. 1735–1741, 2019.
  • F. Bartoccioni, E. Zablocki, P. Perez, M. Cord, and K. Alahari, “LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR,” arXiv:2109.03569, 2021.
  • F. Xue, G. Zhuo, Z. Huang, W. Fu, Z. Wu, and M. H. Ang, “Toward hierarchical self-supervised monocular absolute depth estimation for autonomous driving applications,” in IROS, pp. 2330–2337, IEEE, 2020.
  • B. Wagstaff and J. Kelly, “Self-supervised scale recovery for monocular depth and egomotion estimation,” in IROS, pp. 2620–2627, IEEE, 2021.
  • M. Irani, P. Anandan, and D. Weinshall, “From reference frames to reference planes: Multi-view parallax geometry and applications,” in ECCV (H. Burkhardt and B. Neumann, eds.), (Berlin, Heidelberg), pp. 829–845, Springer Berlin Heidelberg, 1998.
  • J. Wulff, L. Sevilla-Lara, and M. J. Black, “Optical flow in mostly rigid scenes,” in CVPR, pp. 4671–4680, 2017.
  • K. Chaney, A. Z. Zhu, and K. Daniilidis, “Learning event-based height from plane and parallax,” in IROS, pp. 3690–3696, 2019.
  • Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” TIP, vol. 13, no. 4, pp. 600–612, 2004.
  • O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in MICCAI, pp. 234–241, 2015.
  • K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, pp. 770–778, 2016.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., “ImageNet large scale visual recognition challenge,” IJCV, vol. 115, no. 3, pp. 211–252, 2015.
  • D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” in NeurIPS, pp. 2366–2374, 2014.
  • A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” IJRR, 2013.
  • A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in CVPR, 2012.
  • J. Uhrig, N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger, “Sparsity invariant CNNs,” in 3DV, 2017.
  • Z. Teed and J. Deng, “RAFT: Recurrent all-pairs field transforms for optical flow,” in ECCV, pp. 402–419, Springer, 2020.
  • Y. Zhu, K. Sapra, F. A. Reda, K. J. Shih, S. Newsam, A. Tao, and B. Catanzaro, “Improving semantic segmentation via video propagation and label relaxation,” in CVPR, 2019.
  • Z. Yang, P. Wang, W. Xu, L. Zhao, and R. Nevatia, “Unsupervised learning of geometry from videos with edge-aware depth-normal consistency,” in AAAI, 2018.
  • Z. Yang, P. Wang, Y. Wang, W. Xu, and R. Nevatia, “LEGO: Learning edge with geometry all at once by watching videos,” in CVPR, pp. 225–234, 2018.
  • R. Li, S. Wang, Z. Long, and D. Gu, “UnDeepVO: Monocular visual odometry through unsupervised deep learning,” in ICRA, pp. 7286–7291, IEEE, 2018.
There are 39 citations in total.

Details

Primary Language English
Subjects Computer Vision and Multimedia Computation (Other), Deep Learning, Machine Vision
Journal Section Research Article
Authors

Sadra Safadoust (ORCID: 0000-0003-2018-0451)

Fatma Guney (ORCID: 0000-0002-0358-983X)

Publication Date December 15, 2025
Submission Date July 23, 2025
Acceptance Date September 11, 2025
Published in Issue Year 2025 Volume: 6 Issue: 2

Cite

APA Safadoust, S., & Guney, F. (2025). DepthP+P: Metric Accurate Monocular Depth Estimation using Planar and Parallax. Journal of Soft Computing and Artificial Intelligence, 6(2), 20-29. https://doi.org/10.55195/jscai.1748633
AMA Safadoust S, Guney F. DepthP+P: Metric Accurate Monocular Depth Estimation using Planar and Parallax. JSCAI. 2025;6(2):20-29. doi:10.55195/jscai.1748633
Chicago Safadoust, Sadra, and Fatma Guney. “DepthP+P: Metric Accurate Monocular Depth Estimation Using Planar and Parallax”. Journal of Soft Computing and Artificial Intelligence 6, no. 2 (2025): 20-29. https://doi.org/10.55195/jscai.1748633.
EndNote Safadoust S, Guney F (2025) DepthP+P: Metric Accurate Monocular Depth Estimation using Planar and Parallax. Journal of Soft Computing and Artificial Intelligence 6(2): 20–29.
IEEE S. Safadoust and F. Guney, “DepthP+P: Metric Accurate Monocular Depth Estimation using Planar and Parallax”, JSCAI, vol. 6, no. 2, pp. 20–29, 2025, doi: 10.55195/jscai.1748633.
ISNAD Safadoust, Sadra - Guney, Fatma. “DepthP+P: Metric Accurate Monocular Depth Estimation Using Planar and Parallax”. Journal of Soft Computing and Artificial Intelligence 6/2 (2025), 20-29. https://doi.org/10.55195/jscai.1748633.
JAMA Safadoust S, Guney F. DepthP+P: Metric Accurate Monocular Depth Estimation using Planar and Parallax. JSCAI. 2025;6(2):20–29.
MLA Safadoust, Sadra and Fatma Guney. “DepthP+P: Metric Accurate Monocular Depth Estimation Using Planar and Parallax”. Journal of Soft Computing and Artificial Intelligence, vol. 6, no. 2, 2025, pp. 20-29, doi:10.55195/jscai.1748633.
Vancouver Safadoust S, Guney F. DepthP+P: Metric Accurate Monocular Depth Estimation using Planar and Parallax. JSCAI. 2025;6(2):20-9.



© 2025 Journal of Soft Computing and Artificial Intelligence

ISSN: 2717-8226 | Published Biannually (June & December)

Licensed under CC BY-NC 4.0