PyTorch implementation of An Adaptive Framework for Learning Unsupervised Depth Completion
Project AdaFrame: Ada(ptive) Frame(work) for Depth Completion
Published in RA-L January 2021 and ICRA 2021
Models have been tested on Ubuntu 16.04 and 20.04 using Python 3.5 and 3.6 and PyTorch 1.2.0
Authors: Alex Wong, Xiaohan Fei
If this work is useful to you, please cite our paper:
@article{wong2021adaptive,
title={An Adaptive Framework for Learning Unsupervised Depth Completion},
author={Wong, Alex and Fei, Xiaohan and Hong, Byung-Woo and Soatto, Stefano},
journal={IEEE Robotics and Automation Letters},
volume={6},
number={2},
pages={3120--3127},
year={2021},
publisher={IEEE}
}
In the sparse-to-dense depth completion problem, we seek to infer the dense depth map of a 3-D scene from an RGB image and its associated sparse depth measurements. These measurements take the form of a sparse depth map, obtained either from computational methods such as SfM (Structure-from-Motion) or from active sensors such as lidar or structured light sensors.
RGB image from the VOID dataset | Our densified depth map -- colored and backprojected to 3D |
---|---|
RGB image from the KITTI dataset | Our densified depth map -- colored and backprojected to 3D |
---|---|
To follow the literature and benchmarks for this task, you may visit: Awesome State of Depth Completion
A number of computer vision problems can be formulated as an energy function that consists of a linear combination of a data fidelity (fitness to data) term and a regularizer (bias or prior). The data fidelity is weighted uniformly by a scalar α and the regularizer by a scalar γ; together they determine the relative significance of the two terms.
However, a uniform static α does not account for visibility phenomena (occlusions), and a uniform static γ may impose too much or too little regularization. We propose an adaptive framework (α and γ) that consists of weighting schemes that vary spatially (over the image domain) and temporally (over training time) based on the residual, or fitness of the model to the data.
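As a rough formalization of the two paragraphs above, the uniformly weighted energy and its adaptive counterpart might be written as follows. The symbols $d$, $\Omega$, $\psi$, $\phi$, and $t$ are notation assumed here for illustration and are not taken from the paper.

$$
E_{\text{uniform}}(d) = \alpha \sum_{x \in \Omega} \psi(d, x) + \gamma \sum_{x \in \Omega} \phi(d, x)
$$

$$
E_{\text{adaptive}}(d; t) = \sum_{x \in \Omega} \Big[ \alpha_t(x)\, \psi(d, x) + \gamma_t(x)\, \phi(d, x) \Big]
$$

Here $d$ is the predicted depth map, $\Omega$ is the image domain, $\psi$ is the per-pixel data fidelity residual, $\phi$ is the per-pixel regularizer, and $t$ indexes training time. In the uniform form α and γ are constants; in the adaptive form they become per-pixel weights $\alpha_t(x)$ and $\gamma_t(x)$ that depend on the residual.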
α starts by weighting all pixel locations approximately uniformly and gradually downweights regions with high residual over time. α is conditioned on the mean (global) residual: as the model becomes better fitted to the data, we become more confident that high-residual regions are the result of occlusions, which yields a sharper curve over time. Here is a visualization of α:
γ starts by imposing a low degree of regularization and only increases regularization where appropriate, based on the fitness of the model to the data. Here is a visualization of γ:
We note that α and γ are complementary. Ill-posed regions such as occlusions cannot be uniquely determined by the data and hence need regularization. So γ increases regularization around ill-posed regions, allowing neighboring point estimates that fit the data well to fill in the gaps.
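The behavior described above can be illustrated with a minimal pure-Python sketch. It is not the weighting used in the paper: the sigmoid form of `adaptive_alpha`, the choice `gamma = gamma_max * (1 - alpha)`, and the names `steepness`, `gamma_max`, and `weighted_energy` are all assumptions made for illustration. The sketch only aims to show that a fixed residual can be trusted while the global residual is large and downweighted once the global residual shrinks, with regularization rising where the data term is distrusted.

```python
import math


def adaptive_alpha(residual, global_residual, steepness=2.0):
    """Illustrative data-fidelity weight in (0, 1) for one pixel.

    A decreasing sigmoid of the residual relative to the global (mean)
    residual. The functional form and `steepness` are assumptions.
    """
    relative = residual / max(global_residual, 1e-8)
    # Clamp the exponent so math.exp cannot overflow on extreme inputs.
    exponent = min(max(steepness * (relative - 1.0), -60.0), 60.0)
    return 1.0 / (1.0 + math.exp(exponent))


def adaptive_gamma(alpha, gamma_max=1.0):
    """Illustrative regularization weight (assumed form, not the paper's).

    Regularization grows where the data term is distrusted (low alpha).
    """
    return gamma_max * (1.0 - alpha)


def weighted_energy(data_residuals, regularizer_penalties):
    """Sum per-pixel alpha-weighted data terms and gamma-weighted penalties."""
    global_residual = sum(data_residuals) / len(data_residuals)
    energy = 0.0
    for residual, penalty in zip(data_residuals, regularizer_penalties):
        alpha = adaptive_alpha(residual, global_residual)
        gamma = adaptive_gamma(alpha)
        energy += alpha * residual + gamma * penalty
    return energy


# The same absolute residual is mostly trusted while the model fits poorly
# (large global residual) and mostly rejected once the model fits well.
early = adaptive_alpha(0.5, global_residual=1.0)
late = adaptive_alpha(0.5, global_residual=0.1)
print(early > late)
```

Because the residual is divided by the global residual, the curve of α over absolute residual steepens as the global residual decreases, which loosely mirrors the sharper curve over time described above. In a PyTorch training loop the same arithmetic would presumably be applied to per-pixel tensors instead of Python lists.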
We apologize for the delay; we want to release the PyTorch implementation in conjunction with an update to VOICED.
You may also find the following projects useful:
- ScaffNet and FusionNet: Learning Topology from Synthetic Data for Unsupervised Depth Completion. An unsupervised sparse-to-dense depth completion method that learns to map sparse geometry to dense topology from synthetic data and refines the initial estimate with real images. This work is published in the Robotics and Automation Letters (RA-L) 2021 and the International Conference on Robotics and Automation (ICRA) 2021.
- VOICED: Unsupervised Depth Completion from Visual Inertial Odometry. An unsupervised sparse-to-dense depth completion method, developed by the authors. The paper introduces Scaffolding for depth completion and a light-weight network to refine it. This work is published in the Robotics and Automation Letters (RA-L) 2020 and the International Conference on Robotics and Automation (ICRA) 2020.
- VOID: from Unsupervised Depth Completion from Visual Inertial Odometry. A dataset, developed by the authors, containing indoor and outdoor scenes with non-trivial 6 degrees of freedom. The dataset is published along with this work in the Robotics and Automation Letters (RA-L) 2020 and the International Conference on Robotics and Automation (ICRA) 2020.
- XIVO: The Visual-Inertial Odometry system developed at UCLA Vision Lab. This work is built on top of XIVO. The VOID dataset used by this work also leverages XIVO to obtain sparse points and camera poses.
- GeoSup: Geo-Supervised Visual Depth Prediction. A single image depth prediction method developed by the authors, published in the Robotics and Automation Letters (RA-L) 2019 and the International Conference on Robotics and Automation (ICRA) 2019. This work was awarded Best Paper in Robot Vision at ICRA 2019.
- AdaReg: Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction. A single image depth prediction method that introduces adaptive regularization. This work was published in the proceedings of Conference on Computer Vision and Pattern Recognition (CVPR) 2019.
We also have works in adversarial attacks on depth estimation methods:
- Stereopagnosia: Stereopagnosia: Fooling Stereo Networks with Adversarial Perturbations. Adversarial perturbations for stereo depth estimation, published in the Proceedings of AAAI Conference on Artificial Intelligence (AAAI) 2021.
- Targeted Attacks for Monodepth: Targeted Adversarial Perturbations for Monocular Depth Prediction. Targeted adversarial perturbation attacks for monocular depth estimation, published in the proceedings of Neural Information Processing Systems (NeurIPS) 2020.
This software is property of the UC Regents, and is provided free of charge for research purposes only. It comes with no warranties, expressed or implied, according to these terms and conditions. For commercial use, please contact UCLA TDG.