ER-Depth: Enhancing the Robustness of Self-Supervised Monocular Depth Estimation in Challenging Scenes

University of Science and Technology of China
*Indicates Equal Contribution

The first-stage training framework of ER-Depth. In the first stage, we construct an image triplet consisting of one standard sample and two augmented challenging samples. For the standard sample, the photometric loss provides the supervision signal. A perturbation-invariant depth consistency loss then constrains the depth predictions under different perturbations to agree, propagating reliable supervision to the challenging samples.
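The consistency constraint above can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact loss: the function name, the relative-difference form, and the epsilon are assumptions chosen for clarity.

```python
import numpy as np

def depth_consistency_loss(d_std, d_aug, eps=1e-7):
    # Mean absolute relative difference between the depth predicted for a
    # perturbed (challenging) view and the depth predicted for the standard
    # view of the same image; zero when the two predictions agree everywhere.
    return float(np.mean(np.abs(d_aug - d_std) / (d_std + eps)))

d_std = np.full((4, 4), 10.0)                 # depth map for the standard sample
d_aug = d_std * 1.02                          # slightly perturbed prediction
loss = depth_consistency_loss(d_std, d_aug)   # small positive value
```

Minimizing such a term pushes the network to predict the same depth regardless of the perturbation, so the photometric supervision available on the clean view effectively transfers to the augmented views.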


The second-stage training framework of ER-Depth. In the second stage, we leverage the Mean Teacher paradigm to generate pseudo-labels for self-distillation. To improve the overall quality of the pseudo-labels, we propose a depth-consistency-based filter (DC-Filter), which selects pseudo-labels that are robust to various perturbations, and a geometric-consistency-based filter (GC-Filter), which selects pseudo-labels that are sufficiently accurate and reliable.
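The two filters can be illustrated with per-pixel masks. This NumPy sketch is an assumption-laden simplification: the function names, the relative-agreement test in `dc_filter`, the thresholds, and the use of a photometric reprojection error as the geometric criterion are all illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def dc_filter(d_a, d_b, tau=0.05, eps=1e-7):
    # Keep pixels whose pseudo-depths under two different perturbations
    # agree within a relative threshold tau (depth-consistency filter).
    rel_diff = np.abs(d_a - d_b) / (np.maximum(d_a, d_b) + eps)
    return rel_diff < tau

def gc_filter(reproj_error, tau=0.2):
    # Keep pixels whose photometric reprojection error is below tau,
    # i.e. pixels that are geometrically consistent across views.
    return reproj_error < tau

d_a = np.array([[10.0, 10.0], [10.0, 10.0]])   # teacher depth, perturbation A
d_b = np.array([[10.1, 15.0], [10.2, 10.0]])   # teacher depth, perturbation B
err = np.array([[0.05, 0.05], [0.50, 0.05]])   # reprojection error per pixel

# Distill only where both filters accept the pseudo-label.
mask = dc_filter(d_a, d_b) & gc_filter(err)
```

Combining the two masks means a pseudo-label supervises the student only where it is both perturbation-robust and geometrically plausible, which suppresses noisy labels in challenging regions.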

Abstract

Self-supervised monocular depth estimation is of significant importance to autonomous driving and robotics. However, existing methods are typically trained and evaluated on clear, sunny datasets, overlooking the various adverse conditions commonly encountered in real-world applications, such as rainy weather, low visibility, and motion blur. As a result, they often struggle in challenging scenarios and produce artifacts. To address this issue, we propose ER-Depth, a novel two-stage self-supervised framework designed for robust depth estimation. In the first stage, we propose perturbation-invariant depth consistency regularization to propagate reliable supervision from standard to challenging scenes. In the second stage, we adopt the Mean Teacher paradigm for self-distillation and present a novel consistency-based pseudo-label filtering strategy to improve the quality of pseudo-labels. Extensive experiments demonstrate that our method exhibits exceptional robustness in challenging scenarios while maintaining high performance in standard scenes, significantly outperforming existing state-of-the-art methods on the challenging KITTI-C, DrivingStereo, and NuScenes-Night benchmarks. Project page: https://ruijiezhu94.github.io/ERDepth_page.
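For readers unfamiliar with the Mean Teacher paradigm mentioned above, the teacher is typically maintained as an exponential moving average (EMA) of the student. The sketch below shows the standard EMA update; the dictionary-of-arrays representation and the momentum value are assumptions for illustration, not details from this work.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.999):
    # Standard Mean-Teacher update: each teacher weight tracks an
    # exponential moving average of the corresponding student weight.
    return {name: momentum * w_t + (1.0 - momentum) * student[name]
            for name, w_t in teacher.items()}

teacher = {"w": np.array([1.0, 1.0])}
student = {"w": np.array([0.0, 0.0])}
teacher = ema_update(teacher, student, momentum=0.9)  # w -> [0.9, 0.9]
```

Because the teacher averages the student over many steps, its pseudo-labels are smoother and more stable than any single student snapshot, which is what makes it a useful source of supervision for self-distillation.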

BibTeX

@article{zhu2023ecdepth,
  title={EC-Depth: Exploring the consistency of self-supervised monocular depth estimation in challenging scenes},
  author={Song, Ziyang and Zhu, Ruijie and Wang, Chuxin and Deng, Jiacheng and He, Jianfeng and Zhang, Tianzhu},
  journal={arXiv preprint arXiv:2310.08044},
  year={2023}
}