Semantic-aware self-supervised depth estimation for stereo 3D detection

Published in Pattern Recognition Letters, 2023

Besides 3D object supervision, auxiliary disparity supervision is usually indispensable when training a stereo-based 3D object detector. This disparity supervision is either converted from LiDAR points or generated by pre-trained models. However, the former suffers from the high cost of LiDAR devices and their over-sensitivity to airborne particles, while the latter is limited by the poor cross-dataset transferability of contemporary stereo matching models. To alleviate these problems, we propose a self-supervision framework for stereo-based 3D detection that relies on neither LiDAR nor external models. A Depth-based Self-supervision (DSelf) is proposed to unify the coordinate spaces of the self-supervised losses and of detection into a single 3D space. However, the DSelf supervision is dense compared with sparse LiDAR points, which introduces redundancy and irrelevancy into the stereo features. A Semantic-Aware Sampler (SASampler) is proposed to address these problems by sampling foreground and background pixels in an unbalanced manner. Combining the SASampler and DSelf supervision, the resulting detector (named S3D) achieves state-of-the-art detection results without explicit disparity supervision.
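The paper's exact formulation is not reproduced here; the following is a minimal PyTorch-style sketch of the two ingredients the abstract describes: a self-supervised stereo reconstruction loss whose disparity output is lifted into depth (the 3D space in which DSelf defines its supervision), and an unbalanced foreground/background sampling of the dense per-pixel losses in the spirit of the SASampler. The function names and the sampling quotas `num_fg`/`num_bg` are illustrative assumptions, not the authors' API.

```python
import torch
import torch.nn.functional as F

def disparity_to_depth(disparity, focal, baseline):
    # Standard stereo geometry: depth = focal * baseline / disparity.
    # This lift into depth is where a DSelf-style loss would be defined,
    # rather than in the 2D disparity space.
    return focal * baseline / disparity.clamp(min=1e-6)

def warp_right_to_left(right_img, disparity):
    # Generic photometric self-supervision: reconstruct the left view by
    # sampling the right image at x-coordinates shifted by disparity.
    b, _, h, w = right_img.shape
    ys = torch.linspace(-1.0, 1.0, h, device=right_img.device)
    xs = torch.linspace(-1.0, 1.0, w, device=right_img.device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    # Shift normalized x-coordinates by the disparity (align_corners=True
    # maps pixel i to -1 + 2i/(w-1), hence the 2/(w-1) scaling).
    grid_x = grid_x.unsqueeze(0) - 2.0 * disparity.squeeze(1) / (w - 1)
    grid_y = grid_y.unsqueeze(0).expand_as(grid_x)
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(right_img, grid, align_corners=True)

def semantic_aware_sample(loss_map, fg_mask, num_fg=2048, num_bg=512):
    # Unbalanced sampling: keep many foreground-pixel losses and only a
    # few background ones, so the dense self-supervision does not drown
    # detection-relevant regions. Quotas are illustrative assumptions.
    fg = loss_map[fg_mask]
    bg = loss_map[~fg_mask]
    fg_idx = torch.randperm(fg.numel(), device=fg.device)[:num_fg]
    bg_idx = torch.randperm(bg.numel(), device=bg.device)[:num_bg]
    return torch.cat((fg[fg_idx], bg[bg_idx])).mean()

# Usage sketch: left_img, right_img are (B, 3, H, W); pred_disparity is
# (B, 1, H, W); fg_mask is a boolean (B, H, W) map from 2D semantics.
#   recon = warp_right_to_left(right_img, pred_disparity)
#   photo_loss = (left_img - recon).abs().mean(dim=1)   # (B, H, W)
#   loss = semantic_aware_sample(photo_loss, fg_mask)
```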

Recommended citation: Sun Hanqing, Cao Jiale, Pang Yanwei. Semantic-aware self-supervised depth estimation for stereo 3D detection. Pattern Recognition Letters, 2023, 167: 164-170.
Download Paper