2022 26th International Conference on Pattern Recognition (ICPR)

Abstract

Recent advances in object detection enable detection networks to predict 3D objects from a monocular image, but the performance of monocular 3D object detectors remains inferior because depth information is lost in the image projection. Most monocular 3D detectors do not utilize sequential information from multi-frame images, even though an object's temporal motion is highly informative for 3D object detection. In this paper, we propose a sequential image-based 3D object detection architecture that focuses on improving the localization performance of 3D detectors using temporal information for autonomous driving applications. To this end, the proposed network is trained on pairs of sequential images to predict 3D objects with their localization uncertainties in each image. Afterward, the objects detected in the sequential images are associated, and the paired object features are fed to a sub-network that predicts the depth displacement between frames. Finally, the paired objects, together with their predicted depths and depth displacement, are jointly refined to minimize the residuals between predictions, yielding the final 3D locations of the objects. Experimental results on the challenging nuScenes dataset demonstrate that our method improves the performance of the 3D detector by reducing the localization error.
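To make the refinement step concrete, the sketch below illustrates one plausible way the paired depth predictions and the predicted depth displacement could be combined; it is not the paper's implementation. It assumes a simple inverse-variance fusion in which the current-frame depth is combined with the previous-frame depth propagated through the predicted displacement, each term weighted by its predicted localization uncertainty. All names (`fuse_depths`, the `sigma_*` arguments) are hypothetical.

```python
import numpy as np

def fuse_depths(d_prev, sigma_prev, d_curr, sigma_curr, delta_d, sigma_delta):
    """Hypothetical inverse-variance fusion of per-object depth estimates.

    d_prev, sigma_prev   : depth and uncertainty predicted on frame t-1
    d_curr, sigma_curr   : depth and uncertainty predicted on frame t
    delta_d, sigma_delta : predicted depth displacement between frames
                           and its uncertainty
    Returns the refined depth for frame t and its approximate uncertainty.
    """
    # Propagate the previous-frame depth to the current frame via the
    # predicted displacement; its variance accumulates both error sources.
    d_prop = d_prev + delta_d
    var_prop = sigma_prev ** 2 + sigma_delta ** 2

    # Weight each estimate by its inverse variance and normalize.
    w_curr = 1.0 / sigma_curr ** 2
    w_prop = 1.0 / var_prop
    d_refined = (w_curr * d_curr + w_prop * d_prop) / (w_curr + w_prop)
    sigma_refined = np.sqrt(1.0 / (w_curr + w_prop))
    return d_refined, sigma_refined

# Example: a single associated object across two frames.
d_ref, s_ref = fuse_depths(d_prev=22.4, sigma_prev=1.2,
                           d_curr=24.1, sigma_curr=0.9,
                           delta_d=1.5, sigma_delta=0.4)
print(f"refined depth: {d_ref:.2f} m (sigma {s_ref:.2f} m)")
```

Under this assumption, a confident current-frame prediction dominates the fused result, while a well-predicted displacement lets the previous frame tighten the estimate when the current-frame depth is uncertain.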
