2024 IEEE International Conference on Multimedia and Expo (ICME)

Abstract

The increasing availability of depth sensors has facilitated the acquisition of depth images, thereby driving advancements in RGBD tracking. However, compared to RGB benchmarks, the scarcity of annotated RGBD data hampers the sufficient training of RGBD trackers. In this paper, instead of developing a new RGBD tracker from scratch, we aim to learn a depth-fused refinement module that enables existing RGB trackers to adapt to RGBD scenes. Specifically, we introduce a compact yet effective module, named DepthRefiner (DR), built on multi-head self-attention, a simple bimodal fusion technique, and a center-based head. This approach leverages the prior representations that RGB trackers have learned from large-scale RGB data and can be flexibly integrated into various off-the-shelf trackers without modifying their original pipelines. Comprehensive experiments on the CDTB, DepthTrack, VOT-RGBD2022, and RGBD1K benchmarks with multiple base trackers validate that our approach significantly improves the base tracker's performance while adding minimal computational overhead.
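To make the architecture described above concrete, the sketch below shows one plausible reading of such a refinement module: depth features are fused with a frozen RGB tracker's features by concatenation, the fused map is refined with multi-head self-attention, and a center-based head predicts a center-score map plus box size and offset. The paper's abstract does not give implementation details, so all layer sizes, the concatenation-based fusion, and the head layout here are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a DepthRefiner-style module; dimensions and
# fusion choices are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class DepthRefinerSketch(nn.Module):
    def __init__(self, rgb_dim=256, depth_dim=64, embed_dim=256, num_heads=8):
        super().__init__()
        # Simple bimodal fusion: embed the raw depth map, then merge it with
        # the frozen RGB tracker's features via concatenation + 1x1 conv.
        self.depth_proj = nn.Conv2d(1, depth_dim, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(rgb_dim + depth_dim, embed_dim, kernel_size=1)
        # Multi-head self-attention over the fused feature map as tokens.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        # Center-based head: a center-score map plus per-location box size
        # and offset, in the style of center-point localization heads.
        self.center_head = nn.Conv2d(embed_dim, 1, kernel_size=1)
        self.size_head = nn.Conv2d(embed_dim, 2, kernel_size=1)
        self.offset_head = nn.Conv2d(embed_dim, 2, kernel_size=1)

    def forward(self, rgb_feat, depth):
        # rgb_feat: (B, rgb_dim, H, W) features from an off-the-shelf RGB
        # tracker; depth: (B, 1, H, W) depth map resized to the same grid.
        d = self.depth_proj(depth)
        x = self.fuse(torch.cat([rgb_feat, d], dim=1))   # (B, C, H, W)
        B, C, H, W = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attn_out)            # residual + norm
        x = tokens.transpose(1, 2).reshape(B, C, H, W)
        return {
            "center": torch.sigmoid(self.center_head(x)),  # (B, 1, H, W)
            "size": self.size_head(x),                     # (B, 2, H, W)
            "offset": self.offset_head(x),                 # (B, 2, H, W)
        }


if __name__ == "__main__":
    refiner = DepthRefinerSketch()
    rgb_feat = torch.randn(1, 256, 16, 16)  # stand-in for base tracker features
    depth = torch.rand(1, 1, 16, 16)        # normalized depth map
    out = refiner(rgb_feat, depth)
    print({k: v.shape for k, v in out.items()})
```

Because the module consumes the base tracker's features rather than its raw inputs, it can sit after any RGB tracker without touching that tracker's pipeline, which matches the plug-in design the abstract claims.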