Abstract
Video-based gesture recognition has attracted considerable attention in the field of human-computer interaction. Unlike ordinary RGB gesture videos, RGB-D gesture videos additionally record the depth information of every pixel in each frame, which can potentially reduce the impact of illumination and background variations. To the best of our knowledge, existing RGB-D gesture video datasets focus primarily on achieving high classification accuracy while neglecting the illumination and background variations that occur in real-world settings, which hinders the advancement of gesture recognition algorithms to some extent. To address this, we present DG-13, an RGB-D gesture video dataset that explicitly accounts for diverse illumination conditions and backgrounds. We also provide benchmark evaluations of five representative lightweight 3D CNN networks on the proposed DG-13 dataset. Experimental results show that under significant illumination and background variations, RGB-D gesture videos can effectively improve classification accuracy and several other classification metrics. The DG-13 dataset and benchmark code are publicly available at https://github.com/xiaooo-jian/DG-13.