Abstract
We propose a 3D action recognition algorithm that combines depth-based Gradient Local Auto-Correlations (GLAC) features with Locality-constrained Affine Subspace Coding (LASC) to obtain more discriminative representations of human actions in spatio-temporal subsequences of 3D depth videos. First, each depth video sequence is automatically divided into a set of subsequences (i.e., multi-scale sub-actions) according to its normalized motion energy vector. Next, GLAC features based on Depth Motion Maps (DMMs) are employed to capture the shape information and motion cues of each sub-action. To obtain a more compact and discriminative representation, LASC is then applied to encode the features extracted from the depth video. We show that LASC performs better than existing methods such as Locality-constrained Linear Coding (LLC). On all three datasets, we obtain competitive results compared to fifteen methods while using fewer features and less complex models.
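The segmentation step can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: it assumes that motion energy is the accumulated count of pixels whose depth changes by more than a threshold between consecutive frames, that sub-actions are cut at equal-energy boundaries, and that a DMM is the sum of absolute frame differences on the front view only. The function names and the threshold `eps` are hypothetical, and the GLAC and LASC stages are omitted.

```python
import numpy as np

def normalized_motion_energy(frames, eps=10.0):
    """Cumulative motion energy per frame, scaled to end at 1.

    frames: array of shape (T, H, W) holding depth maps.
    eps: assumed depth-change threshold (hypothetical value).
    """
    frames = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0))        # (T-1, H, W)
    moving = (diffs > eps).sum(axis=(1, 2))        # changed pixels per step
    energy = np.concatenate([[0.0], np.cumsum(moving)])
    total = energy[-1]
    if total == 0:
        # No motion at all: fall back to a uniform split over time.
        return np.linspace(0.0, 1.0, len(frames))
    return energy / total

def split_by_energy(energy, n_parts):
    """Frame indices that cut the video into parts of roughly equal energy."""
    targets = np.linspace(0.0, 1.0, n_parts + 1)
    bounds = np.searchsorted(energy, targets, side="left")
    bounds[0] = 0
    bounds[-1] = len(energy) - 1
    return bounds

def depth_motion_map(frames):
    """Front-view DMM: sum of absolute differences of consecutive frames."""
    frames = np.asarray(frames, dtype=np.float64)
    return np.abs(np.diff(frames, axis=0)).sum(axis=0)
```

Under these assumptions, calling `split_by_energy` with several values of `n_parts` yields sub-actions at several temporal scales, and `depth_motion_map(frames[a:b + 1])` gives one map per sub-action from which descriptors such as GLAC could then be extracted.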