Abstract
In this paper we propose the first method known to the authors that successfully differentiates spontaneous from posed facial expressions using a realistic training corpus. We introduce a new spatiotemporal local texture descriptor (CLBP-TOP) that outperforms other descriptors. We demonstrate that our temporal interpolation and visual/near-infrared fusion methods improve differentiation performance. Finally, we propose a new generic facial expression recognition framework that subdivides the recognition problem into a cascade of smaller, simpler tasks. The resulting system achieves promising results.

