Abstract
Because facial images usually evoke multiple emotions with different intensities, considerable ambiguity exists in facial expression recognition (FER). Previous methods jointly optimize multi-label learning (MLL) and label distribution learning (LDL) to suppress this ambiguity and have achieved excellent performance. However, the different convergence speeds of MLL and LDL make the model prone to over-fitting. To address this problem, we propose a dynamic distribution supervision (D2S) method, in which label distribution information is introduced as auxiliary supervision for multi-label classification. Specifically, we develop a multi-task framework in which MLL and LDL are optimized simultaneously. The losses are dynamically weighted to overcome the optimization inconsistency between the two tasks. Extensive experiments on the largest benchmark dataset, RAF-ML, demonstrate the superiority of the proposed method.
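To make the dynamic weighting idea concrete, the sketch below shows one way a combined MLL/LDL loss could be weighted during training. This is a minimal illustration, not the paper's actual formulation: it assumes a PyTorch setting, binary cross-entropy for the MLL branch, KL-divergence for the LDL branch, and a hypothetical linear epoch-based schedule; the function name d2s_loss and all parameters are illustrative.

# Minimal sketch of a dynamically weighted MLL + LDL objective.
# Assumptions (not from the paper): PyTorch, BCE for MLL, KL-divergence for LDL,
# and a linear epoch-based weighting schedule.
import torch
import torch.nn.functional as F

def d2s_loss(logits, target_labels, target_dist, epoch, total_epochs):
    """Combine multi-label and label-distribution supervision with dynamic weights.

    logits:        (batch, num_emotions) raw scores from a shared backbone
    target_labels: (batch, num_emotions) binary multi-label ground truth
    target_dist:   (batch, num_emotions) ground-truth label distribution (rows sum to 1)
    """
    # MLL branch: multi-label classification with binary cross-entropy.
    mll_loss = F.binary_cross_entropy_with_logits(logits, target_labels)

    # LDL branch: match the predicted distribution to the label distribution.
    log_pred_dist = F.log_softmax(logits, dim=1)
    ldl_loss = F.kl_div(log_pred_dist, target_dist, reduction="batchmean")

    # Dynamic weighting: shift emphasis between the two losses over training so
    # that their different convergence speeds do not drive over-fitting.
    alpha = epoch / total_epochs  # grows from 0 to 1
    return alpha * mll_loss + (1.0 - alpha) * ldl_loss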