2024 IEEE International Conference on Multimedia and Expo (ICME)

Abstract

Accurately detecting emotions in conversation is a necessary yet challenging task due to the complexity of emotions and the dynamics of dialogue. The emotional state of a speaker can be influenced by many different factors, such as interlocutor stimulus, dialogue scene, and topic. In this work, we propose a conversational speech emotion recognition method that captures attentive contextual dependencies and speaker-sensitive interactions. First, we use a pretrained WavLM model to extract frame-level audio representations from individual utterances. Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and jointly explores listener dependency and speaker influence in a simple, fast, parameter-efficient way. Experiments conducted on the standard conversational dataset MELD demonstrate the effectiveness of the proposed method compared against state-of-the-art methods.
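To make the two-stage pipeline concrete, the sketch below (not the authors' code) illustrates one plausible reading of the abstract: a frozen pretrained WavLM produces frame-level features that are pooled into utterance embeddings, and a bi-directional GRU with self-attention models the dialogue context. The model name "microsoft/wavlm-base", mean pooling, multi-head self-attention as the "attentive" component, and freezing WavLM are all assumptions; the paper's speaker-sensitive interaction mechanism is not reproduced here.

```python
import torch
import torch.nn as nn
from transformers import WavLMModel


class DialogueEmotionSketch(nn.Module):
    """Hypothetical sketch: WavLM frame features per utterance,
    then an attentive bi-directional GRU over the dialogue turns."""

    def __init__(self, hidden_dim=256, num_emotions=7):  # MELD uses 7 emotion labels
        super().__init__()
        # Pretrained WavLM, frozen here so only the GRU/attention/classifier
        # train (freezing is our assumption, in line with the abstract's
        # emphasis on a simple, parameter-efficient model).
        self.wavlm = WavLMModel.from_pretrained("microsoft/wavlm-base")
        for p in self.wavlm.parameters():
            p.requires_grad = False
        feat_dim = self.wavlm.config.hidden_size  # 768 for wavlm-base
        # Bi-directional GRU over the sequence of utterance embeddings.
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Self-attention across turns: one plausible form of the "attentive"
        # contextual modelling (the paper's exact mechanism may differ).
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_emotions)

    def embed_utterance(self, waveform):
        # waveform: (1, num_samples) mono audio sampled at 16 kHz.
        with torch.no_grad():
            frames = self.wavlm(waveform).last_hidden_state  # (1, T, feat_dim)
        return frames.mean(dim=1)                            # mean-pool -> (1, feat_dim)

    def forward(self, utterances):
        # utterances: list of (1, num_samples) tensors, one per dialogue turn.
        embs = torch.stack([self.embed_utterance(w).squeeze(0) for w in utterances])
        states, _ = self.gru(embs.unsqueeze(0))              # (1, N, 2*hidden_dim)
        context, _ = self.attn(states, states, states)       # attend across turns
        return self.classifier(context.squeeze(0))           # (N, num_emotions) logits


# Usage on dummy data: five one-second turns, one emotion prediction per turn.
model = DialogueEmotionSketch()
dialogue = [torch.randn(1, 16000) for _ in range(5)]
logits = model(dialogue)  # shape (5, 7)
```

Because WavLM stays frozen, only the GRU, attention, and classifier layers are trained, which is one way the "fast, parameter-efficient" claim could be realized in practice.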