Abstract
Nonverbal behaviors and their co-occurring speech interact in nontrivial ways to communicate a message. These complex relationships must be carefully considered when designing intelligent virtual agents (IVAs) that display believable behaviors. An important aspect regulating the relationship between gesture and speech is the underlying discourse function of the message. This paper introduces the MSP-AVATAR corpus, a new multimedia corpus designed to explore the relationships between discourse functions, speech, and nonverbal behaviors. The corpus comprises motion capture data (upper-body skeleton and facial motion), frontal-view videos, and high-quality audio from four actors engaged in dyadic interactions. The actors performed improvisation scenarios, each carefully designed to elicit characteristic gestures associated with a specific discourse function. Since detailed information about the face and body is available, this corpus is suitable for rule-based and speech-driven generation of body, hand, and facial behaviors for IVAs. This paper describes the design, recording, and annotation of this valuable corpus, and presents an analysis of the gestures observed in the recordings.