Abstract
Human arm and body gestures have long been known to play a significant role in communication, especially in teaching. We gather ground truth annotations of gesture appearance using a 27-bit pose vector. We manually annotate and analyze the gestures of two instructors, each in a 75-minute computer science lecture recorded on digital video, finding 866 gestures and identifying 126 fine equivalence classes, which can be further clustered into 9 semantic classes. These classes encompass “pedagogical” gestures of punctuation and encouragement, as well as traditional classes such as deictic and metaphoric. We note that gestures appear to be both highly idiosyncratic and highly repetitive. We introduce a tool to facilitate the manual annotation of gestures in video, and present initial results on gesture frequencies and co-occurrences; in particular, we find that pointing (deictic) and “spreading” (pedagogical) gestures predominate, and that 5 poses account for 80% of the variation in the annotated ground truth.