Abstract
In this study we present a robust solution for estimating the 3D pose and shape of human targets from multiple synchronized video streams. The objective is to automatically estimate physical attributes of the targets that allow us to analyze their behavior non-intrusively. The proposed system estimates the anthropometric skeleton, pose and shape of a human target from the 3D visual hull reconstructed from multiple silhouettes of the target. A discriminative (bottom-up) method is first used to initialize the 3D pose of the target using low-level features extracted from the 2D images. The pose is then refined using a generative (top-down) method that also estimates the optimal skeleton of the target using anthropometric prior models learned from the CAESAR dataset. Statistical shape models are also learned from the CAESAR dataset and are used to model both the global and local shape variability of human body parts. We also propose a novel optimization scheme that fits the 3D shape by searching in the parametric space of the local part models while constraining the overall shape using a global shape model. The system provides a useful framework for automatically identifying disproportionate body parts, estimating the size of backpacks and inferring attributes such as gender, age and ethnicity of the human target.