Abstract
People counting is a key component of video surveillance applications; however, because of occlusion and variations in illumination, color, and texture, the problem is far from solved. Unlike traditional systems based on visible-light cameras, we construct a novel system that uses a vertical Kinect sensor for people counting, where depth information is used to remove the effect of appearance variation. Since the head is always closer to the Kinect sensor than other parts of the body, the people counting task reduces to finding suitable local minimum regions in the depth map. Exploiting the particular characteristics of the depth map, we propose a novel unsupervised water filling method that finds these regions and is robust, local, and scale-invariant. Experimental comparisons with mean shift and random forest on two databases validate the superiority of our water filling algorithm for people counting.
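To make the reduction to local minimum regions concrete, the following minimal sketch counts regional minima in a toy overhead depth map. It is not the water filling algorithm itself, only a naive baseline illustrating the underlying idea. The function name `count_local_minimum_regions`, the parameter `max_head_depth`, and the toy depth values are all hypothetical and chosen for illustration.

```python
from collections import deque


def count_local_minimum_regions(depth, max_head_depth):
    """Count regional minima in a 2D depth map (list of equal-length rows).

    A regional minimum is a 4-connected plateau of equal depth with no
    strictly lower neighbour. Plateaus at or beyond max_head_depth
    (a hypothetical cut-off that excludes the floor) are ignored.
    """
    rows, cols = len(depth), len(depth[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            level = depth[r0][c0]
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            is_minimum = True
            # Flood-fill the plateau and check every bordering pixel.
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    neighbour = depth[nr][nc]
                    if neighbour < level:
                        is_minimum = False  # something closer to the sensor is adjacent
                    elif neighbour == level and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if is_minimum and level < max_head_depth:
                count += 1
    return count


# Toy map in arbitrary depth units: floor = 30, shoulders = 18, heads = 10 and 12.
toy_depth = [
    [30, 30, 30, 30, 30, 30, 30, 30],
    [30, 18, 18, 18, 30, 30, 30, 30],
    [30, 18, 10, 18, 30, 18, 18, 18],
    [30, 18, 18, 18, 30, 18, 12, 18],
    [30, 30, 30, 30, 30, 18, 18, 18],
]
people = count_local_minimum_regions(toy_depth, max_head_depth=25)
print(people)  # 2 for this toy map
```

On real depth data, such a naive count would also respond to sensor noise and to raised hands or carried objects, which is why a method with robustness, locality, and scale-invariance is needed.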