Abstract
Background Modelling is a crucial step in background/foreground detection which could be used in video analysis, such as surveillance, people counting, face detection and pose estimation. Most methods need to choose the hyper parameters manually or use ground truth background masks (GT). In this work, we present an unsupervised deep background (BG) modelling method called BM-Unet which is based on a generative architecture that given a certain frame as input it generates as output the corresponding background image — to be more precise, a probabilistic heat map of the colour values. Our method learns parameters automatically and an augmented version of it that utilises colour, intensity differences and optical flow between a reference and a target frame is robust to rapid illumination changes and camera jitter. Besides, it can be used on a new video sequence without the need of ground truth background/foreground masks for training. Experiment evaluations on challenging sequences in SBMnet data set demonstrate promising results over state-of-the-art methods.