Abstract
Video scenes often contain cluttered backgrounds and numerous stationary and moving objects, which makes detecting anomalous events very challenging. Traditional approaches rely on hand-designed features to identify anomalies, but such features struggle to capture the spatio-temporal structure of video. In this paper, we propose VadTR, a Transformer-based model for Video Anomaly Detection, which automatically learns video representations and extracts features along both spatial and temporal dimensions. We then use a reconstruction loss to detect anomalies. Comparisons on public datasets show that our method achieves leading performance.
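The abstract's reconstruction-loss idea can be illustrated with a minimal sketch: frames that the learned model reconstructs poorly receive high anomaly scores. This is an assumption-laden toy, not the paper's implementation; the `blur_reconstruct` stand-in replaces the actual trained Transformer, and the thresholding rule is a hypothetical example.

```python
import numpy as np

def anomaly_scores(frames, reconstruct):
    """Per-frame anomaly score = mean squared reconstruction error."""
    recon = reconstruct(frames)
    return ((frames - recon) ** 2).mean(axis=(1, 2))

# Toy stand-in for the learned model (NOT the paper's Transformer):
# a crude smoother that reconstructs near-constant "normal" frames well
# but fails on a high-variance outlier frame.
def blur_reconstruct(frames):
    return np.stack([0.5 * f + 0.5 * f.mean() for f in frames])

rng = np.random.default_rng(0)
normal = np.full((4, 8, 8), 0.5) + 0.01 * rng.standard_normal((4, 8, 8))
outlier = rng.standard_normal((1, 8, 8))   # abrupt, high-variance frame
frames = np.concatenate([normal, outlier])

scores = anomaly_scores(frames, blur_reconstruct)
# Hypothetical decision rule: flag frames whose error is far above typical.
flags = scores > 10 * np.median(scores)
```

In the real method the reconstruction network is trained only on normal footage, so anomalous frames fall outside what it can reproduce and their reconstruction error spikes.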