Abstract
This paper proposes an efficient supervised video summarization algorithm built on a self-attention-based encoder-decoder network. Given an input video, we use a Bi-GRU network with a self-attention mechanism to encode the contextual information of the video frames, and a GRU network as the decoder, accompanied by a regression network that predicts the importance score of every video frame. Experiments and analysis on the public benchmark datasets TVSum and SumMe validate the superiority of our algorithm.
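To make the pipeline described above concrete, the following is a minimal PyTorch sketch of one possible reading of it, not the authors' implementation. The class name `AttentiveSummarizer`, the 1024-dimensional per-frame features, the hidden size, the scaled dot-product form of the self-attention, the sigmoid output, and the mean-squared-error loss are all assumptions introduced for illustration; the abstract does not specify them.

```python
import torch
import torch.nn as nn


class AttentiveSummarizer(nn.Module):
    """Hypothetical sketch: Bi-GRU encoder, self-attention, GRU decoder, regressor."""

    def __init__(self, feat_dim=1024, hidden=256):  # sizes are assumptions
        super().__init__()
        # Bidirectional GRU encoder: each frame gets a 2 * hidden state.
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        # Query/key projections for scaled dot-product self-attention (assumed form).
        self.query = nn.Linear(2 * hidden, 2 * hidden, bias=False)
        self.key = nn.Linear(2 * hidden, 2 * hidden, bias=False)
        # Unidirectional GRU decoder over the attended context vectors.
        self.decoder = nn.GRU(2 * hidden, hidden, batch_first=True)
        # Regression head: one importance score in [0, 1] per frame (assumed range).
        self.regressor = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1), nn.Sigmoid()
        )

    def forward(self, frames):
        # frames: (batch, num_frames, feat_dim)
        enc, _ = self.encoder(frames)                              # (B, T, 2H)
        logits = self.query(enc) @ self.key(enc).transpose(1, 2)   # (B, T, T)
        attn = torch.softmax(logits / enc.size(-1) ** 0.5, dim=-1)
        context = attn @ enc                                       # (B, T, 2H)
        dec, _ = self.decoder(context)                             # (B, T, H)
        return self.regressor(dec).squeeze(-1)                     # (B, T)


# Usage with random stand-in data: one video of 120 frames.
model = AttentiveSummarizer()
video = torch.randn(1, 120, 1024)   # placeholder frame features
target = torch.rand(1, 120)         # placeholder ground-truth importance scores
scores = model(video)
loss = nn.functional.mse_loss(scores, target)  # supervised regression loss (assumed)
```

In this sketch every frame attends to all encoder states, so each predicted score depends on the whole video rather than only on neighbouring frames.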