2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)

Abstract

This paper proposes a supervised deep hashing approach for highly efficient and effective cover song detection. Our system consists of two identical sub-neural networks, each with a hash layer that learns a binary representation of input audio in the form of spectral features. A loss function joins the outputs of the two sub-networks by minimizing the Hamming distance for a pair of audio files covering the same musical work. We further enhance system performance with loudness embedding, beat synchronization, and early fusion of input audio features. The resulting 128-bit hash reaches state-of-the-art performance in terms of mean pairwise accuracy. This system demonstrates the feasibility of memory-efficient, real-time cover song detection with satisfactory accuracy at large scale.
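
The architecture described above is essentially a siamese (twin-branch) network whose final hash layer produces relaxed binary codes, trained with a pairwise loss that shrinks the Hamming distance between codes of recordings covering the same work. The following is a minimal PyTorch sketch of that idea; the branch layout, feature dimension, margin, and helper names (`HashingBranch`, `pairwise_hash_loss`) are illustrative assumptions, not the authors' exact configuration, which additionally relies on loudness embedding, beat synchronization, and early feature fusion.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashingBranch(nn.Module):
    """One of the two weight-shared sub-networks: spectral features -> K-bit hash code."""
    def __init__(self, feat_dim=512, hash_bits=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
        )
        # Hash layer: tanh outputs lie in (-1, 1) and are binarized at retrieval time.
        self.hash_layer = nn.Linear(512, hash_bits)

    def forward(self, x):
        return torch.tanh(self.hash_layer(self.encoder(x)))

def pairwise_hash_loss(h1, h2, same_work, margin=64.0):
    """Contrastive-style loss on a relaxed Hamming distance (assumed formulation).

    same_work: 1.0 if the two recordings cover the same musical work, else 0.0.
    The squared Euclidean distance between tanh outputs, divided by 4,
    approximates the Hamming distance between the binarized codes.
    """
    dist = ((h1 - h2) ** 2).sum(dim=1) / 4.0          # relaxed Hamming distance
    pos = same_work * dist                            # pull matching covers together
    neg = (1.0 - same_work) * F.relu(margin - dist)   # push non-covers apart
    return (pos + neg).mean()

# Usage: shared weights mean the same branch encodes both inputs of a pair.
branch = HashingBranch(feat_dim=512, hash_bits=128)
xa, xb = torch.randn(8, 512), torch.randn(8, 512)     # batch of spectral-feature pairs
labels = torch.randint(0, 2, (8,)).float()            # 1 = same work, 0 = different
loss = pairwise_hash_loss(branch(xa), branch(xb), labels)
codes = (branch(xa) > 0).int()                        # 128-bit codes for indexing/retrieval
```

At query time only the binarized codes are stored and compared, which is what makes the approach memory-efficient and fast enough for large-scale retrieval: Hamming distances over 128-bit codes can be computed with bitwise operations rather than full feature matching.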
