2024 2nd China Power Supply Society Electromagnetic Compatibility Conference (CPEMC)

Abstract

An ideal audio retrieval system identifies a short query snippet in a massive audio database both robustly and efficiently. Unfortunately, no existing system robustly handles all distortions while remaining efficient. For a system to be efficient, its retrieval method must match the features of its fingerprint. The Enhanced Sampling and Counting method (eSC), the state-of-the-art audio retrieval method proposed for Philips-like fingerprints, achieves both high efficiency and strong robustness, including resistance to time-stretch. The Philips fingerprint (PF) is robust to many types of distortion except speed-change, which includes time-stretch and pitch-shift. We argue that PF combined with eSC is a promising route towards an ideal audio retrieval system, provided PF can be made robust to pitch-shift. To this end, this paper proposes a peak-point based energy bands computation method (PPEB) that gives PF resistance to pitch-shift; the resulting fingerprint is called the Peak-point based Philips fingerprint (PPF). Experimental results show that PPF can resist pitch-shift ranging from 70% to 130% while retaining the robustness of PF to various noise distortions.
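
As background, the sketch below illustrates how the classic Philips fingerprint is commonly described: each bit is the sign of an energy difference taken across adjacent frequency bands and then across adjacent frames. It is not the PPEB method proposed in this paper, whose details the abstract does not give. The function names `philips_bits` and `bit_error_rate` and the toy energy values are assumptions made for illustration; the band energies are taken as already computed over fixed frequency bands. Because those bands are fixed, a pitch-shift moves spectral energy into different bands and changes the bits, which is the weakness the abstract refers to.

```python
def philips_bits(energies):
    """Derive fingerprint bits from per-frame band energies.

    energies: list of frames, each a list of band energies
              (the classic design uses 33 bands, giving 32 bits per frame).
    Returns one list of bits for every frame after the first.
    """
    fingerprints = []
    for n in range(1, len(energies)):
        prev, cur = energies[n - 1], energies[n]
        bits = []
        for m in range(len(cur) - 1):
            # Difference across adjacent bands, then across adjacent frames.
            d = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits.append(1 if d > 0 else 0)
        fingerprints.append(bits)
    return fingerprints


def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits between two equally shaped fingerprints."""
    total = 0
    errors = 0
    for bits_a, bits_b in zip(fp_a, fp_b):
        for a, b in zip(bits_a, bits_b):
            total += 1
            errors += a != b
    return errors / total if total else 0.0


if __name__ == "__main__":
    # Toy energies: 2 frames x 3 bands (hypothetical values).
    rising = [[1, 2, 3], [3, 2, 1]]
    falling = [[3, 2, 1], [1, 2, 3]]
    print(philips_bits(rising), philips_bits(falling))
    print(bit_error_rate(philips_bits(rising), philips_bits(falling)))
```

A low bit error rate between a query fingerprint and a stored one indicates a likely match; a retrieval method such as eSC would sit on top of a comparison of this kind.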