Abstract
Visual information on the web, in particular in the form of images, is increasing at a rapid rate. Consequently, efficient and effective techniques for retrieving visual information are sought after, especially since users rarely annotate images. In this paper, we present a very fast method for content-based image retrieval of JPEG compressed images. Our method works directly in the JPEG compressed domain and is based solely on information available in the image header. We make use of the fact that the Huffman tables that JPEG uses for entropy coding can be optimised to maximise compression. Since this optimisation adapts the Huffman tables to the image content, we use the tables directly as image features, specifically the prefix code lengths of the AC luminance and chrominance table entries, and employ the L_1 distance between the code length vectors as the similarity measure. We evaluate our method on benchmark databases of varying sizes, the largest containing more than 1 million images, and show that our approach provides good retrieval performance while achieving a more than 30-fold speedup over JPEG compressed domain algorithms and a more than 150-fold speedup over common pixel domain techniques for online image retrieval.
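To make the idea concrete, the following is a minimal sketch in Python of how such a feature could be read from the header and compared. It is an illustration under stated assumptions, not the implementation evaluated in the paper. The function names (`read_huffman_code_lengths`, `ac_feature`, `l1_distance`) are hypothetical. The sketch also assumes that the AC tables with identifiers 0 and 1 hold the luminance and chrominance codes respectively, which is a common encoder convention rather than a requirement, and that symbols absent from a table are assigned a code length of 0.

```python
from typing import Dict, List, Tuple


def read_huffman_code_lengths(jpeg_bytes: bytes) -> Dict[Tuple[int, int], List[int]]:
    """Map (table_class, table_id) to 256 code lengths, reading header segments only."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    tables: Dict[Tuple[int, int], List[int]] = {}
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0xDA:  # start of scan: the header ends here
            break
        seg_len = int.from_bytes(jpeg_bytes[pos + 2:pos + 4], "big")
        if marker == 0xC4:  # DHT segment, may hold several tables
            payload = jpeg_bytes[pos + 4:pos + 2 + seg_len]
            i = 0
            while i + 17 <= len(payload):
                table_class, table_id = payload[i] >> 4, payload[i] & 0x0F
                counts = payload[i + 1:i + 17]  # number of codes of length 1..16
                i += 17
                lengths = [0] * 256  # assumption: 0 marks an unused symbol
                for code_len, n in enumerate(counts, start=1):
                    for symbol in payload[i:i + n]:
                        lengths[symbol] = code_len
                    i += n
                tables[(table_class, table_id)] = lengths
        pos += 2 + seg_len
    return tables


def ac_feature(jpeg_bytes: bytes) -> List[int]:
    """Concatenate the code lengths of AC tables 0 and 1 (class 1 = AC)."""
    tables = read_huffman_code_lengths(jpeg_bytes)
    missing = [0] * 256
    return tables.get((1, 0), missing) + tables.get((1, 1), missing)


def l1_distance(a: List[int], b: List[int]) -> int:
    """Sum of absolute differences between two code length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Ranking a database against a query would then amount to sorting the stored feature vectors by `l1_distance` to the query's vector. Because parsing stops at the start-of-scan marker, no entropy-coded image data is decoded.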