Abstract
We present an approach to image retrieval that combines a very large number of highly selective features with efficient on-line learning. Our approach is predicated on the assumption that each image is generated by a sparse set of visual “causes” and that visually similar images share causes. We propose a mechanism for computing a very large number of highly selective features that capture some aspects of this causal structure (our implementation uses over 45,000 such features). At query time, a user selects a few example images, and a technique known as “boosting” is used to learn a classification function in this feature space. By construction, the boosting procedure learns a simple classifier that relies on only 20 of the features. As a result, a very large database of images can be scanned rapidly, perhaps a million images per second. Finally, we describe a set of experiments performed using our retrieval system on a database of 3000 images.
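To illustrate how boosting can yield a classifier that touches only a handful of features, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes a standard discrete AdaBoost loop with single-feature threshold tests ("stumps") as weak learners, a choice made here for brevity. The function names `stump`, `train_boosted_stumps`, and `score` are hypothetical, as is the encoding of labels as +1 for query examples and -1 for other images. Each round selects one feature, so running 20 rounds gives a classifier that reads at most 20 feature values per image.

```python
import math

def stump(row, j, thr, pol):
    """Weak learner (assumed form): threshold test on a single feature j."""
    return pol if row[j] >= thr else -pol

def train_boosted_stumps(X, y, rounds):
    """Discrete AdaBoost over single-feature stumps.

    X: list of feature vectors, y: labels in {+1, -1}.
    Returns a list of (feature_index, threshold, polarity, alpha) tuples,
    one per boosting round.
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n                      # uniform example weights to start
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustive search for the stump with the lowest weighted error.
        for j in range(d):
            for thr in sorted({row[j] for row in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, row, yi in zip(w, X, y)
                              if stump(row, j, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, thr, pol, alpha))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(row, j, thr, pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, row):
    """Weighted vote of the selected stumps; reads only the selected features."""
    return sum(alpha * stump(row, j, thr, pol) for j, thr, pol, alpha in model)
```

Because `score` evaluates only the features chosen during training, the per-image cost of scanning a database depends on the number of boosting rounds and not on the size of the full feature set.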