Abstract
Constructing feature representations invariant to certain geometric and photometric transformations is of significant importance in many computer vision applications. Despite considerable effort, developing invariant feature representations remains a challenging problem. Most existing representations fail to satisfy the long-term repeatability requirements of applications such as vision-based localization, whose domains include significant, non-uniform illumination and environmental changes. To this end, we explore the use of natural image pairs (i.e., images captured of the same location but at different times) as an additional source of supervision for learning an improved feature representation for vision-based localization. Specifically, we train a deep denoising autoencoder in which the CNN feature representation of one image in a pair is treated as a noisy version of the other. The resulting system learns localization features that are both discriminative and invariant to illumination and environmental changes. In experiments tailored to vision-based localization, features generated by the proposed method produced higher matching rates than state-of-the-art image features.
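To make the core training idea concrete, the following is a minimal sketch of a denoising autoencoder trained on feature pairs. All specifics here are assumptions for illustration, not the paper's setup: the feature dimension, the single hidden layer, and the synthetic "day/night" pairs (a shared scene signal plus independent condition noise) stand in for real CNN features of co-located images. The key point is that the reconstruction target is the *paired* image's features rather than the input itself, so the hidden code is pushed toward a condition-invariant representation.

```python
import numpy as np

rng = np.random.default_rng(0)

d, h, n = 16, 8, 200  # feature dim, code dim, number of image pairs (assumed)

# Synthetic feature pairs: shared "scene" content plus independent
# per-condition noise, standing in for CNN features of the same place
# captured under different illumination/environmental conditions.
scene = rng.normal(size=(n, d))
x_day = scene + 0.3 * rng.normal(size=(n, d))
x_night = scene + 0.3 * rng.normal(size=(n, d))

# One-hidden-layer autoencoder parameters
W1 = rng.normal(scale=0.1, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, d)); b2 = np.zeros(d)

def forward(x):
    z = np.tanh(x @ W1 + b1)   # hidden code: the learned invariant feature
    return z, z @ W2 + b2      # reconstruction of the paired features

_, out0 = forward(x_day)
mse0 = float(((out0 - x_night) ** 2).mean())  # error before training

lr = 0.05
for _ in range(500):
    z, out = forward(x_day)
    err = out - x_night        # target is the OTHER image's features
    # Backpropagate the squared-error loss through both layers
    gW2 = z.T @ err / n; gb2 = err.mean(0)
    dz = (err @ W2.T) * (1 - z ** 2)   # tanh derivative
    gW1 = x_day.T @ dz / n; gb1 = dz.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, out = forward(x_day)
final_mse = float(((out - x_night) ** 2).mean())
```

After training, the hidden code `z` (not the reconstruction) would serve as the localization feature: since it must explain the paired image's features, it cannot rely on condition-specific noise. A real system would use deep CNN features and a deeper autoencoder, but the cross-image reconstruction objective is the same.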