IEEE Workshop on Applications of Computer Vision

Abstract

Images for 3D mapping are always recorded such that relevant scene parts are seen from multiple viewpoints, so as to facilitate camera orientation and 3D point triangulation. Beyond geometric reconstruction, automatic mapping also requires semantic interpretation of the image content, and for that task the redundancy provided by overlapping images has been exploited much less. Here we address the task of learning a classifier for pixel-wise semantic labeling of the observed scene. The main insight is that the mere fact that two regions in different images depict the same 3D scene point yields a constraint that can be exploited during learning: the two regions should receive the same class label, even if it is not known which one. In analogy to geometric “tie points” (image correspondences with a priori unknown 3D coordinates, which nevertheless constrain camera orientation), we call these correspondences “semantic tie points”. We show how to integrate this weaker form of supervision, which is readily available in any multi-view dataset, into a random forest classifier, and demonstrate improved classification performance on an aerial dataset.
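To make the tie-point constraint concrete, the sketch below shows one plausible way to use it, not the paper's actual forest integration: train a standard scikit-learn random forest on the labeled regions, average the predicted class posteriors across each tie-point pair so that corresponding regions receive the same label, and retrain on the confident pseudo-labels. The function name, the 0.8 confidence threshold, and the posterior-averaging scheme are all assumptions made for illustration.

```python
# Hypothetical sketch of "semantic tie points" as weak supervision.
# NOT the paper's exact method of integrating the constraint into
# random forest training; a simple posterior-averaging variant.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_with_tie_points(X_lab, y_lab, X_unlab, tie_pairs,
                          n_trees=100, seed=0):
    """X_lab, y_lab: features and class labels of labeled regions.
    X_unlab: features of unlabeled regions seen in multiple views.
    tie_pairs: (i, j) index pairs into X_unlab that depict the same
    3D scene point and must therefore share a (unknown) class label."""
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(X_lab, y_lab)

    # Class posteriors for the unlabeled, multi-view regions.
    proba = rf.predict_proba(X_unlab)

    # Enforce the tie-point constraint softly: corresponding regions
    # get the average of their posteriors, so both end up with the
    # same most-likely label, whichever label that turns out to be.
    for i, j in tie_pairs:
        shared = 0.5 * (proba[i] + proba[j])
        proba[i] = shared
        proba[j] = shared

    # Keep only confident pseudo-labels (hypothetical 0.8 threshold)
    # and retrain the forest on the enlarged training set.
    keep = proba.max(axis=1) > 0.8
    X_aug = np.vstack([X_lab, X_unlab[keep]])
    y_aug = np.concatenate([y_lab,
                            rf.classes_[proba[keep].argmax(axis=1)]])
    return RandomForestClassifier(
        n_estimators=n_trees, random_state=seed).fit(X_aug, y_aug)
```

Averaging the posteriors, rather than forcing a hard label, mirrors the abstract's point that the pair must agree on a label even though the correct label itself is unknown.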