Abstract
Existing approaches to parsing images of objects featuring complex, non-hierarchical structure rely on exploration of a large search space combining the structure of the object and positions of its parts. The latter task requires randomized or greedy algorithms that do not produce repeatable results or strongly depend on the initial solution. To address the problem we propose to model and optimize the structure of the object and position of its parts separately. We encode the possible object structures in a graph grammar. Then, for a given structure, the positions of the parts are inferred using standard MAP-MRF techniques. This way we limit the application of the less reliable greedy or randomized optimization algorithm to structure inference. We apply our method to parsing images of building facades. The results of our experiments compare favorably to the state of the art.