Abstract
Viewpoint estimation in 3D scenes is an active research topic closely related to 3D recognition, scene understanding, and multi-view 3D modeling of objects and scenes. Existing techniques for encoding spatial invariance in deep convolutional neural networks model only 2D transformations, ignoring the fact that objects in 2D images are projections of 3D ones. Estimating high-quality 3D viewpoints therefore enables a more comprehensive analysis of 3D scenes. However, natural image datasets with viewpoint annotations are small, and deep learning methods typically train each object category independently, so model parameters are not shared across categories. In this paper, we propose a novel framework that jointly performs object detection and viewpoint estimation based on parameter sharing. We control rendering parameters to enlarge the set of effective training images and use kernel density estimation to approximate the viewpoint distribution of each object category in natural scenes. We optimize the loss function of the viewpoint estimation task and design a parameter-sharing viewpoint estimation method that reduces redundant network parameters and speeds up training. Experiments on the PASCAL 3D+ dataset demonstrate the effectiveness of our pipeline for joint object detection and viewpoint estimation.
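As a minimal illustration of the KDE-based viewpoint sampling step described above, the sketch below fits a kernel density estimate to annotated azimuth angles of one category and samples new viewpoints to drive rendering. This is a sketch under our own assumptions; the variable names, bandwidth, and use of a 1D azimuth-only model are illustrative and not specified in the abstract.

```python
# Sketch: approximate the viewpoint (azimuth) distribution of one object
# category with kernel density estimation, then sample new viewpoints for
# rendering synthetic training images. All names and values below are
# hypothetical; the paper's actual implementation is not given here.
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical annotated azimuth angles (degrees) for one category,
# e.g. collected from PASCAL 3D+ viewpoint annotations.
annotated_azimuths = np.array([10.0, 15.0, 30.0, 170.0, 185.0, 200.0, 350.0])

# Fit a 1D Gaussian KDE over the annotated viewpoints (assumed bandwidth).
kde = gaussian_kde(annotated_azimuths, bw_method=0.2)

# Sample new azimuths that stay close to the natural-image viewpoint
# distribution; these would parameterize the renderer.
sampled_azimuths = kde.resample(size=1000).ravel() % 360.0

print(sampled_azimuths[:5])
```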