2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Abstract

Digitally modeling and reconstructing talking humans is important for telepresence applications in AR and VR environments. However, current methods often fail to capture local details of avatars because of resolution or image quality limitations, and they struggle to generate realistic and natural 3D color representations. Meanwhile, invisible regions may cause the reconstruction results to appear hollow or incomplete. To alleviate these problems, in this paper we propose a novel approach called GFAvatar. Compared with existing methods, GFAvatar improves the quality of point cloud texture features by fusing image texture and 3D texture information. We achieve end-to-end learning by combining an image super-resolution approach with our PointNet variant, which extracts detailed features from head avatars and enhances the representation of point cloud features. Our multimodal color fusion network combines image and point cloud color data, producing more precise and expressive 3D color representations and thus higher-quality avatars. We also design a texture consistency loss function to address abnormal local color produced by the fusion network. Furthermore, to efficiently handle the challenges posed by disordered point clouds, we design a 3D grid optimization that improves the integrity of facial reconstruction. Extensive experiments on publicly available datasets demonstrate the superiority of our approach over state-of-the-art methods.
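The abstract does not define the texture consistency loss. As a minimal sketch only, here is one plausible form such a loss could take, assuming a PyTorch setting with hypothetical per-point predicted colors, reference colors, and precomputed nearest-neighbor indices (none of these names come from the paper):

    import torch
    import torch.nn.functional as F

    def texture_consistency_loss(pred_colors, ref_colors, neighbor_idx):
        """Hypothetical texture consistency loss (illustrative only).

        Penalizes (a) deviation of predicted per-point colors from
        reference colors and (b) abrupt color changes among neighboring
        points, one plausible way to suppress abnormal local color
        after multimodal fusion.

        pred_colors:  (N, 3) predicted RGB values per point
        ref_colors:   (N, 3) reference RGB values per point
        neighbor_idx: (N, K) indices of K nearest neighbors per point
        """
        # Photometric term: predicted texture should match the reference.
        photometric = F.l1_loss(pred_colors, ref_colors)

        # Local smoothness term: each point's color should stay close to
        # the mean color of its spatial neighbors.
        neighbor_colors = pred_colors[neighbor_idx]   # (N, K, 3)
        local_mean = neighbor_colors.mean(dim=1)      # (N, 3)
        smoothness = F.l1_loss(pred_colors, local_mean)

        # The 0.1 weight is an arbitrary placeholder, not from the paper.
        return photometric + 0.1 * smoothness

This sketch assumes point cloud colors are supervised per point; the actual GFAvatar loss may operate in image space or use a different neighborhood definition.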