Abstract
Document image dewarping aims to reconstruct a flat document image from a distorted input. Previous methods often use geometric or text-line priors to guide the dewarping process. However, document images contain diverse content, including figures, tables, and paragraph structures, so dewarping without considering the layout structure may fail to reach a globally optimal result. This paper proposes an encoder-decoder neural network, called DocTLNet, for document image dewarping, which uses both 3D geometry and layout as constraints to refine content details. To further enhance layout details, a layout-aug loss is also proposed to explicitly guide the network in handling distorted layout boundaries. Qualitative and quantitative experiments were conducted on the DocUNet benchmark, and the results indicate that our DocTLNet outperforms related methods.