2012 IEEE Symposium on Computers and Communications (ISCC)

Abstract

We introduce an improved text-to-sketch synthesis method using two-stage dual augmentation based on the large-scale pre-trained CLIP and CLIPDraw models. In the first stage, the input text is fed to CLIPDraw to adaptively produce text augmentations. In the second stage, attention mechanisms and structural images with fewer strokes are adopted to enhance image augmentation. Parameters of the Bezier drawing curves are optimized using global and local loss terms. Our method produces visually plausible drawings with better stroke layouts and improved drawing details, and requires no model re-training or parameter tuning. We further utilize CLIPScore, a reference-free metric, to evaluate how well the generated image matches the input text description. Experimental results show that the proposed method produces improved drawing sketches.
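The CLIPScore metric mentioned above is reference-free: it compares the generated image directly against the input text via their CLIP embeddings, with no ground-truth image needed. A minimal sketch of the standard formulation (a rescaled, clipped cosine similarity) is shown below; the toy vectors stand in for real CLIP image/text encoder outputs, which this snippet does not compute.

```python
import numpy as np

def clip_score(image_emb: np.ndarray, text_emb: np.ndarray, w: float = 2.5) -> float:
    """Reference-free CLIPScore: w * max(cos(image_emb, text_emb), 0).

    image_emb / text_emb are assumed to come from CLIP's image and
    text encoders; here they are plain vectors for illustration.
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    cosine = float(image_emb @ text_emb)
    return w * max(cosine, 0.0)

# Toy 2-D embeddings in place of real CLIP features.
img = np.array([0.6, 0.8])
txt = np.array([1.0, 0.0])
print(round(clip_score(img, txt), 2))  # cosine 0.6 rescaled by 2.5 -> 1.5
```

Because the score saturates at `w` for perfectly aligned embeddings and clips negatives to zero, higher values indicate a sketch that CLIP judges closer to the text prompt.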
