Sketch Video Synthesis



University of Saarland, Tencent AI Lab, Tencent AI Lab, University of Macau
*Indicates Corresponding Author

Video Sketching

Bear

Break Dance

Dance Jump

Car Turn

Surf

Hockey

Abstract

Understanding semantic intricacies and high-level concepts is essential in image sketch generation, and the challenge becomes even harder for videos. To address this, we propose a novel optimization-based framework for sketching videos, in which each frame is represented by a set of Bézier curves. In detail, we first propose a cross-frame stroke initialization approach that sets the initial location and width of each curve. We then optimize the locations of these curves using a semantic loss based on CLIP features and a newly designed consistency loss built on a self-decomposed 2D atlas network. With these design elements, the resulting sketch video exhibits strong visual abstraction and temporal coherence. Furthermore, because the sketching process transforms a video into SVG lines, our method enables sketch-based video editing and video doodling through video composition, as shown in the teaser.
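To make the frame-wise representation concrete, here is a minimal sketch of how a sketch video could be stored and evaluated. It assumes cubic Bézier curves with four control points each; the names `Stroke`, `bezier_point`, and `SketchVideo` are illustrative and do not come from the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Stroke:
    """One stroke: four control points of a cubic Bézier curve plus a width.

    Assumption: the curves are cubic; the page only says "Bézier curves".
    """
    control_points: Tuple[Point, Point, Point, Point]
    width: float = 1.5


def bezier_point(stroke: Stroke, s: float) -> Point:
    """Evaluate the cubic Bézier curve at parameter s in [0, 1]."""
    p0, p1, p2, p3 = stroke.control_points
    # Cubic Bernstein basis weights.
    b0 = (1 - s) ** 3
    b1 = 3 * (1 - s) ** 2 * s
    b2 = 3 * (1 - s) * s ** 2
    b3 = s ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


# A sketch video is a list of frames; each frame is a list of strokes.
SketchVideo = List[List[Stroke]]

stroke = Stroke(((0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)))
video: SketchVideo = [[stroke], [stroke]]  # two frames, one stroke each
midpoint = bezier_point(stroke, 0.5)
```

In this layout the optimization variables would be the control points (and possibly the widths) of every stroke in every frame.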

Pipeline

Left, Layered Atlas: We first train a layered atlas that decomposes the video into separate 2D images (a foreground atlas and a background atlas). The atlas maps each 3D video coordinate (x, y, t) to a 2D atlas coordinate, so that video points mapped to the same atlas coordinate share the same color.
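The idea of mapping video coordinates to a shared 2D atlas can be illustrated with a toy example. The mapping below is a hand-written analytic function for an object moving at constant velocity; it is only a hypothetical stand-in for the learned mapping network, and `toy_foreground_mapping` and `track_point` are names invented for this illustration.

```python
from typing import Tuple

Velocity = Tuple[float, float]


def track_point(x0: float, y0: float, t: int,
                velocity: Velocity = (2.0, 0.5)) -> Tuple[float, float]:
    """Image position in frame t of a surface point that starts at (x0, y0)."""
    vx, vy = velocity
    return (x0 + vx * t, y0 + vy * t)


def toy_foreground_mapping(x: float, y: float, t: int,
                           velocity: Velocity = (2.0, 0.5)) -> Tuple[float, float]:
    """Toy stand-in for the atlas mapping: (x, y, t) -> atlas coordinate (u, v).

    Assumption: the object undergoes pure translation, so undoing the motion
    recovers a frame-independent coordinate. The real mapping is learned.
    """
    vx, vy = velocity
    return (x - vx * t, y - vy * t)


# The same surface point, observed in five frames, lands on one atlas location.
atlas_coords = [
    toy_foreground_mapping(*track_point(10.0, 20.0, t), t) for t in range(5)
]
```

Because every frame's observation of the point maps to the same `(u, v)`, a color stored once in the atlas can be looked up for all frames.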

Right, Video Sketching Optimization: We then use the mapping network Mf in a novel initialization scheme that places strokes with correspondence across frames. Finally, we optimize the locations of the generated strokes with a novel consistency loss and a semantic loss, which maintain temporal coherence and semantic alignment, respectively.
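One plausible way to penalize temporal inconsistency is to map the same control point from every frame into atlas space and measure how much the mapped positions disagree. The sketch below does exactly that with a toy analytic mapping; it is not necessarily the authors' formulation of the consistency loss, and `toy_mapping` and `consistency_loss` are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def toy_mapping(x: float, y: float, t: int) -> Point:
    """Toy stand-in for the mapping network: undo a 2 px/frame shift in x."""
    return (x - 2.0 * t, y)


def consistency_loss(track: List[Point]) -> float:
    """Mean squared deviation of one control point's atlas positions.

    `track[t]` is the control point's image position in frame t.
    Assumption: consistency is measured as variance in atlas space.
    """
    uv = [toy_mapping(x, y, t) for t, (x, y) in enumerate(track)]
    mean_u = sum(u for u, _ in uv) / len(uv)
    mean_v = sum(v for _, v in uv) / len(uv)
    return sum((u - mean_u) ** 2 + (v - mean_v) ** 2 for u, v in uv) / len(uv)


# A control point that follows the object exactly, and one that jitters by 1 px.
consistent = [(5.0 + 2.0 * t, 3.0) for t in range(4)]
jittery = [(5.0 + 2.0 * t + (1.0 if t % 2 else -1.0), 3.0) for t in range(4)]

loss_consistent = consistency_loss(consistent)
loss_jittery = consistency_loss(jittery)
```

A stroke that moves with the object yields zero loss, while frame-to-frame jitter raises it, so minimizing such a term would pull corresponding strokes toward a shared atlas location.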

Video Editing based on Sketching

Snowboard

Break Dance

Dance Jump

Car Turn

Mallard

Scooter

Comparisons

Columns, left to right: Original, Ours, CLIPasso, Canny, HED

Mallard

Snowboard

Soapbox

Scooter

Dance Jump

Stunt

Given an input video containing a foreground object, we sketch it with Bézier curves so that the result can be represented as scalable vector graphics (SVG). The flexibility of SVG enables various rendering operations, including resizing, color filling, and overlaying doodles on the original background images.
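As an illustration of the SVG representation, the following sketch serializes one frame's strokes into an SVG document using the standard cubic Bézier path command `C`. The function name `strokes_to_svg` and the stroke layout (four control points plus a width) are assumptions made for this example, not the authors' export code.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = Tuple[Tuple[Point, Point, Point, Point], float]  # (control points, width)


def strokes_to_svg(strokes: List[Stroke], width: int, height: int,
                   color: str = "black") -> str:
    """Serialize one frame of cubic Bézier strokes as an SVG string."""
    paths = []
    for (p0, p1, p2, p3), stroke_width in strokes:
        # "M" moves to the start point; "C" draws a cubic Bézier segment.
        d = (f"M {p0[0]} {p0[1]} "
             f"C {p1[0]} {p1[1]}, {p2[0]} {p2[1]}, {p3[0]} {p3[1]}")
        paths.append(
            f'<path d="{d}" fill="none" stroke="{color}" '
            f'stroke-width="{stroke_width}" stroke-linecap="round"/>'
        )
    body = "\n  ".join(paths)
    # viewBox keeps the drawing resolution independent, so it can be resized.
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="0 0 {width} {height}">\n  {body}\n</svg>')


frame_strokes: List[Stroke] = [
    (((10, 10), (40, 80), (80, 80), (110, 10)), 2.0),
    (((20, 100), (50, 60), (70, 60), (100, 100)), 1.5),
]
svg_text = strokes_to_svg(frame_strokes, width=128, height=128, color="navy")
```

Changing `color` or the `viewBox` size recolors or rescales the sketch without re-running any optimization, which is the kind of flexibility the paragraph above refers to.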

BibTeX


      @article{zheng2023sketch,
        title={Sketch Video Synthesis}, 
        author={Yudian Zheng and Xiaodong Cun and Menghan Xia and Chi-Man Pun},
        year={2023},
        eprint={2311.15306},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
      }