Sketches, being simple and concise, have been used in recent deep image synthesis methods to enable intuitive generation and editing of facial images. However, extending such methods to video editing is nontrivial because of several challenges, ranging from propagating manipulations appropriately to fusing multiple editing operations, all while ensuring temporal coherence and visual quality. To address these issues, we propose a novel sketch-based facial video editing framework built on StyleGAN3, in which editing manipulations are represented in latent space and dedicated propagation and fusion modules generate high-quality video editing results. Specifically, we first design an optimization approach that represents sketch editing manipulations as editing vectors, which are then propagated to the whole video sequence using a strategy suited to the editing need. Input editing operations are classified into two categories: temporally consistent editing and temporally variant editing. The former (e.g., a change of face shape) is applied directly to the whole video sequence, while the latter (e.g., a change of facial expression or dynamics) is either propagated under expression guidance or affects only adjacent frames within a given time window. Since users often perform different editing operations on multiple frames, we further present a region-aware fusion approach to fuse the diverse editing effects. Our method supports sketch-based video editing of both facial structure and expression movement, which previous works cannot achieve. Both qualitative and quantitative evaluations show that our system has superior editing ability compared with existing and alternative solutions.
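The propagation strategy described above can be illustrated with a small NumPy sketch. It assumes per-frame latent codes stacked into one array and a single editing vector; the function names and the specific weighting schemes (a linear falloff inside the time window, cosine similarity of expression coefficients for expression guidance) are hypothetical illustrations, not the formulation used in the paper or this repository.

```python
import numpy as np


def consistent_weights(num_frames):
    """Temporally consistent editing (e.g. face shape): full strength on every frame."""
    return np.ones(num_frames)


def window_weights(num_frames, key_frame, window):
    """Time-window editing: strength 1 at the edited key frame, decaying linearly
    to 0 beyond +/- `window` frames. The linear falloff is an illustrative assumption."""
    dist = np.abs(np.arange(num_frames) - key_frame)
    return np.clip(1.0 - dist / (window + 1), 0.0, 1.0)


def expression_weights(exp_coeffs, key_frame, eps=1e-8):
    """Hypothetical expression guidance: weight each frame by the cosine similarity
    of its expression coefficients to the key frame's, clipped to [0, 1]."""
    e = np.asarray(exp_coeffs, dtype=np.float64)
    key = e[key_frame]
    sims = e @ key / (np.linalg.norm(e, axis=1) * np.linalg.norm(key) + eps)
    return np.clip(sims, 0.0, 1.0)


def propagate_edit(latents, edit_vector, weights):
    """Add `edit_vector`, scaled per frame by `weights`, to each frame's latent code.

    latents: (num_frames, ...) array of per-frame latent codes (e.g. W+ codes).
    edit_vector: array broadcastable to a single frame's latent shape.
    """
    latents = np.asarray(latents, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    # Reshape weights to (num_frames, 1, ..., 1) so they broadcast over latent dims.
    w = w.reshape((latents.shape[0],) + (1,) * (latents.ndim - 1))
    return latents + w * np.asarray(edit_vector, dtype=np.float64)


# Usage: 7 frames with 4-dim latents; one edit drawn on frame 3 with a +/-2 frame window.
latents = np.zeros((7, 4))
edit = np.ones(4)
print(window_weights(7, key_frame=3, window=2))
print(propagate_edit(latents, edit, window_weights(7, 3, 2))[:, 0])
```

Keeping the weights separate from the update makes the two categories differ only in how per-frame strengths are computed, which mirrors the split between temporally consistent and temporally variant edits.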
- System
  - Ubuntu 16.04 or later
  - NVIDIA RTX 3090 GPU + CUDA 11.1 + cuDNN 8.0.4
- Software
  - Python 3.8
  - Jittor (see the Jittor project for more details)
  - Packages (note: cupy-cuda111 is the CuPy build for CUDA 11.1):
    pip install -r requirements.txt
  - (Optional) If you get a 'cutt' error, disable cutt:
    export use_cutt=0
- Download the LPIPS models from [LPIPS-jittor] and put the weights into `./lpips/weights/`.
- Download the face parsing models from [Face-parsing-jittor] and put the weights into `./modules/face_parsing_jittor/checkpoints/`.
- Download the first-order models from [First-order-jittor] and put the weights into `./modules/first_order/weights/`.
- Download the dlib alignment models from [dlib], unzip them, and put the weights into `./weights/`.
- Download the 3D face reconstruction models from [Face-recon-jittor] and put the weights into `./modules/face_recon_jittor/checkpoints/`. The Basel Face Model 2009 (BFM09) and the Expression Basis (Exp_Pca.bin) should also be downloaded and put into `./modules/face_recon_jittor/BFM/`, organized as in [Face-recon-jittor].
- Download the StyleGAN3 models [StyleGAN3-jittor], the E4E models for StyleGAN3 [E4E-jittor], and the DeepFaceVideoEditing weights [Google Drive], and put the weights into `./weights/`.
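Because the weights are spread over several directories, a quick check before running anything can save a failed run. The helper below is a hypothetical convenience script, not part of the repository; it only tests that each directory named in the list above exists and is non-empty, not that the right files are present.

```python
from pathlib import Path

# Directories named in the download instructions, relative to the repository root.
WEIGHT_DIRS = [
    "lpips/weights",
    "modules/face_parsing_jittor/checkpoints",
    "modules/first_order/weights",
    "weights",
    "modules/face_recon_jittor/checkpoints",
    "modules/face_recon_jittor/BFM",
]


def missing_weight_dirs(repo_root="."):
    """Return the weight directories that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in WEIGHT_DIRS:
        d = root / rel
        if not d.is_dir() or not any(d.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_weight_dirs():
        print(f"missing or empty: ./{rel}/")
```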
Download the examples from [Google Drive]. The archive contains the 10 examples used in our user study, which can also be used for further comparisons. Unzip it and put the video directories into `./video_editings/`.
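The unzip step can also be done from Python with the standard `zipfile` module. This is a sketch under assumptions: the archive name (`examples.zip` in the usage line) is hypothetical, and it assumes the example directories sit at the top level of the archive; if they are wrapped in an extra folder, move them up one level afterwards.

```python
import zipfile
from pathlib import Path


def extract_examples(zip_path, dest="./video_editings"):
    """Extract the example archive into `dest` and return its top-level entries."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # First path component of every member, e.g. "example2" for "example2/XXX.mp4".
        top = {name.split("/", 1)[0] for name in zf.namelist() if name.strip("/")}
    return sorted(top)


# Usage (hypothetical archive name):
# print(extract_examples("examples.zip"))
```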
For each video example, the original video and editing operations are organized in the following structure:
video_editings
│
└─── example2
     └─── XXX.mp4
     └─── edit
          └─── baseShape
          |    └─── edit1
          |    │    └─── img.jpg
          |    │    └─── sketch_edit.jpg
          |    │    └─── mask_edit.jpg
          |    └─── edit2
          |    └─── ...
          └─── window
          └─── exp
|
└─── ...
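A small validator can catch layout mistakes before the long-running steps. This is a hypothetical helper, not part of the repository. The tree above only details `baseShape`; the sketch assumes that operation directories under `window` and `exp` also contain `img.jpg`, `sketch_edit.jpg`, and `mask_edit.jpg`, since each operation is processed by the same sketch optimization script.

```python
from pathlib import Path

REQUIRED_FILES = ("img.jpg", "sketch_edit.jpg", "mask_edit.jpg")
CATEGORIES = ("baseShape", "window", "exp")


def scan_edits(example_dir):
    """Map each editing category to its operation directories and collect
    (directory, missing_files) pairs for operations lacking a required input."""
    edit_root = Path(example_dir) / "edit"
    ops, problems = {}, []
    for cat in CATEGORIES:
        cat_dir = edit_root / cat
        dirs = sorted(p for p in cat_dir.iterdir() if p.is_dir()) if cat_dir.is_dir() else []
        ops[cat] = dirs
        for d in dirs:
            missing = [f for f in REQUIRED_FILES if not (d / f).is_file()]
            if missing:
                problems.append((d, missing))
    return ops, problems


if __name__ == "__main__":
    ops, problems = scan_edits("./video_editings/example2")
    for cat, dirs in ops.items():
        print(cat, [d.name for d in dirs])
    for d, missing in problems:
        print(f"{d}: missing {missing}")
```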
To edit example2 with its 3 editing operations, run the following script; the result is saved as merge.mp4:
./video_editings/example2/running_script/edit_video.sh
Other examples can be edited in the same way by replacing 'example2' with the corresponding example directory.
The following describes the details of video editing.
Modify `./configs/paths_config.py`: set `input_video_path` to the video example directory and `video_name` to the name of the input video. (The default settings are for example2.)
Extract and align all frames from the input video:
python video_align.py
The aligned frames are saved in the `align_frames` directory of each example.
To reconstruct the input video, fine-tune the StyleGAN3 generator with the PTI (Pivotal Tuning Inversion) method:
python run_pti_stylegan3.py
The PTI weights are saved in the example directory, and the PTI results for the first frame are saved in the `pti_results` directory.
Generate the sketch optimization result for a single frame:
python run_sketch.py --inversion_edit_path XXX
The `--inversion_edit_path` option specifies the sketch editing directory, which contains the image, sketch, and mask: the frame to edit is named `img.jpg`, the drawn sketch `sketch_edit.jpg`, and the drawn mask `mask_edit.jpg`.
Note:
- The sketch weights and RGB weights can be tuned to obtain the best results.
- The script must be run once per editing operation, each time with a different `--inversion_edit_path`. For example, if 3 editing operations are applied to a video, the script should be run 3 times.
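Running the script once per operation can be automated with `subprocess`. This is a hypothetical wrapper, not part of the repository. It assumes it is launched from the repository root, that every operation directory sits at `edit/<category>/<operation>/`, and it appends a trailing `/` to each path in case `run_sketch.py` joins file names by string concatenation. The only flag used is the documented `--inversion_edit_path`.

```python
import subprocess
import sys
from pathlib import Path

REQUIRED = ("img.jpg", "sketch_edit.jpg", "mask_edit.jpg")


def find_operation_dirs(edit_root):
    """Return edit/<category>/<operation>/ directories that contain all three inputs."""
    root = Path(edit_root)
    return sorted(
        p for p in root.glob("*/*")
        if p.is_dir() and all((p / name).is_file() for name in REQUIRED)
    )


def sketch_commands(op_dirs):
    """Build one run_sketch.py invocation per editing operation."""
    return [
        [sys.executable, "run_sketch.py", "--inversion_edit_path", p.as_posix() + "/"]
        for p in map(Path, op_dirs)
    ]


def run_all(edit_root, dry_run=True):
    for cmd in sketch_commands(find_operation_dirs(edit_root)):
        print(" ".join(cmd))
        if not dry_run:
            # Stop at the first operation whose optimization fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("./video_editings/example2/edit", dry_run=True)
```

The dry-run default only prints the commands, so the list can be reviewed before committing GPU time.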
Before propagating the editing effects, generate the editing vectors with the script above.
Then modify `./configs/paths_config.py` so that it refers to the 3 directories in `./video_editings/exampleXXX/edit/`:
- BaseShape editing: set the operation directories in `shapePath_list`.
- Time window editing: set the operation directories in `windowPath_list`.
- Expression guidance editing: set the operation directories in `expPath_list`.
The time window parameters are set in `./configs/hyperparameters.py`.
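Instead of typing the three lists by hand, they can be generated from the directory layout and pasted into `./configs/paths_config.py`. This is a hypothetical helper: the mapping from `baseShape`/`window`/`exp` to `shapePath_list`/`windowPath_list`/`expPath_list` follows the list above, but the path format the config expects (relative or absolute, with or without a trailing slash) is an assumption, so compare the output with the default entries in the file.

```python
from pathlib import Path

# Assumed correspondence between edit/ subdirectories and the lists in paths_config.py.
LIST_NAMES = {
    "baseShape": "shapePath_list",
    "window": "windowPath_list",
    "exp": "expPath_list",
}


def config_lists(example_dir):
    """Collect operation directories per category, sorted, each with a trailing '/'."""
    edit_root = Path(example_dir) / "edit"
    lists = {}
    for sub, name in LIST_NAMES.items():
        cat_dir = edit_root / sub
        ops = sorted(p for p in cat_dir.iterdir() if p.is_dir()) if cat_dir.is_dir() else []
        lists[name] = [p.as_posix() + "/" for p in ops]
    return lists


def render(lists):
    """Format the lists as Python assignments to paste into paths_config.py."""
    return "\n".join(f"{name} = {paths!r}" for name, paths in lists.items())


if __name__ == "__main__":
    print(render(config_lists("./video_editings/example2")))
```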
Then, generate the propagation results:
python run_editing.py
The edited frames are saved in the `edit/edit_video` directory.
Merge the edited face regions and realign the generated frames into the original frames:
python video_merge.py
The merged frames are saved in the `merge_images` directory, and the final edited video is saved as `merged.mp4`.
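The idea behind pasting an edited face region back into a frame can be shown with mask-based alpha blending. This is a simplified stand-in for what `video_merge.py` does, not its actual code: it assumes the edited frame has already been warped back into the original frame's coordinates and that a binary face mask is available (for example from face parsing). Softening the mask boundary with a box blur is our own illustrative choice.

```python
import numpy as np


def box_blur(mask, radius):
    """Separable box blur via prefix sums; borders are padded by edge replication."""
    mask = np.asarray(mask, dtype=np.float64)
    if radius <= 0:
        return mask
    k = 2 * radius + 1
    padded = np.pad(mask, radius, mode="edge")
    # Horizontal pass: mean over a k-wide window using a prefix sum.
    c = np.cumsum(padded, axis=1)
    c = np.concatenate([np.zeros((c.shape[0], 1)), c], axis=1)
    h = (c[:, k:] - c[:, :-k]) / k
    # Vertical pass on the horizontally blurred result.
    c = np.cumsum(h, axis=0)
    c = np.concatenate([np.zeros((1, c.shape[1])), c], axis=0)
    return (c[k:, :] - c[:-k, :]) / k


def blend_face(original, edited, face_mask, feather=0):
    """Composite the edited face region over the original frame.

    original, edited: HxWx3 arrays in the same (already realigned) coordinates.
    face_mask: HxW array, 1 inside the face region and 0 outside.
    feather: blur radius in pixels used to soften the mask boundary.
    """
    alpha = np.clip(box_blur(face_mask, feather), 0.0, 1.0)[..., None]
    original = np.asarray(original, dtype=np.float64)
    edited = np.asarray(edited, dtype=np.float64)
    return alpha * edited + (1.0 - alpha) * original
```

A feathered mask avoids a visible seam where the generated face meets the untouched background.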
If you find this code useful, please cite our work:
@article {DeepFaceVideoEditing2022,
author = {Liu, Feng-Lin and Chen, Shu-Yu and Lai, Yu-Kun and Li, Chunpeng and Jiang, Yue-Ren and Fu, Hongbo and Gao, Lin},
title = {{DeepFaceVideoEditing}: Sketch-based Deep Editing of Face Videos},
journal = {ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH 2022)},
year = {2022},
volume = 41,
pages = {167:1--167:16},
number = 4
}