From left to right: baseline, comparison method 1, this project's method, comparison method 2
SEGAN is a project that aims to control semantic attributes of images generated by StyleGAN2 by modifying its latent space. In the StyleGAN2 paper, the authors discussed the impact of the latent space on generated human face images: lower layers affect more general semantic attributes, such as gender and skin color, while higher layers affect details, such as smiles and hairstyles. It is worth noting that the latent space here is not the same as the space of the initial noise sampled in the original GAN; we will discuss this more in the next section. They used this discovery for "StyleMix", which mixes styles from different seed images to control the semantic attributes of newly generated images. But this control is not precise: it is difficult to decouple semantic attributes and control them independently this way. Therefore, SEGAN introduces a linear subspace to locate interpretable and controllable dimensions in the latent vectors.
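As a rough illustration of what style mixing does (a sketch only, with hypothetical names and layer counts, not this project's code):

```python
import torch

def style_mix(w_a, w_b, n_layers=14, crossover=6):
    """Sketch of style mixing: lower layers take the style from seed A
    (coarse attributes such as gender), higher layers take the style
    from seed B (fine details such as hairstyle)."""
    per_layer = [w_a if i < crossover else w_b for i in range(n_layers)]
    return torch.stack(per_layer)  # (n_layers, latent_dim), one style per layer
```

Because a whole style vector is swapped per layer, many attributes change together, which is why this kind of mixing cannot decouple individual attributes.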
StyleGAN has an intermediate latent space, which is separate from the initial noise space.
The idea of using a linear subspace comes from EigenGAN, which showed that a linear subspace can find interpretable and controllable dimensions in different generator layers. It succeeded on the original GAN, so the question here is how to apply this method to StyleGAN2, which has a quite different structure. The network architecture will be discussed in a later section.
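For reference, the per-layer subspace in EigenGAN can be sketched roughly like this (a simplified sketch, not the original implementation): each layer learns a basis U, per-direction scales l, and an offset mu, and the sample U·diag(l)·z + mu is added to that layer's feature map.

```python
import torch
import torch.nn as nn

class LinearSubspace(nn.Module):
    """Rough sketch of an EigenGAN-style subspace for one generator layer:
    phi = U @ diag(l) @ z + mu, with U encouraged to stay orthonormal."""

    def __init__(self, subspace_dim, feature_dim):
        super().__init__()
        self.U = nn.Parameter(torch.randn(feature_dim, subspace_dim) * 0.02)  # basis
        self.l = nn.Parameter(torch.ones(subspace_dim))    # importance of each direction
        self.mu = nn.Parameter(torch.zeros(feature_dim))   # subspace origin

    def forward(self, z):                    # z: (batch, subspace_dim)
        return (z * self.l) @ self.U.t() + self.mu

    def orthogonal_penalty(self):
        """Regularizer pushing U^T U towards the identity."""
        eye = torch.eye(self.U.shape[1], device=self.U.device)
        return ((self.U.t() @ self.U - eye) ** 2).sum()
```

The `orthogonal_penalty` term corresponds to the regularization that the notes at the end of this page list as still missing from this project's linear subspace.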
Refer to this note.
- resolution: 64
- learning rate: 0.002
- batch size: 16
- dimension of latent vector: 64
- r1 weight: 10
- regularization interval of discriminator: 16
- regularization interval of generator: 4
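For convenience, the same settings collected in one place (a sketch only; the real flag or key names in the training script may differ):

```python
# Hypothetical training configuration mirroring the list above;
# the actual train.py argument names may be different.
config = dict(
    size=64,          # resolution
    lr=0.002,         # learning rate
    batch=16,         # batch size
    latent=64,        # dimension of the latent vector
    r1=10,            # R1 weight
    d_reg_every=16,   # regularization interval of the discriminator
    g_reg_every=4,    # regularization interval of the generator
)
```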
Directly adding a linear subspace in each block and adding its output to the feature map, as in the original GAN, does not fully apply to the case of StyleGAN2. This would make the network too complex, and more importantly we want to modify semantic attributes by modifying the latent vectors rather than the feature maps directly. Use `--mode=2` to switch to this model.
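A minimal sketch of what this could look like, under the assumption that the subspace perturbs each layer's style (latent) vector before StyleGAN2's modulated convolution rather than the feature map; the names here are illustrative, not the project's actual code:

```python
def apply_subspace_to_style(w, z, U):
    """Illustrative sketch: perturb a per-layer style vector w with a
    linear-subspace sample z @ U^T, so semantic edits act on the latent
    (style) side instead of directly on the feature maps.
    w: (batch, style_dim), z: (batch, subspace_dim), U: (style_dim, subspace_dim)."""
    return w + z @ U.t()
```

The perturbed style would then go through StyleGAN2's usual modulated convolution for that layer.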
The dataset can be found here; it's a very small subset of the Danbooru dataset.
The baseline is the original StyleGAN2; use `--mode=0` to switch to it. There are two more modes for comparison, `--mode=1` and `--mode=3`; their model structures are shown in the following figures.
Some samples generated by modes 0 to 3, from left to right.
The log for the red curve (mode=2) has some problems; that model needs to be retrained. TODO
Model | FID |
---|---|
origin (mode 0) | 133.75 |
comparison 1 (mode 1) | 139.33 |
this project's (mode 2) | 123.79 |
comparison 2 (mode 3) | 163.29 |
Model | PPL |
---|---|
origin (mode 0) | 821.70 |
comparison 1 (mode 1) | 816.25 |
this project's (mode 2) | 818.62 |
comparison 2 (mode 3) | 824.03 |
Some samples generated by modes 0 to 3, from left to right, where the middle images are generated by interpolating the top and bottom latent vectors. It can be seen that, except for the second-to-last image in the last column, they basically follow certain linear rules. ![](/pictures/figure5.png)
The figure above shows controlling semantic attributes of generated results by modifying their latent vectors in the specific layers and dimensions found by the linear subspace, where L is the layer, N is the number of the linear subspace, and D is the dimension.
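As a hedged sketch of what such an edit amounts to (the helper and data layout below are hypothetical): take dimension D of subspace N at layer L and shift that layer's latent vector along it.

```python
def edit_latent(w_per_layer, subspaces, L, N, D, alpha):
    """Hypothetical helper: shift layer L's latent vector along dimension D
    of its N-th linear subspace by strength alpha."""
    direction = subspaces[L][N][:, D]              # (latent_dim,) basis column
    w_per_layer[L] = w_per_layer[L] + alpha * direction
    return w_per_layer
```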
It can be seen that `--mode=2` has the most balanced performance, so this project uses this structure as the default.
- There is no regularization on the linear subspace, so the implementation of the linear subspace in this project actually differs from EigenGAN's; this has to be fixed in the future. TODO
- It would be better to train on a bigger dataset.