Commit cb7a9a8
Merge branch 'dev' of https://github.com/kohya-ss/sd-scripts into sd-scripts-dev
bmaltais committed Dec 3, 2023
2 parents e13993f + 46cf41c commit cb7a9a8
Showing 11 changed files with 292 additions and 124 deletions.
24 changes: 24 additions & 0 deletions README.md
@@ -633,6 +633,30 @@ ControlNet-LLLite, a novel method for ControlNet with SDXL, is added. See [docum

## Change History

### Dec 3, 2023 / 2023/12/3

- `finetune\tag_images_by_wd14_tagger.py` now supports separators other than `,` via the `--caption_separator` option. Thanks to KohakuBlueleaf! PR [#913](https://github.com/kohya-ss/sd-scripts/pull/913)
- Min SNR Gamma with V-prediction (SD 2.1) is fixed. Thanks to feffy380! PR [#934](https://github.com/kohya-ss/sd-scripts/pull/934)
  - See [#673](https://github.com/kohya-ss/sd-scripts/issues/673) for details.
- `--min_diff` and `--clamp_quantile` options are added to `networks/extract_lora_from_models.py`. Thanks to wkpark! PR [#936](https://github.com/kohya-ss/sd-scripts/pull/936)
  - The default values are the same as in the previous version.
- Deep Shrink hires fix is supported in `sdxl_gen_img.py` and `gen_img_diffusers.py`.
  - `--ds_timesteps_1` and `--ds_timesteps_2` specify the timesteps of Deep Shrink for the first and second stages.
  - `--ds_depth_1` and `--ds_depth_2` specify the depth (block index) of Deep Shrink for the first and second stages.
  - `--ds_ratio` specifies the ratio of Deep Shrink; `0.5` means half of the original latent size.
  - The `--dst1`, `--dst2`, `--dsd1`, `--dsd2`, and `--dsr` prompt options are also available; a sketch follows this list.
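
For illustration, a minimal sketch of a prompt line using these options with `gen_img_diffusers.py` (the prompt text and all values are hypothetical; `--w`/`--h` are the scripts' usual per-prompt size options):

```
masterpiece, best quality, 1girl, standing --w 1024 --h 1024 --dsd1 3 --dst1 800 --dsr 0.5
```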

### Nov 5, 2023 / 2023/11/5

- `sdxl_train.py` now supports different learning rates for each Text Encoder.
4 changes: 4 additions & 0 deletions docs/train_README-ja.md
@@ -374,6 +374,10 @@ When there is one class and multiple targets, the regularization image folder…

Specifies the number of steps or epochs between sample outputs; samples are generated at every such interval. If both are specified, the epoch count takes precedence.

- `--sample_at_first`

Generates sample output before training begins, so you can compare against the pre-training state.

- `--sample_prompts`

Specifies a file containing prompts for sample output.
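
For illustration, a sketch of such a prompt file (one prompt per line; per-prompt options such as `--n` for the negative prompt, `--w`/`--h` for size, `--d` for seed, `--l` for CFG scale, and `--s` for steps follow the documented sample-output syntax; the values here are illustrative):

```
# lines starting with # are comments
masterpiece, best quality, 1girl, upper body, looking at viewer --n lowres, bad anatomy --w 768 --h 768 --d 1 --l 7.5 --s 28
```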
3 changes: 3 additions & 0 deletions fine_tune.py
@@ -303,6 +303,9 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
accelerator.print(f"\nepoch {epoch+1}/{num_train_epochs}")
current_epoch.value = epoch + 1

# For --sample_at_first
train_util.sample_images(accelerator, args, epoch, global_step, accelerator.device, vae, tokenizer, text_encoder, unet)

for m in training_models:
    m.train()

47 changes: 43 additions & 4 deletions library/original_unet.py
@@ -586,6 +586,9 @@ def __init__(
self.use_memory_efficient_attention_mem_eff = False
self.use_sdpa = False

# Attention processor
self.processor = None

def set_use_memory_efficient_attention(self, xformers, mem_eff):
    self.use_memory_efficient_attention_xformers = xformers
    self.use_memory_efficient_attention_mem_eff = mem_eff
@@ -607,7 +610,28 @@ def reshape_batch_dim_to_heads(self, tensor):
    tensor = tensor.permute(0, 2, 1, 3).reshape(batch_size // head_size, seq_len, dim * head_size)
    return tensor

def forward(self, hidden_states, context=None, mask=None):
def set_processor(self, processor):
    self.processor = processor

def get_processor(self):
    return self.processor

def forward(self, hidden_states, context=None, mask=None, **kwargs):
    if self.processor is not None:
        (
            hidden_states,
            encoder_hidden_states,
            attention_mask,
        ) = translate_attention_names_from_diffusers(
            hidden_states=hidden_states, context=context, mask=mask, **kwargs
        )
        # drop the HF-named duplicates so **kwargs does not collide below
        kwargs.pop("encoder_hidden_states", None)
        kwargs.pop("attention_mask", None)
        return self.processor(
            attn=self,
            hidden_states=hidden_states,
            encoder_hidden_states=encoder_hidden_states,  # the translated value, not the raw context
            attention_mask=attention_mask,
            **kwargs,
        )
    if self.use_memory_efficient_attention_xformers:
        return self.forward_memory_efficient_xformers(hidden_states, context, mask)
    if self.use_memory_efficient_attention_mem_eff:
@@ -720,6 +744,21 @@ def forward_sdpa(self, x, context=None, mask=None):
    out = self.to_out[0](out)
    return out

def translate_attention_names_from_diffusers(
    hidden_states: torch.FloatTensor,
    context: Optional[torch.FloatTensor] = None,
    mask: Optional[torch.FloatTensor] = None,
    # HF naming
    encoder_hidden_states: Optional[torch.FloatTensor] = None,
    attention_mask: Optional[torch.FloatTensor] = None,
):
    # map Hugging Face diffusers argument names onto this module's local names
    context = context if context is not None else encoder_hidden_states
    mask = mask if mask is not None else attention_mask

    return hidden_states, context, mask
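
For illustration, a minimal sketch of an attention processor that plugs into the dispatch above (the class is hypothetical, not part of this commit; it assumes the `to_q`/`to_k`/`to_v` projections, `to_out[0]`, and the reshape helpers defined on `CrossAttention`, plus PyTorch 2.x for `scaled_dot_product_attention`):

```python
import torch.nn.functional as F

class SdpaAttnProcessor:
    """Hypothetical processor: plain attention via torch SDPA."""

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None, **kwargs):
        # self-attention when no cross-attention context is given
        context = encoder_hidden_states if encoder_hidden_states is not None else hidden_states

        q = attn.to_q(hidden_states)
        k = attn.to_k(context)
        v = attn.to_v(context)

        # (batch, seq, heads*dim) -> (batch*heads, seq, dim)
        q = attn.reshape_heads_to_batch_dim(q)
        k = attn.reshape_heads_to_batch_dim(k)
        v = attn.reshape_heads_to_batch_dim(v)

        out = F.scaled_dot_product_attention(q, k, v, attn_mask=attention_mask)

        # (batch*heads, seq, dim) -> (batch, seq, heads*dim)
        out = attn.reshape_batch_dim_to_heads(out)
        return attn.to_out[0](out)

# usage with the setter above:
#   module.set_processor(SdpaAttnProcessor())
```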

# feedforward
class GEGLU(nn.Module):
@@ -1350,7 +1389,7 @@ def __init__(
self.out_channels = OUT_CHANNELS

self.sample_size = sample_size
self.prepare_config()
self.prepare_config(sample_size=sample_size)

# the way modules are held cannot be changed, because the state_dict format would change

@@ -1437,8 +1476,8 @@ def __init__(
self.conv_out = nn.Conv2d(BLOCK_OUT_CHANNELS[0], OUT_CHANNELS, kernel_size=3, padding=1)

# region diffusers compatibility
def prepare_config(self):
    self.config = SimpleNamespace()
def prepare_config(self, *args, **kwargs):
    self.config = SimpleNamespace(**kwargs)
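
For illustration, a standalone sketch of what this change enables (the shim class is hypothetical): keyword arguments passed to `prepare_config` now surface on the diffusers-style `config` namespace.

```python
from types import SimpleNamespace

class UNetShim:
    def prepare_config(self, *args, **kwargs):
        # expose constructor arguments the way diffusers models expose .config
        self.config = SimpleNamespace(**kwargs)

unet = UNetShim()
unet.prepare_config(sample_size=64)
print(unet.config.sample_size)  # 64, as read by diffusers-style callers
```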

@property
def dtype(self) -> torch.dtype: