Replication of the upscalers #152

Open
rom1504 opened this issue Jun 19, 2022 · 5 comments

rom1504 commented Jun 19, 2022

Hey, so we got decent versions of the prior and the basic decoder now.

I think the current code is already able to train upscalers, but we need more documentation for it.

Let's have an upscaler.md explaining:

  • What it is
  • How to prepare the dataset
  • Which hyperparameters to use
  • The command to run the training
  • The expected cost in GPU hours

And then train it!

We can also discuss what the right dataset is, but I figure the laion5B subset we call "laion high resolution" could do the trick (it's 170M images at 1024x1024 or bigger).

I understand only the image (and the CLIP image embedding) is needed, and no text?


nousr commented Jun 19, 2022

Here are some relevant sections of the paper for reference while in this thread:


[screenshots of the relevant upscaler sections of the paper]


lucidrains commented Jun 20, 2022

they are also using the BSR degradation used by Rombach et al (https://github.com/CompVis/latent-diffusion/tree/e66308c7f2e64cb581c6d27ab6fbeb846828253b/ldm/modules/image_degradation, https://github.com/cszn/BSRGAN/blob/main/utils/utils_blindsr.py), which I don't have in the repository yet

tempted to just go with Imagen's noising procedure (on top of the blur) and call it a day (it would be a lot simpler)
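That simpler route could look something like the following; a minimal NumPy sketch of blur-then-noise conditioning augmentation (my own illustration of the Imagen-style procedure, not code from this repository — the function name and default values are made up):

```python
import numpy as np

def degrade_conditioning_image(img, blur_sigma=0.6, noise_level=0.1, rng=None):
    """Blur the low-res conditioning image, then add Gaussian noise
    (roughly Imagen-style noise conditioning augmentation)."""
    rng = np.random.default_rng() if rng is None else rng
    # small separable Gaussian kernel (radius 2 -> 5 taps)
    x = np.arange(-2, 3)
    k = np.exp(-0.5 * (x / blur_sigma) ** 2)
    k /= k.sum()
    blurred = img.astype(float)
    for axis in (0, 1):  # blur rows, then columns
        blurred = np.apply_along_axis(
            lambda row: np.convolve(row, k, mode="same"), axis, blurred)
    noisy = blurred + noise_level * rng.standard_normal(img.shape)
    return np.clip(noisy, 0.0, 1.0)
```

At training time the noise level would be sampled per example (and optionally fed to the upscaler unet as extra conditioning), so a fixed level can be picked at sampling time.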

@lucidrains

ok, 0.11.0 should allow for the different noise schedules across different unets, as in the paper

after adding the BSR image degradation (or some alternative), i think i'm comfortable giving the repository a 1.0
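For reference, per-unet noise schedules are just different beta curves; a self-contained sketch in plain NumPy (not this repository's API — the names here are illustrative), with the cosine schedule following the improved-DDPM formulation:

```python
import numpy as np

def linear_betas(timesteps, beta_start=1e-4, beta_end=2e-2):
    # standard DDPM linear schedule
    return np.linspace(beta_start, beta_end, timesteps)

def cosine_betas(timesteps, s=0.008):
    # cosine schedule from "Improved Denoising Diffusion Probabilistic Models"
    t = np.arange(timesteps + 1) / timesteps
    alphas_bar = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    alphas_bar = alphas_bar / alphas_bar[0]
    betas = 1.0 - alphas_bar[1:] / alphas_bar[:-1]
    return np.clip(betas, 0.0, 0.999)

# e.g. cosine for the base unet, linear for the upscaler unet
schedules = [cosine_betas(1000), linear_betas(1000)]
```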

@lucidrains

I understand only the image (and clip image EMB) is needed and no text ?

@rom1504 yup, no text conditioning needed, i think it should all be in the image embedding!

@YUHANG-Ma

Hi all,
I am aiming to train the decoder and the upsampler. Because they have too many parameters, I have decided to train them separately. The README says the upsampler and the decoder net can be trained separately. From my reading of the code, although I can train them separately, I still need to load the parameters of both unet 0 and unet 1 and set the unet number to 1 in order to train only unet 1. I don't know if I am right. If so, I couldn't train unet 0 and unet 1 on two separate machines. I am wondering how I could train the decoder net and the upsamplers separately?
Best,
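For what it's worth, the dispatch pattern being asked about can be sketched with plain-Python stand-ins (these are not the real DALLE2-pytorch classes): a forward pass with a unet-number argument only exercises the selected unet, so in principle each machine only needs that unet's weights.

```python
class Unet:
    """Stand-in for one unet; only holds a name and a toy loss."""
    def __init__(self, name):
        self.name = name
    def loss(self, batch):
        return sum(batch) / len(batch)  # placeholder, not a real loss

class Decoder:
    """Stand-in decoder holding several unets; forward dispatches to one."""
    def __init__(self, unets):
        self.unets = unets
    def forward(self, batch, unet_number):
        # 1-indexed selection: only this unet runs and would get gradients
        return self.unets[unet_number - 1].loss(batch)

decoder = Decoder([Unet("base_64px"), Unet("upscaler_256px")])
loss_first = decoder.forward([0.1, 0.3], unet_number=1)   # machine A
loss_second = decoder.forward([0.2, 0.4], unet_number=2)  # machine B
```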
