
[Feature Request] UltraHDFix Pipeline (method from imaginAIry) #12

Open
iwr-redmond opened this issue Dec 5, 2024 · 1 comment

iwr-redmond commented Dec 5, 2024

Version 14 of Bryce Drennan's imaginAIry package introduced a very helpful way to abstract away the details of going from SD15-native resolution to full high-resolution images using ControlNet:

  1. Define resolutions for easy retrieval by the user (utils/named_resolutions.py; a quick sketch of this idea follows the list)
  2. Generate a low-resolution "composition" image (api/generate_refiners.py#L218)
  3. Use the Tile ControlNet to turn this low-resolution starting point into the final image (api/generate_refiners.py#L260). (Note that in imaginAIry, 'details' is an alias for control_v11f1e_sd15_tile; see config.py#L345.)
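
For reference, the named-resolution idea in step 1 is essentially a lookup table mapping friendly names to pixel dimensions. A minimal illustrative sketch (the table and helper below are my own examples, not imaginAIry's actual contents):

# Illustrative named-resolution table, in the spirit of imaginAIry's
# utils/named_resolutions.py (example values only, not its real contents).
NAMED_RESOLUTIONS = {
    "HD": (1280, 720),
    "FHD": (1920, 1080),
    "QHD": (2560, 1440),
    "4K": (3840, 2160),
}

def resolve_size(size):
    # accept either a named resolution ("FHD") or an explicit (width, height) pair
    if isinstance(size, str):
        return NAMED_RESOLUTIONS[size.upper()]
    return tuple(size)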

Looking at constants.py in the current dev branch, you have already set up Tile-capable ControlNets for all three supported architectures. This would allow stablepy to replicate this user-friendly upscaling method easily.

I propose a new pipeline, ultrahdfix (or similar), that would work as follows:

  1. Calculate the correct minimum resolution for the relevant model (512px for SD15, 1024px for SDXL and FLUX) and generate a correctly scaled composition image per the user's request, e.g. FHD [1920x1080] = 512x288 (see the arithmetic sketch below).
  2. Do the initial generation, optionally with detailfix (TileCN does well with details when upscaling, at least in SD15)
  3. Use ESRGAN to do the initial upscale for the original composition image, allowing the ControlNet model to focus entirely on detailing. (This may need to be done multiple times if the scale factor is greater than 4x; in the example it is 3.75).
  4. Send the upscaled PIL composition image, prompts (SD) / prompt (FLUX), Loras (SD/FLUX) and embeddings (SD) to the Tile ControlNet for the relevant model architecture.
  5. Crop the final PIL image to exactly match the correct resolution if required (ESRGAN can sometimes be a few pixels out).
  6. Provide the final PIL image to the user, with an option to retrieve the composition image for comparison.

As in the source program, the proposed helper pipeline would chain these preexisting steps together to make model-agnostic ultra high-resolution image generation as simple as possible for the user.
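
To make the sizing in steps 1 and 3 concrete, the composition size and the number of ESRGAN passes fall out of a little arithmetic. A rough sketch, using a helper name of my own invention rather than an existing stablepy API:

import math

# Hypothetical helper (not an existing stablepy API): works out the composition
# size and how many ESRGAN passes a given target resolution needs.
def plan_ultrahd(target_width, target_height, native=512, esrgan_scale=4):
    # scale so that the composition's larger edge equals the model's native edge
    scale = max(target_width / native, target_height / native)
    comp_width = round(target_width / scale)
    comp_height = round(target_height / scale)
    # most ESRGAN models upscale 4x, so larger factors need repeated passes
    passes = 0 if scale <= 1 else math.ceil(math.log(scale, esrgan_scale))
    return (comp_width, comp_height), scale, passes

# FHD target with an SD15 checkpoint: ((512, 288), 3.75, 1)
print(plan_ultrahd(1920, 1080, native=512))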

iwr-redmond commented Dec 5, 2024

I wrote this demo code a while ago as I was experimenting. It's good enough for government work.

from datetime import datetime

from stablepy import Model_Diffusers

# assumes the gen_* settings (checkpoint, VAE, prompt, target size, sampler,
# steps, guidance, native model resolution, upscaler path) and
# adetailer_params_A are already defined by the caller

# work out how far the target size exceeds the model's native resolution
comp_scale_width = gen_width / gen_native
comp_scale_height = gen_height / gen_native

# use the larger ratio so the composition's long edge matches the native size
comp_scale = max(comp_scale_width, comp_scale_height)

# calculate the initial composition image size
comp_width = round(gen_width / comp_scale)
comp_height = round(gen_height / comp_scale)
print(f'UltraHD: Initial image size is {comp_width}x{comp_height}')

# cap the ESRGAN factor at 4x, the typical maximum for a single pass
comp_upscale = min(comp_scale, 4)

print(f'UltraHD: Initial upscale factor is {comp_upscale}')

# step 1: generate the low-resolution composition image, then ESRGAN-upscale it
model = Model_Diffusers(
    base_model_id=gen_checkpoint,
    task_name='txt2img',
    retain_task_model_in_cache=False,
    vae_model=gen_vae,
)

comp_image, path_image = model(
    prompt=gen_prompt,
    negative_prompt=gen_negative,
    img_width=comp_width,
    img_height=comp_height,
    num_steps=gen_steps,
    guidance_scale=gen_guidance,
    sampler=gen_scheduler,
    FreeU=True,
    clip_skip=True,
    adetailer_A=True,
    adetailer_A_params=adetailer_params_A,
    xformers_memory_efficient_attention=True,
    hires_steps=0,
    save_generated_images=False,
    upscaler_model_path=gen_upscaler,
    upscaler_increases_size=comp_upscale,
)

# now reload with controlnet tile
model = Model_Diffusers(
    base_model_id=gen_checkpoint,
    task_name='tile',
    retain_task_model_in_cache=False,
    vae_model=gen_vae,
)

# take the last (ESRGAN-upscaled) image from step 1 as the input for step 2
upscaled_image = comp_image[-1].convert("RGB")
upscaled_width, upscaled_height = upscaled_image.size

# check whether we need another upscale
if upscaled_width < gen_width:
    final_width = gen_width
else:
    # todo: upscale again with esrgan
    # for now just let TileCN handle this
    final_width = upscaled_width

# step 2: re-detail the upscaled composition with the Tile ControlNet
detail_image, path_image = model(
    image=upscaled_image,
    t2i_adapter_preprocessor=True,
    prompt=gen_prompt,
    negative_prompt=gen_negative,
    preprocess_resolution=upscaled_width,
    image_resolution=final_width,
    num_steps=gen_steps,
    guidance_scale=gen_guidance,
    sampler=gen_scheduler,
    FreeU=True,
    clip_skip=True,
    adetailer_A=False,
    xformers_memory_efficient_attention=True,
    hires_steps=0,
    save_generated_images=False,
)

# take the detailed image from step 2 for the final crop in step 3
detailed_image = detail_image[-1].convert("RGB")
detailed_width, detailed_height = detailed_image.size

# step 3: finalise the upscaled resolution
# get the difference between the detailed and final sizes
crop_width = detailed_width - gen_width
crop_height = detailed_height - gen_height

# calculate the horizontal crop (left/right), centring the target width
if crop_width > 0:
    crop_left = crop_width // 2
    crop_right = crop_left + gen_width
else:
    crop_left = 0
    crop_right = detailed_width

# calculate the vertical crop (top/bottom), centring the target height
if crop_height > 0:
    crop_top = crop_height // 2
    crop_bottom = crop_top + gen_height
else:
    crop_top = 0
    crop_bottom = detailed_height

# now apply the crop
if crop_width > 0 or crop_height > 0:
    print(f'UltraHD: normalising image by {crop_width}px W and {crop_height}px H')
    final_image = detailed_image.crop((crop_left, crop_top, crop_right, crop_bottom))
else:
    final_image = detailed_image

# save the final image
# get a timestamp
timestamp = datetime.now()
timestamp_name = timestamp.strftime("%Y_%m_%d_%H%M%S")

# get the sanitised prompt string
# remove non-alphanumeric characters
safeprompt_name = ""
safeprompt_separator = "_"
for char in gen_prompt:
    if char.isalpha() or char.isdigit():
        safeprompt_name += char
    elif char == " ":
        safeprompt_name += safeprompt_separator

timestamped_png = timestamp_name + '_' + safeprompt_name[:75] + '.png'

print('UltraHD: saving file', timestamped_png)
# note: by default, the save method does not include exif data
final_image.save(timestamped_png, "PNG")

@R3gm added the enhancement label Dec 6, 2024