
[Feature Request] UltraHDFix Pipeline (method from imaginAIry) #12

Open
iwr-redmond opened this issue Dec 5, 2024 · 1 comment

iwr-redmond commented Dec 5, 2024

Version 14 of Bryce Drennan's imaginAIry package introduced a very helpful way to abstract away the details of going from SD15-native resolution to full high-resolution images using ControlNet:

  1. Define resolutions for easy retrieval by the user (utils/named_resolutions.py; a quick sketch of this idea follows the list)
  2. Generate a low-resolution "composition" image (api/generate_refiners.py#L218)
  3. Use the Tile ControlNet to turn this low-resolution starting point into the final image (api/generate_refiners.py#L260). (Note that in imaginAIry, 'details' is an alias for control_v11f1e_sd15_tile; see config.py#L345.)
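
For reference, the named-resolution idea in step 1 is essentially a lookup table mapping friendly names to pixel dimensions. A minimal illustrative sketch (the table and helper below are my own examples, not imaginAIry's actual contents):

# Illustrative named-resolution table, in the spirit of imaginAIry's
# utils/named_resolutions.py (example values only, not its real contents).
NAMED_RESOLUTIONS = {
    "HD": (1280, 720),
    "FHD": (1920, 1080),
    "QHD": (2560, 1440),
    "4K": (3840, 2160),
}

def resolve_size(size):
    # accept either a named resolution ("FHD") or an explicit (width, height) pair
    if isinstance(size, str):
        return NAMED_RESOLUTIONS[size.upper()]
    return tuple(size)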

Looking at constants.py in the current dev branch, you have already set up Tile-capable ControlNets for all three supported architectures. This would allow stablepy to replicate this user-friendly upscaling method easily.

I propose a new pipeline, ultrahdfix (or similar), that would work as follows:

  1. Calculate the correct minimum resolution for the relevant model (512px for SD15, 1024px for SDXL and FLUX) and generate a correctly scaled composition image per the user's request, e.g. FHD [1920x1080] = 512x288 (see the arithmetic sketch below).
  2. Do the initial generation, optionally with detailfix (TileCN does well with details when upscaling, at least in SD15)
  3. Use ESRGAN to do the initial upscale for the original composition image, allowing the ControlNet model to focus entirely on detailing. (This may need to be done multiple times if the scale factor is greater than 4x; in the example it is 3.75).
  4. Send the upscaled PIL composition image, prompts (SD) / prompt (FLUX), Loras (SD/FLUX) and embeddings (SD) to the Tile ControlNet for the relevant model architecture.
  5. Crop the final PIL image to exactly match the correct resolution if required (ESRGAN can sometimes be a few pixels out).
  6. Provide the final PIL image to the user, with an option to retrieve the composition image for comparison.

As in the source program, the proposed helper pipeline would chain these preexisting steps together to make model-agnostic ultra high-resolution image generation as simple as possible for the user.
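
To make the sizing in steps 1 and 3 concrete, the composition size and the number of ESRGAN passes fall out of a little arithmetic. A rough sketch, using a helper name of my own invention rather than an existing stablepy API:

import math

# Hypothetical helper (not an existing stablepy API): works out the composition
# size and how many ESRGAN passes a given target resolution needs.
def plan_ultrahd(target_width, target_height, native=512, esrgan_scale=4):
    # scale so that the composition's larger edge equals the model's native edge
    scale = max(target_width / native, target_height / native)
    comp_width = round(target_width / scale)
    comp_height = round(target_height / scale)
    # most ESRGAN models upscale 4x, so larger factors need repeated passes
    passes = 0 if scale <= 1 else math.ceil(math.log(scale, esrgan_scale))
    return (comp_width, comp_height), scale, passes

# FHD target with an SD15 checkpoint: ((512, 288), 3.75, 1)
print(plan_ultrahd(1920, 1080, native=512))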

iwr-redmond commented Dec 5, 2024

I wrote this demo code a while ago as I was experimenting. It's good enough for government work.

from datetime import datetime

from stablepy import Model_Diffusers

# assumes the gen_* settings (checkpoint, VAE, prompt, target size, sampler,
# steps, guidance, native model resolution, upscaler path) and
# adetailer_params_A are already defined by the caller

# work out how far the target size exceeds the model's native resolution
comp_scale_width = gen_width / gen_native
comp_scale_height = gen_height / gen_native

# use the larger ratio so the composition's long edge matches the native size
comp_scale = max(comp_scale_width, comp_scale_height)

# calculate the initial composition image size
comp_width = round(gen_width / comp_scale)
comp_height = round(gen_height / comp_scale)
print(f'UltraHD: Initial image size is {comp_width}x{comp_height}')

# cap the ESRGAN factor at 4x, the typical maximum for a single pass
comp_upscale = min(comp_scale, 4)

print(f'UltraHD: Initial upscale factor is {comp_upscale}')

# step 1: generate the low-resolution composition image, then ESRGAN-upscale it
model = Model_Diffusers(
    base_model_id=gen_checkpoint,
    task_name='txt2img',
    retain_task_model_in_cache=False,
    vae_model=gen_vae,
)

comp_image, path_image = model(
    prompt=gen_prompt,
    negative_prompt=gen_negative,
    img_width=comp_width,
    img_height=comp_height,
    num_steps=gen_steps,
    guidance_scale=gen_guidance,
    sampler=gen_scheduler,
    FreeU=True,
    clip_skip=True,
    adetailer_A=True,
    adetailer_A_params=adetailer_params_A,
    xformers_memory_efficient_attention=True,
    hires_steps=0,
    save_generated_images=False,
    upscaler_model_path=gen_upscaler,
    upscaler_increases_size=comp_upscale,
)

# now reload with controlnet tile
model = Model_Diffusers(
    base_model_id=gen_checkpoint,
    task_name='tile',
    retain_task_model_in_cache=False,
    vae_model=gen_vae,
)

# take the last (ESRGAN-upscaled) image from step 1 as the input for step 2
upscaled_image = comp_image[-1].convert("RGB")
upscaled_width, upscaled_height = upscaled_image.size

# check whether we need another upscale
if upscaled_width < gen_width:
    final_width = gen_width
else:
    # todo: upscale again with esrgan
    # for now just let TileCN handle this
    final_width = upscaled_width

# step 2: re-detail the upscaled composition with the Tile ControlNet
detail_image, path_image = model(
    image=upscaled_image,
    t2i_adapter_preprocessor=True,
    prompt=gen_prompt,
    negative_prompt=gen_negative,
    preprocess_resolution=upscaled_width,
    image_resolution=final_width,
    num_steps=gen_steps,
    guidance_scale=gen_guidance,
    sampler=gen_scheduler,
    FreeU=True,
    clip_skip=True,
    adetailer_A=False,
    xformers_memory_efficient_attention=True,
    hires_steps=0,
    save_generated_images=False,
)

# take the detailed image from step 2 for the final crop in step 3
detailed_image = detail_image[-1].convert("RGB")
detailed_width, detailed_height = detailed_image.size

# step 3: finalise the upscaled resolution
# get the difference between the detailed and final sizes
crop_width = detailed_width - gen_width
crop_height = detailed_height - gen_height

# calculate the horizontal crop (left/right), centring the target width
if crop_width > 0:
    crop_left = crop_width // 2
    crop_right = crop_left + gen_width
else:
    crop_left = 0
    crop_right = detailed_width

# calculate the vertical crop (top/bottom), centring the target height
if crop_height > 0:
    crop_top = crop_height // 2
    crop_bottom = crop_top + gen_height
else:
    crop_top = 0
    crop_bottom = detailed_height

# now apply the crop
if crop_width > 0 or crop_height > 0:
    print(f'UltraHD: normalising image by {crop_width}px W and {crop_height}px H')
    final_image = detailed_image.crop((crop_left, crop_top, crop_right, crop_bottom))
else:
    final_image = detailed_image

# save the final image
# get a timestamp
timestamp = datetime.now()
timestamp_name = timestamp.strftime("%Y_%m_%d_%H%M%S")

# get the sanitised prompt string
# remove non-alphanumeric characters
safeprompt_name = ""
safeprompt_separator = "_"
for char in gen_prompt:
    if char.isalpha() or char.isdigit():
        safeprompt_name += char
    elif char == " ":
        safeprompt_name += safeprompt_separator

timestamped_png = timestamp_name + '_' + safeprompt_name[:75] + '.png'

print('UltraHD: saving file', timestamped_png)
# note: by default, the save method does not include exif data
final_image.save(timestamped_png, "PNG")

@R3gm added the enhancement label Dec 6, 2024