From Authors: Regarding how to test your own data #17

Closed · shallowdream204 opened this issue Nov 4, 2024 · 5 comments · May be fixed by #21

Comments

@shallowdream204 (Owner) commented Nov 4, 2024

Currently, the inference code in the codebase has not yet been updated to include the extraction process for text prompts and T5 features. The current code only supports super-resolution from 256 to 1024, and support for arbitrary resolutions has not yet been added. I will update these parts as soon as possible; it's a busy time with the CVPR deadline approaching. Thank you all for your patience and understanding! I apologize for any inconvenience this may have caused you.

Updates
We have released more user-friendly inference code. Feel free to test your own images!

shallowdream204 pinned this issue Nov 4, 2024
@ningbende

Thank you for the update and for your hard work! We completely understand that with the CVPR deadline approaching, it's a particularly busy time for everyone.

@SkaarFacee

Have you tested this on any OCR-intensive images? I'm curious whether it can handle the text degradation usually seen in scanned bills or textbooks.

@YHX021014 commented Nov 12, 2024

Hi there,

I have successfully implemented the functionality to test custom data by adding the extraction process for text prompts and T5 features. This now allows for flexible usage beyond the current super-resolution from 256 to 1024.

Here are the steps to achieve this:

Step 1: Use the MLLM LLaVA to generate a description of each image and obtain its caption.
Example code (llavaInfer.py):

from PIL import Image
import os
from pathlib import Path
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
import argparse

def inferLLaVa_oneImage(img_path, prompt):
    # Caption a single image; relies on the module-level `processor` and
    # `model` created in the __main__ block below.
    image = Image.open(img_path)
    inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")
    for k, v in inputs.items():
        print(k, v.shape)

    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=100)

    description = processor.batch_decode(outputs, skip_special_tokens=True)
    description = description[0].split("ASSISTANT:")[-1]
    print(description)


def generate_descriptions_for_directory(image_dir, output_dir):
    # Caption every .png in `image_dir` and write one .txt caption file per image.
    os.makedirs(output_dir, exist_ok=True)

    for image_file in Path(image_dir).glob("*.png"):
        image = Image.open(image_file)
        prompt = "USER: <image>\nDescribe this image and its style in a very detailed manner.\nASSISTANT:"
        inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")

        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=100)

        # Keep only the model's answer after the "ASSISTANT:" marker.
        description = processor.batch_decode(outputs, skip_special_tokens=True)
        description = description[0].split("ASSISTANT:")[-1]

        output_file = Path(output_dir) / f"{image_file.stem}.txt"
        with open(output_file, "w") as f:
            f.write(description)
        print(f"Generated description for {image_file.name} saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Generate image descriptions using the LLaVA model.")
    parser.add_argument('--images_dir', type=str, required=True, help="Directory containing input images.")
    parser.add_argument('--caption_dir', type=str, required=True, help="Directory to save generated captions.")

    args = parser.parse_args()

    # Load LLaVA-1.5-7B once and reuse it for every image.
    processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
    model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf", device_map="auto")

    generate_descriptions_for_directory(args.images_dir, args.caption_dir)

Command:

CUDA_VISIBLE_DEVICES=0,1 python llavaInfer.py \
	--images_dir /path/to/image/folder \
	--caption_dir /path/to/save/caption/folder
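
As a quick spot check before captioning a whole folder, the inferLLaVa_oneImage helper above (defined but never called in the script) can be driven from a separate snippet. This is only a minimal sketch, assuming the script is saved as llavaInfer.py; the image path is a placeholder, and the processor/model are assigned onto the imported module because the original script only creates them inside its __main__ block:

# Hypothetical spot check; llavaInfer.py is the script above, the image path is a placeholder.
from transformers import AutoProcessor, LlavaForConditionalGeneration
import llavaInfer

# Assign onto the module so the helper's global lookups for `processor`/`model` resolve.
llavaInfer.processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
llavaInfer.model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", device_map="auto"
)

prompt = "USER: <image>\nDescribe this image and its style in a very detailed manner.\nASSISTANT:"
llavaInfer.inferLLaVa_oneImage("/path/to/image/folder/example.png", prompt)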

Step 2: Use tools/extract_t5_features.py to extract T5 features, resulting in .npz files.
Command:

python3 tools/extract_t5_features.py \
--t5_ckpt /path/to/t5-v1_1-xxl \
--caption_folder /path/to/caption/folder \
--save_npz_folder /path/to/save/npz/folder
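
For reference, here is a rough sketch of what a T5 feature extractor along these lines might look like. It is not the repo's tools/extract_t5_features.py; the .npz key names ("caption_feature", "attention_mask") and the 120-token limit are my assumptions, so rely on the actual script for the exact format DreamClear expects:

import os
from pathlib import Path

import numpy as np
import torch
from transformers import T5EncoderModel, T5Tokenizer

# Same paths as the --t5_ckpt / --caption_folder / --save_npz_folder arguments above.
t5_ckpt = "/path/to/t5-v1_1-xxl"
caption_folder = "/path/to/caption/folder"
save_npz_folder = "/path/to/save/npz/folder"
os.makedirs(save_npz_folder, exist_ok=True)

tokenizer = T5Tokenizer.from_pretrained(t5_ckpt)
encoder = T5EncoderModel.from_pretrained(t5_ckpt, torch_dtype=torch.float16).to("cuda").eval()

for txt_file in Path(caption_folder).glob("*.txt"):
    caption = txt_file.read_text().strip()
    tokens = tokenizer(caption, max_length=120, padding="max_length",
                       truncation=True, return_tensors="pt").to("cuda")
    with torch.no_grad():
        feats = encoder(input_ids=tokens.input_ids,
                        attention_mask=tokens.attention_mask).last_hidden_state
    # Key names are guesses; check tools/extract_t5_features.py for the real ones.
    np.savez(Path(save_npz_folder) / f"{txt_file.stem}.npz",
             caption_feature=feats.cpu().float().numpy(),
             attention_mask=tokens.attention_mask.cpu().numpy())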

Step 3: Image Restoration
Once you have the required .npz files and the low-quality images, you can run restoration with the inference command provided by the authors.
Command:

python3 -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 \
    test_1024.py configs/DreamClear/DreamClear_Test.py \
    --dreamclear_ckpt /path/to/DreamClear-1024.pth \
    --swinir_ckpt /path/to/general_swinir_v1.ckpt \
    --vae_ckpt /path/to/sd-vae-ft-ema \
    --lre --cfg_scale 4.5 --color_align wavelet \
    --image_path /path/to/RealLQ250/lq \
    --npz_path /path/to/RealLQ250/npz \
    --save_dir validation
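
Before launching the distributed run, a small pre-flight check can confirm that every low-quality image has a matching .npz file. This is my own addition, and it assumes the test script pairs images and features by file stem:

from pathlib import Path

# Same folders as --image_path and --npz_path above.
image_path = Path("/path/to/RealLQ250/lq")
npz_path = Path("/path/to/RealLQ250/npz")

# Assumes the low-quality images are .png files named like their .npz counterparts.
missing = [img.name for img in sorted(image_path.glob("*.png"))
           if not (npz_path / f"{img.stem}.npz").exists()]

print(f"{len(missing)} image(s) without a matching .npz file")
for name in missing:
    print("  missing features for:", name)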

If anyone needs help or details on this implementation, please feel free to reach out!

@zelenooki87 commented Nov 19, 2024

I followed the steps above, but with your script I got square output rather than the proper 4:3 aspect ratio.

@shallowdream204 (Owner, Author)

We have released more user-friendly inference code. Feel free to test your own images!

shallowdream204 unpinned this issue Dec 7, 2024