From Authors: Regarding how to test your own data #17

Closed · shallowdream204 opened this issue Nov 4, 2024 · 5 comments · May be fixed by #21

Comments

@shallowdream204 (Owner) commented Nov 4, 2024

Currently, the inference code in the codebase has not yet been updated to include the extraction process for text prompts and T5 features. The current code only supports super-resolution from 256 to 1024, and support for arbitrary resolutions has not yet been added. I will update these parts as soon as possible; it's a busy time with the CVPR deadline approaching. Thank you all for your patience and understanding! I apologize for any inconvenience this may have caused you.

Updates
We have released more user-friendly inference code. Feel free to test your own images!

shallowdream204 pinned this issue Nov 4, 2024
@ningbende

Thank you for the update and for your hard work! We completely understand that with the CVPR deadline approaching, it's a particularly busy time for everyone.

@SkaarFacee

Have you tested this on any OCR-intensive images? I'm curious whether it can handle the text degradation usually seen in scanned bills or textbooks.

@YHX021014 commented Nov 12, 2024

Hi there,

I have successfully implemented the functionality to test custom data by adding the extraction process for text prompts and T5 features. This now allows for flexible usage beyond the current super-resolution from 256 to 1024.

Here are the steps to achieve this:

Step 1: Use the MLLM LLaVA to generate a description of each image and obtain its caption.
Example code (llavaInfer.py):

from PIL import Image
import os
from pathlib import Path
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
import argparse

def inferLLaVa_oneImage(img_path, prompt):
    # Caption a single image; relies on the module-level `processor` and
    # `model` created in the __main__ block below.
    image = Image.open(img_path)
    inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")
    for k, v in inputs.items():
        print(k, v.shape)

    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=100)

    description = processor.batch_decode(outputs, skip_special_tokens=True)
    description = description[0].split("ASSISTANT:")[-1]
    print(description)


def generate_descriptions_for_directory(image_dir, output_dir):
    # Caption every .png in `image_dir` and write one .txt caption file per image.
    os.makedirs(output_dir, exist_ok=True)

    for image_file in Path(image_dir).glob("*.png"):
        image = Image.open(image_file)
        prompt = "USER: <image>\nDescribe this image and its style in a very detailed manner.\nASSISTANT:"
        inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")

        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=100)

        # Keep only the model's answer after the "ASSISTANT:" marker.
        description = processor.batch_decode(outputs, skip_special_tokens=True)
        description = description[0].split("ASSISTANT:")[-1]

        output_file = Path(output_dir) / f"{image_file.stem}.txt"
        with open(output_file, "w") as f:
            f.write(description)
        print(f"Generated description for {image_file.name} saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Generate image descriptions using the LLaVA model.")
    parser.add_argument('--images_dir', type=str, required=True, help="Directory containing input images.")
    parser.add_argument('--caption_dir', type=str, required=True, help="Directory to save generated captions.")

    args = parser.parse_args()

    # Load LLaVA-1.5-7B once and reuse it for every image.
    processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
    model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf", device_map="auto")

    generate_descriptions_for_directory(args.images_dir, args.caption_dir)

Command:

CUDA_VISIBLE_DEVICES=0,1 python llavaInfer.py \
	--images_dir /path/to/image/folder \
	--caption_dir /path/to/save/caption/folder
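
As a quick spot check before captioning a whole folder, the inferLLaVa_oneImage helper above (defined but never called in the script) can be driven from a separate snippet. This is only a minimal sketch, assuming the script is saved as llavaInfer.py; the image path is a placeholder, and the processor/model are assigned onto the imported module because the original script only creates them inside its __main__ block:

# Hypothetical spot check; llavaInfer.py is the script above, the image path is a placeholder.
from transformers import AutoProcessor, LlavaForConditionalGeneration
import llavaInfer

# Assign onto the module so the helper's global lookups for `processor`/`model` resolve.
llavaInfer.processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
llavaInfer.model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", device_map="auto"
)

prompt = "USER: <image>\nDescribe this image and its style in a very detailed manner.\nASSISTANT:"
llavaInfer.inferLLaVa_oneImage("/path/to/image/folder/example.png", prompt)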

Step 2: Use tools/extract_t5_features.py to extract T5 features, resulting in .npz files.
Command:

python3 tools/extract_t5_features.py \
--t5_ckpt /path/to/t5-v1_1-xxl \
--caption_folder /path/to/caption/folder \
--save_npz_folder /path/to/save/npz/folder
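
For reference, here is a rough sketch of what a T5 feature extractor along these lines might look like. It is not the repo's tools/extract_t5_features.py; the .npz key names ("caption_feature", "attention_mask") and the 120-token limit are my assumptions, so rely on the actual script for the exact format DreamClear expects:

import os
from pathlib import Path

import numpy as np
import torch
from transformers import T5EncoderModel, T5Tokenizer

# Same paths as the --t5_ckpt / --caption_folder / --save_npz_folder arguments above.
t5_ckpt = "/path/to/t5-v1_1-xxl"
caption_folder = "/path/to/caption/folder"
save_npz_folder = "/path/to/save/npz/folder"
os.makedirs(save_npz_folder, exist_ok=True)

tokenizer = T5Tokenizer.from_pretrained(t5_ckpt)
encoder = T5EncoderModel.from_pretrained(t5_ckpt, torch_dtype=torch.float16).to("cuda").eval()

for txt_file in Path(caption_folder).glob("*.txt"):
    caption = txt_file.read_text().strip()
    tokens = tokenizer(caption, max_length=120, padding="max_length",
                       truncation=True, return_tensors="pt").to("cuda")
    with torch.no_grad():
        feats = encoder(input_ids=tokens.input_ids,
                        attention_mask=tokens.attention_mask).last_hidden_state
    # Key names are guesses; check tools/extract_t5_features.py for the real ones.
    np.savez(Path(save_npz_folder) / f"{txt_file.stem}.npz",
             caption_feature=feats.cpu().float().numpy(),
             attention_mask=tokens.attention_mask.cpu().numpy())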

Step 3: Image Restoration
Once you have the required .npz files and the low-quality images, you can run restoration with the inference command provided by the authors.
Command:

python3 -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 \
    test_1024.py configs/DreamClear/DreamClear_Test.py \
    --dreamclear_ckpt /path/to/DreamClear-1024.pth \
    --swinir_ckpt /path/to/general_swinir_v1.ckpt \
    --vae_ckpt /path/to/sd-vae-ft-ema \
    --lre --cfg_scale 4.5 --color_align wavelet \
    --image_path /path/to/RealLQ250/lq \
    --npz_path /path/to/RealLQ250/npz \
    --save_dir validation
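
Before launching the distributed run, a small pre-flight check can confirm that every low-quality image has a matching .npz file. This is my own addition, and it assumes the test script pairs images and features by file stem:

from pathlib import Path

# Same folders as --image_path and --npz_path above.
image_path = Path("/path/to/RealLQ250/lq")
npz_path = Path("/path/to/RealLQ250/npz")

# Assumes the low-quality images are .png files named like their .npz counterparts.
missing = [img.name for img in sorted(image_path.glob("*.png"))
           if not (npz_path / f"{img.stem}.npz").exists()]

print(f"{len(missing)} image(s) without a matching .npz file")
for name in missing:
    print("  missing features for:", name)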

If anyone needs help or details on this implementation, please feel free to reach out!

@zelenooki87 commented Nov 19, 2024

I followed the steps above, but with your script I got square output rather than the proper 4:3 aspect ratio.

@shallowdream204 (Owner, Author)

We have released more user-friendly inference code. Feel free to test your own images!

shallowdream204 unpinned this issue Dec 7, 2024