
Fix typo in CONTRIBUTING.md #1

Open · eltociear wants to merge 1 commit into main

Conversation

@eltociear

reproducability -> reproducibility
@floveqq commented Oct 14, 2023

So why not open issues?
When will you release the code and Ferret-Bench?

@HaisongDing

Any plans to release the dataset?

@yangJirui

How can we access the dataset?

@Hxyou commented Oct 31, 2023

The training/inference/eval code has been released. The Ferret-Bench evaluation is also included here. But the training data and checkpoints don't seem to be ready yet. Let's stay tuned!

@LiWentomng

@Hxyou Hi, great work. Will you release the checkpoint files soon? I can't wait to try it~

@zhyj3038

Why can this method boost the grounding ability? I think the "hybrid region representation" can only enhance referring ability... Can anyone explain? Thanks.

@floveqq commented Dec 10, 2023

Why not open the issues?

@Hxyou commented Dec 20, 2023

Thank you for your patience. The checkpoints were released last week! Feel free to try them~

@Hxyou commented Dec 20, 2023

@zhyj3038 I don't think we claimed that the "hybrid region representation" helps grounding. Instead, in experiments we ablate whether referring data/tasks help grounding when jointly trained, and the answer is yes. I hypothesize that this is because both tasks require fine-grained spatial understanding: by training on one task, the LLM implicitly learns the projection of coordinates and region features onto real locations in the image, so the other task also gets boosted.

@Hxyou commented Dec 20, 2023

@floveqq I am sorry for that. The repo was set up by the company. Feel free to email us or leave comments in this pull request as if raising issues. We will try our best to answer.

@peiwang062

Hi, thanks for sharing the great work! But why not open issues? Anyway, I tried to reproduce the ferret-7b evaluation results on refcocog, but my reproduced results are pretty bad: 5% vs. 84%.

I followed the instruction steps exactly. The vicuna-7b model I used is https://huggingface.co/lmsys/vicuna-7b-v1.3, and the annotation json is from https://huggingface.co/GLIPModel/GLIP/tree/66ee3ae9a3b8cee0cf78f10ef5fc9a3725db02a1/mdetr_annotations. I manually checked the bboxes in the produced 0_of_1.jsonl, and most of the boxes are wrong, so it is not a problem with eval_refexp.py (by the way, the misc import at line 23 is missing there). I didn't encounter any issue or error during the install or checkpoint-generation steps, so any suggestion as to what might be causing the mismatch?

Looking forward to the reply. Thanks for sharing this excellent work again!

@Haotian-Zhang (Collaborator)

Hi @peiwang062, thanks for the questions. The visualization of the boxes from the .jsonl file looks incorrect because the raw prediction file does not match the image size: the coordinates there are in the range 0 to 999. In our eval_refexp.py, we do a resize

resized_box_list = resize_bbox(box_list, img_w, img_h)

to map them back to the original image size. Could you please check that again?
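
For reference, a minimal sketch of what that rescaling step does (assuming a plain linear rescale from the 0-999 grid back to pixels; the actual resize_bbox in eval_refexp.py may differ in detail):

    def resize_bbox(box_list, img_w, img_h):
        # Predictions live on a fixed 1000x1000 grid (coordinates 0..999),
        # so mapping back to the original image is a linear rescale.
        resized = []
        for x1, y1, x2, y2 in box_list:
            resized.append([x1 * img_w / 1000.0,
                            y1 * img_h / 1000.0,
                            x2 * img_w / 1000.0,
                            y2 * img_h / 1000.0])
        return resized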

@peiwang062

Hi Haotian, thanks for the quick reply!
For eval_refexp.py, because misc doesn't exist, I used from torchvision.ops import box_iou to replace your original box_iou and modified lines 168-169 to

        iou = box_iou(predict_boxes, target_bbox)
        mean_iou = box_iou(predict_boxes.mean(0).view(-1, 4), target_bbox)

Other than that, I didn't do anything. Not sure if the problem is here.
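
For context, a self-contained sketch of this workaround (the boxes below are made-up examples): torchvision.ops.box_iou takes two tensors of (x1, y1, x2, y2) boxes and returns an NxM pairwise IoU matrix, so it is a plausible stand-in for the missing misc.box_iou helper.

    import torch
    from torchvision.ops import box_iou

    predict_boxes = torch.tensor([[10.0, 20.0, 110.0, 220.0]])
    target_bbox = torch.tensor([[12.0, 18.0, 105.0, 215.0]])

    iou = box_iou(predict_boxes, target_bbox)  # shape (1, 1) pairwise IoU matrix
    mean_iou = box_iou(predict_boxes.mean(0).view(-1, 4), target_bbox)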

@Haotian-Zhang (Collaborator)

Thanks for the findings! As the company is on holiday these days, I will ask folks to restore the missing files once they are back. In the meantime, I will follow up with you about this file over email. Thanks a lot for the help!

@peiwang062

Thank you so much Haotian! Merry Christmas!

@bensonbs

Sorry to ask here, but I need some help.

[screenshot]

I've followed https://github.com/lm-sys/FastChat/issues/412 but am still encountering the same error:

NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.

@Hxyou commented Dec 27, 2023

Hi @bensonbs, the error can have multiple causes.
First, can you double-check that you can successfully run the three commands (controller, gradio web server, model worker) as instructed in the README without errors?
Then, can you show us screenshots of the three programs launched by those commands at the moment the demo error happens? That will help us pin down where the problem comes from.

@bensonbs

Here is the result. Thank you for your assistance.

(ferret) root@58dfc909b9e7:/share/ml-ferret# python -m ferret.serve.controller --host 0.0.0.0 --port 10000
[2023-12-28 00:46:15,431] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-12-28 00:46:15 | INFO | controller | args: Namespace(host='0.0.0.0', port=10000, dispatch_method='shortest_queue')
2023-12-28 00:46:15 | INFO | controller | Init controller
2023-12-28 00:46:16 | ERROR | stderr | INFO:     Started server process [412]
2023-12-28 00:46:16 | ERROR | stderr | INFO:     Waiting for application startup.
2023-12-28 00:46:16 | ERROR | stderr | INFO:     Application startup complete.
2023-12-28 00:46:16 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:10000 (Press CTRL+C to quit)
2023-12-28 00:46:20 | INFO | controller | Receive unknown heart beat. http://localhost:40000
2023-12-28 00:46:20 | INFO | stdout | INFO:     127.0.0.1:58750 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-12-28 00:46:20 | INFO | controller | Register a new worker: http://localhost:40000
2023-12-28 00:46:20 | INFO | controller | Register done: http://localhost:40000, {'model_names': ['FERRET-13B-v0'], 'speed': 1, 'queue_length': 0}
2023-12-28 00:46:20 | INFO | stdout | INFO:     127.0.0.1:58764 - "POST /register_worker HTTP/1.1" 200 OK
2023-12-28 00:46:35 | INFO | controller | Receive heart beat. http://localhost:40000
2023-12-28 00:47:23 | INFO | controller | Register a new worker: http://localhost:40000
2023-12-28 00:47:23 | INFO | controller | Register done: http://localhost:40000, {'model_names': ['FERRET-13B-v0'], 'speed': 1, 'queue_length': 0}
2023-12-28 00:47:23 | INFO | stdout | INFO:     127.0.0.1:45306 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2023-12-28 00:47:23 | INFO | stdout | INFO:     127.0.0.1:45312 - "POST /list_models HTTP/1.1" 200 OK
2023-12-28 00:47:31 | INFO | controller | Register a new worker: http://localhost:40000
2023-12-28 00:47:31 | INFO | controller | Register done: http://localhost:40000, {'model_names': ['FERRET-13B-v0'], 'speed': 1, 'queue_length': 0}
2023-12-28 00:47:31 | INFO | stdout | INFO:     127.0.0.1:45318 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2023-12-28 00:47:31 | INFO | stdout | INFO:     127.0.0.1:45324 - "POST /list_models HTTP/1.1" 200 OK
2023-12-28 00:47:35 | INFO | controller | Receive heart beat. http://localhost:40000
2023-12-28 00:47:35 | INFO | stdout | INFO:     127.0.0.1:41430 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-12-28 00:47:43 | INFO | controller | names: ['http://localhost:40000'], queue_lens: [0.0], ret: http://localhost:40000
(ferret) root@58dfc909b9e7:/share/ml-ferret# python -m ferret.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --add_region_feature --port 8501
[2023-12-28 00:47:22,094] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-12-28 00:47:23 | INFO | gradio_web_server | args: Namespace(host='0.0.0.0', port=8501, controller_url='http://localhost:10000', concurrency_count=8, model_list_mode='reload', share=False, moderate=False, embed=False, add_region_feature=True)
2023-12-28 00:47:23 | INFO | gradio_web_server | Models: ['FERRET-13B-v0']
2023-12-28 00:47:23 | INFO | gradio_web_server | Namespace(host='0.0.0.0', port=8501, controller_url='http://localhost:10000', concurrency_count=8, model_list_mode='reload', share=False, moderate=False, embed=False, add_region_feature=True)
2023-12-28 00:47:23 | ERROR | stderr | /root/miniconda3/envs/ferret/lib/python3.10/site-packages/gradio/deprecation.py:43: UserWarning: You have unused kwarg parameters in Textbox, please remove them: {'container': False}
2023-12-28 00:47:23 | ERROR | stderr |   warnings.warn(
2023-12-28 00:47:25 | ERROR | stderr | /root/miniconda3/envs/ferret/lib/python3.10/site-packages/gradio/deprecation.py:43: UserWarning: You have unused kwarg parameters in Dropdown, please remove them: {'container': False}
2023-12-28 00:47:25 | ERROR | stderr |   warnings.warn(
2023-12-28 00:47:25 | INFO | stdout | Running on local URL:  http://0.0.0.0:8501
2023-12-28 00:47:25 | INFO | stdout | 
2023-12-28 00:47:25 | INFO | stdout | To create a public link, set `share=True` in `launch()`.
2023-12-28 00:47:31 | INFO | gradio_web_server | load_demo. ip: 172.21.0.1
2023-12-28 00:47:31 | INFO | gradio_web_server | Models: ['FERRET-13B-v0']
2023-12-28 00:47:34 | INFO | stdout | Init Uploading Images.
2023-12-28 00:47:43 | INFO | gradio_web_server | add_text. ip: 172.21.0.1. len: 13
2023-12-28 00:47:43 | INFO | stdout | No location, copy original image in add_text
2023-12-28 00:47:43 | INFO | gradio_web_server | http_bot. ip: 172.21.0.1
2023-12-28 00:47:43 | INFO | gradio_web_server | model_name: FERRET-13B-v0, worker_addr: http://localhost:40000
2023-12-28 00:47:43 | INFO | stdout | Input Image Size:(512, 512)
2023-12-28 00:47:43 | INFO | stdout | Input Image Size:(512, 512)
2023-12-28 00:47:43 | INFO | gradio_web_server | ==== request ====
{'model': 'FERRET-13B-v0', 'prompt': 'A chat between a human and an AI that understands visuals. In images, [x, y] denotes points: top-left [0, 0], bottom-right [width-1, height-1]. Increasing x moves right; y moves down. Bounding box: [x1, y1, x2, y2]. Image size: 1000x1000. Follow instructions.  USER: <image>\nwhat is this? ASSISTANT:', 'temperature': 0.2, 'top_p': 0.7, 'max_new_tokens': 512, 'stop': '</s>', 'images': "List of 1 images: ['195205623896f712c8831c15be32a339']"}
2023-12-28 00:47:43 | INFO | gradio_web_server | ==== add region_masks_in_prompts to request ====

2023-12-28 00:47:43 | INFO | stdout | Input Image Size:(512, 512)
2023-12-28 00:47:43 | INFO | stdout | Input Prompt: A chat between a human and an AI that understands visuals. In images, [x, y] denotes points: top-left [0, 0], bottom-right [width-1, height-1]. Increasing x moves right; y moves down. Bounding box: [x1, y1, x2, y2]. Image size: 1000x1000. Follow instructions.  USER: <image>
2023-12-28 00:47:43 | INFO | stdout | what is this? ASSISTANT:
(base) root@58dfc909b9e7:/share/ml-ferret# conda activate ferret
(ferret) root@58dfc909b9e7:/share/ml-ferret# CUDA_VISIBLE_DEVICES=0 python -m ferret.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./checkpoints/FERRET-13B-v0 --add_region_feature
[2023-12-27 16:29:26,569] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-12-27 16:29:27 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='./checkpoints/FERRET-13B-v0', model_base=None, model_name=None, multi_modal=False, keep_aspect_ratio=False, num_gpus=1, limit_model_concurrency=5, stream_interval=1, no_register=False, load_8bit=False, load_4bit=False, add_region_feature=True, image_w=336, image_h=336)
2023-12-27 16:29:27 | INFO | model_worker | Loading the model FERRET-13B-v0 on worker 9be48e ...
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Loading checkpoint shards:   0%|                                                                                          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███████████████████████████▎                                                      | 1/3 [00:19<00:39, 19.54s/it]
Loading checkpoint shards:  67%|██████████████████████████████████████████████████████▋                           | 2/3 [00:35<00:17, 17.59s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 3/3 [00:45<00:00, 13.85s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 3/3 [00:45<00:00, 15.06s/it]
2023-12-27 16:30:14 | ERROR | stderr | 
2023-12-27 16:30:16 | INFO | model_worker | Register to controller
2023-12-27 16:30:16 | ERROR | stderr | INFO:     Started server process [243]
2023-12-27 16:30:16 | ERROR | stderr | INFO:     Waiting for application startup.
2023-12-27 16:30:16 | ERROR | stderr | INFO:     Application startup complete.
2023-12-27 16:30:16 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:40000 (Press CTRL+C to quit)
2023-12-27 16:31:04 | INFO | stdout | INFO:     127.0.0.1:46046 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-12-27 16:31:18 | INFO | stdout | INFO:     127.0.0.1:33794 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2023-12-27 16:31:18 | INFO | model_worker | Add region_masks to image_args.
2023-12-27 16:37:48 | INFO | stdout | INFO:     127.0.0.1:38944 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-12-27 16:37:59 | INFO | stdout | INFO:     127.0.0.1:56904 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2023-12-27 16:38:00 | INFO | model_worker | Add region_masks to image_args.
2023-12-28 00:46:20 | INFO | model_worker | Register to controller
2023-12-28 00:47:23 | INFO | stdout | INFO:     127.0.0.1:43410 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-12-28 00:47:31 | INFO | stdout | INFO:     127.0.0.1:43426 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-12-28 00:47:43 | INFO | stdout | INFO:     127.0.0.1:48710 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2023-12-28 00:47:43 | INFO | model_worker | Add region_masks to image_args.

@Hxyou commented Dec 28, 2023

@bensonbs Your logs look fine to me. It seems model_worker successfully receives the input, since "Add region_masks to image_args" is printed and the generation process doesn't raise any error. I also tried your commands on my server and it works well.
Can you try to (1) make sure the model is selected in the model list of the demo webpage (as highlighted in the figure), (2) click the regenerate button, (3) refresh the demo webpage, and (4) try different ports in case there is a conflict with other processes?
[screenshot]

@bensonbs commented Dec 28, 2023

After a whole day of trying, I think it's 'doge' causing the trouble.🫠

[screenshot]

[screenshot]

@bensonbs

root@46290910f996:~# python3 -m ferret.model.apply_delta --base ./model/vicuna-13b-v1-3 --target model/ferret-13b-v1-3 --delta model/ferret-13b-delta
[2023-12-28 10:44:30,138] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading base model
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/ferret/model/apply_delta.py", line 69, in <module>
    apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
  File "/root/ferret/model/apply_delta.py", line 36, in apply_delta
    base = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 441, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 926, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 632, in __getitem__
    raise KeyError(key)
KeyError: 'llava'

@ronnymunthe99 commented Dec 28, 2023

[screenshot]
I also got this error when I tried to use a pre-trained llava model. Changing the model_type in config.json from "llava" to "llama" works for me.
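
A minimal sketch of that workaround (the config path is assumed from the command above); it rewrites model_type so AutoConfig resolves to the LLaMA config class:

    import json

    cfg_path = "./model/vicuna-13b-v1-3/config.json"  # assumed path
    with open(cfg_path) as f:
        cfg = json.load(f)
    if cfg.get("model_type") == "llava":
        cfg["model_type"] = "llama"
        with open(cfg_path, "w") as f:
            json.dump(cfg, f, indent=2)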

@Hxyou commented Dec 28, 2023

@bensonbs It seems that your base model (./model/vicuna-13b-v1-3) actually has "model_type" set to "llava" in its config.json. Can you double-check your vicuna weights? They should use "llama" as the "model_type". When applying the delta, we don't need any weights from LLaVA; we only need the original vicuna.

@crabmon commented Jan 9, 2024

I got an error while running the command for the checkpoints:

python3 -m ferret.model.apply_delta --base /model/vicuna-7b-v1-3 --target /model/ferret-7b-v1-3 --delta /model/ferret-7b-delta

The error is: pickle.UnpicklingError: invalid load key, 'v'.

I've placed vicuna-7b-v1-3 and ferret-7b-delta both under the model folder. I couldn't find ferret-7b-v1-3, so I created an empty folder with that name. How can I get past this error?

@Hxyou commented Jan 12, 2024

@crabmon Hi, placing vicuna and ferret-delta in the same model folder and creating an empty ferret-7b-v1-3 folder works fine on my side. Can you provide more details of the error, such as screenshots of the command and the full traceback?
[screenshot: Screen Shot 2024-01-11 at 10.14.45 PM]
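
One hedged guess worth checking (not confirmed in this thread): pickle's "invalid load key, 'v'" often means a .bin weight file is actually a git-lfs pointer, a small text file starting with "version https://git-lfs...", rather than the real binary shard. A quick check, with the path assumed from the command above:

    from pathlib import Path

    for p in Path("/model/ferret-7b-delta").glob("*.bin"):
        with p.open("rb") as f:
            head = f.read(7)
        print(p.name, p.stat().st_size, head)
        if head == b"version":
            print("  -> looks like a git-lfs pointer; re-fetch the weights (e.g. git lfs pull)")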

@crabmon commented Jan 12, 2024

Hi @Hxyou, thank you for your reply. I managed to get past that step, but now I'm stuck again: the demo says NETWORK ERROR DUE TO HIGH TRAFFIC.

In the terminal, it says: caught unknown error CUDNN error: CUDNN_STATUS_INTERNAL_ERROR.

Any clue?

@Hxyou commented Jan 12, 2024

@crabmon Can you show us a screenshot of the terminal errors? Also, what GPU are you using, and which versions of PyTorch and CUDA?

@crabmon commented Jan 16, 2024

@Hxyou, please see the screen capture below.
[screenshot: MicrosoftTeams-image]

I'm using a 3090, CUDA version 12, and driver version 525.147.05.

@Hxyou commented Jan 22, 2024

@crabmon CUDNN_STATUS_INTERNAL_ERROR is typically hard to debug since it gives no real error message, but in many cases it's due to running out of GPU memory. A 3090 only has 24 GB, and a 13B model often consumes more than 20 GB. Are you trying the 7B or the 13B model?
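
As a rough sanity check for the memory hypothesis (a hedged sketch; fp16 weights take about 2 bytes per parameter, before counting activations and the vision tower):

    import torch

    free, total = torch.cuda.mem_get_info(0)  # bytes
    print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")
    print(f"fp16 13B weights alone: ~{13e9 * 2 / 2**30:.0f} GiB")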

@orcunderscore commented Jan 23, 2024

Hi there @Hxyou,

I was wondering about the dataset.

Right now I can only find a few datapoints in the eval set, which I assume at least have the same format as the rest of the dataset(?).

I can see that some loading of different data sources happens in https://github.com/apple/ml-ferret/blob/main/ferret/train/train.py, but since the repo's main README explicitly mentions that the dataset has its own license, I was expecting to be able to download it as-is somewhere.

Is there a timeline for when it will be released publicly? Or is there a link to download it somewhere, or a standalone script to reconstruct it, with download instructions for the other datasets it is based on?

Thanks!

@Hxyou commented Jan 26, 2024

@ibims1entwickler Thank you for your interest. Since we released the Ferret-Edit evaluation data, the license mainly applies to that data. As for the training data, it's still under internal review, and we can't promise an exact date. I think providing the preparation scripts is a good idea, and we will discuss it.

@orcunderscore commented Feb 6, 2024

@Hxyou, could you then provide examples of the expected input format for training (mock data is fine)?
I am specifically wondering how data are grouped.

In LLaVA, as far as I understand, the input format is a JSON file like this:

[
  {
    "id": "000000000001",
    "image": "image_name.png",
    "conversations": [
      {
        "from": "human",
        "value": "What do you see happening in this image?\n<image>"
      },
      {
        "from": "gpt",
        "value": "Answer about what happens in the image."
      },
      {
        "from": "human",
        "value": "Another question about the image?"
      },
      {
        "from": "gpt",
        "value": "Answer to the other question."
      },
...
    ]
  },
...
]

So:

  • data are grouped into conversations that belong to one image
  • the image is only mentioned in the first human prompt

From the inference code in this repo, I can already guess that in FERRET:

  • In the prompt, points should be referred to as [x, y] <region_fea>, and bounding boxes and masks as [x1, y1, x2, y2] <region_fea>.

Open questions:

  • Can you provide examples of the data format? [mock examples for all possible input modes (point, bbox, mask)]
  • How does the dataloader expect to find the binary masks that are required as input to the model?

Seeing that in the dataloader this may differ per dataset, can you provide examples for one of the datasets that supports points, masks, and bboxes?
I am trying to train on my own data, but it is really hard to figure out the training input format.

Thank you!

@Hxyou commented Feb 7, 2024

@ibims1entwickler Hi, here are some data samples you can refer to for the exact data format: https://anonymous.4open.science/r/ferret-anonymous-6773/training_data.md
Note: box_x1y1x2y2 is the box list, with coordinates in line with the original image's width and height. The masks are also global image masks with the same shape as the original image. The number of lists in box_x1y1x2y2 and masks corresponds to the number of messages in conversations (an empty list [] means the corresponding message doesn't include any location).
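
To illustrate, a hedged mock of a single training record as a Python literal; the field names follow the linked samples and the description above, and all values are invented:

    sample = {
        "id": "000000000001",
        "image": "train2017/000000000001.jpg",  # invented path
        "conversations": [
            {"from": "human",
             "value": "What is the object [120, 80, 340, 260] <region_fea> in the image?\n<image>"},
            {"from": "gpt",
             "value": "It is a dog."},
        ],
        # One entry per message; [] marks a message with no location.
        "box_x1y1x2y2": [[[120, 80, 340, 260]], []],
        "masks": [[], []],  # COCO-RLE-encoded masks when present
    }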

The binary masks are encoded with the COCO API (https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py). If the masks field is provided in the data JSON file, the dataloader should be able to load it.
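
A hedged sketch of that encoding with pycocotools (the mask shape and contents are made up):

    import numpy as np
    from pycocotools import mask as mask_utils

    # H x W binary mask in Fortran order, as pycocotools expects.
    binary_mask = np.zeros((480, 640), dtype=np.uint8, order="F")
    binary_mask[100:200, 150:300] = 1

    rle = mask_utils.encode(binary_mask)   # {'size': [480, 640], 'counts': b'...'}
    json_safe = {"size": rle["size"], "counts": rle["counts"].decode("ascii")}

    decoded = mask_utils.decode(rle)       # back to the H x W binary array
    assert (decoded == binary_mask).all()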

@orcunderscore

@Hxyou thank you for that, that was really helpful. I'll try to mock something up and start a mock training run tomorrow. A few follow-up questions:

  1. The link shows no example of grounding (i.e., no bbox in the "from": "human" message but one in the "from": "gpt" message) and also no text-only conversations. I think I can trivially extend to the grounding case. Should I set "location_instruction": false for a pure-text example?
  2. In the example, you show masks and boxes. What about points? Are they represented as a mask with a single non-zero pixel, or are they supposed to be circles with a small radius, as mentioned in the paper? [The latter does not happen during inference.]
  3. When I use my own dataset, I'll have to add a loading function for it to LazySupervisedDataset, right?

@Hxyou commented Feb 10, 2024

@ibims1entwickler

  1. Here is an example of a grounding dataset. For a pure-text example, I haven't tried it, but it seems you should just omit the image key from the data dictionary. [screenshot]
  2. It's a circle with a span of 5 for point input, in both training and inference. Please see:

    if len(coor) == 2:
        # Define window size
        span = 5
        # Make sure the window does not exceed array bounds
        x_min = max(0, coor[0] - span)
        x_max = min(raw_w, coor[0] + span + 1)
        y_min = max(0, coor[1] - span)
        y_max = min(raw_h, coor[1] + span + 1)
        coor_mask[int(x_min):int(x_max), int(y_min):int(y_max)] = 1
    assert (coor_mask == 1).any(), f"coor: {coor}, raw_w: {raw_w}, raw_h: {raw_h}"

  3. Yes, such a design allows easier customization.

@orcunderscore

@Hxyou thank you.
A question about consistency:

  • In the openscience repo you linked, bounding box coordinates were given as int; in your example here, as float. So I guess float is fine.
  • In the example you just provided, bounding boxes are shown as [<bbox_location0>], whereas in the openscience link they were given without the brackets, i.e. just <bbox_location0>. In the training code, you do

    coor_i = f'[{int(raw_coor_i[0])}, {int(raw_coor_i[1])}]'

    (i.e., replace the placeholder with coordinates), so did you actually use the example from your previous post in training? That is, would the model learn to predict [[x1, y1, x2, y2]] (with double brackets)?

@Hxyou commented Feb 13, 2024

@ibims1entwickler Sorry, the previous screenshot I provided is not the version we used for training. The following screenshot is correct; it is in the same style as the one in the openscience link. Please ignore token_positive, it's just some unused metadata.
Also, float or int are both fine.
[screenshot]
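
To illustrate the point about brackets (a hedged sketch; the strings below are invented, not taken from the dataloader): the raw sample stores the placeholder without surrounding brackets, so serializing the box with single brackets yields [x1, y1, x2, y2] in the final text, not double brackets.

    raw = "What is the object <bbox_location0> in the image?"
    box = [120.0, 80.0, 340.0, 260.0]
    coor = "[" + ", ".join(str(int(v)) for v in box) + "]"  # "[120, 80, 340, 260]"
    print(raw.replace("<bbox_location0>", coor))            # single brackets in the prompt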

@dddraxxx

Again, is there any plan to release the training dataset?

@sduwhly commented Mar 25, 2024

Is there any plan to release the training dataset?

@rajneesh-18

One question: why is this not open source?

@kuaileqipaoshui

(Quoting, in Chinese, the earlier exchange between @peiwang062 and @Haotian-Zhang about the refcocog reproduction and the missing misc import in eval_refexp.py.)

Has this box_iou issue been resolved? I don't seem to see a fix for the missing file.

@killah-t-cell

Any updates on the dataset?

@Xiao-wen-Sun

Hi, I am looking forward to the missing file, too. :)
Best,
Xiaowen
