
Fix typo in CONTRIBUTING.md #1

Open · eltociear wants to merge 1 commit into main

Conversation

@eltociear

reproducability -> reproducibility
@floveqq commented Oct 14, 2023

So why not open issues?
When will you release the code and Ferret-Bench?

@HaisongDing

Any plans to release the dataset?

@yangJirui

How can we access the dataset?

@Hxyou commented Oct 31, 2023

The training/inference/eval code has been released. The Ferret-Bench evaluation is also included here. But the training data and checkpoints don't seem to be ready yet. Let's stay tuned!

@LiWentomng

@Hxyou Hi, great work. Will you release the checkpoint files soon? I can't wait to try it~

@zhyj3038

Why can this method boost the grounding ability? I think the "hybrid region representation" can only enhance referring ability... Can anyone explain? Thanks.

@floveqq commented Dec 10, 2023

Why not open the issues?

@Hxyou commented Dec 20, 2023

Thank you for your patience. The checkpoints were released last week! Feel free to try them~

@Hxyou commented Dec 20, 2023

@zhyj3038 I don't think we claimed that the "hybrid region representation" helps grounding. Instead, in experiments we ablate whether referring data/tasks help grounding when jointly trained, and the answer is yes. I hypothesize that this is because both tasks require fine-grained spatial understanding: by training on one task, the LLM implicitly learns the projection of coordinates and region features onto real locations in the image, so the other task also gets boosted.

@Hxyou commented Dec 20, 2023

@floveqq I am sorry for that. The repo was set up by the company. Feel free to email us or leave comments in this pull request as if raising issues. We will try our best to answer.

@peiwang062

Hi, thanks for sharing the great work! But why not open issues? Anyway, I tried to reproduce the ferret-7b evaluation results on refcocog, but my reproduced results are pretty bad: 5% vs. 84%.

I followed the instruction steps exactly. The vicuna-7b model I used is https://huggingface.co/lmsys/vicuna-7b-v1.3, and the annotation json is from https://huggingface.co/GLIPModel/GLIP/tree/66ee3ae9a3b8cee0cf78f10ef5fc9a3725db02a1/mdetr_annotations. I manually checked the bboxes in the produced 0_of_1.jsonl, and most of the boxes are wrong, so it is not a problem with eval_refexp.py (by the way, the misc import at line 23 is missing there). I didn't encounter any issue or error during the install or checkpoint-generation steps, so any suggestion as to what might be causing the mismatch?

Looking forward to the reply. Thanks for sharing this excellent work again!

@Haotian-Zhang (Collaborator)

Hi @peiwang062, thanks for the questions. The visualization of the boxes from the .jsonl file looks incorrect because the raw prediction file does not match the image size: the coordinates there are in the range 0 to 999. In our eval_refexp.py, we do a resize

resized_box_list = resize_bbox(box_list, img_w, img_h)

to map them back to the original image size. Could you please check that again?
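
For reference, a minimal sketch of what that rescaling step does (assuming a plain linear rescale from the 0-999 grid back to pixels; the actual resize_bbox in eval_refexp.py may differ in detail):

    def resize_bbox(box_list, img_w, img_h):
        # Predictions live on a fixed 1000x1000 grid (coordinates 0..999),
        # so mapping back to the original image is a linear rescale.
        resized = []
        for x1, y1, x2, y2 in box_list:
            resized.append([x1 * img_w / 1000.0,
                            y1 * img_h / 1000.0,
                            x2 * img_w / 1000.0,
                            y2 * img_h / 1000.0])
        return resized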

@peiwang062

Hi Haotian, thanks for the quick reply!
For eval_refexp.py, because misc doesn't exist, I used from torchvision.ops import box_iou to replace your original box_iou and modified lines 168-169 to

        iou = box_iou(predict_boxes, target_bbox)
        mean_iou = box_iou(predict_boxes.mean(0).view(-1, 4), target_bbox)

Other than that, I didn't do anything. Not sure if the problem is here.
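
For context, a self-contained sketch of this workaround (the boxes below are made-up examples): torchvision.ops.box_iou takes two tensors of (x1, y1, x2, y2) boxes and returns an NxM pairwise IoU matrix, so it is a plausible stand-in for the missing misc.box_iou helper.

    import torch
    from torchvision.ops import box_iou

    predict_boxes = torch.tensor([[10.0, 20.0, 110.0, 220.0]])
    target_bbox = torch.tensor([[12.0, 18.0, 105.0, 215.0]])

    iou = box_iou(predict_boxes, target_bbox)  # shape (1, 1) pairwise IoU matrix
    mean_iou = box_iou(predict_boxes.mean(0).view(-1, 4), target_bbox)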

@Haotian-Zhang (Collaborator)

Thanks for the findings! As the company is on holiday these days, I will ask folks to restore the missing files once they are back. In the meantime, I will follow up with you about this file over email. Thanks a lot for the help!

@peiwang062

Thank you so much Haotian! Merry Christmas!

@bensonbs

Sorry to ask here, but I need some help.

[screenshot]

I've followed https://github.com/lm-sys/FastChat/issues/412 but am still encountering the same error:

NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.

@Hxyou commented Dec 27, 2023

Hi @bensonbs, the error can have multiple causes.
First, can you double-check that you can successfully run the three commands (controller, gradio web server, model worker) as instructed in the README without errors?
Then, can you show us screenshots of the three programs launched by those commands at the moment the demo error happens? That will help us pin down where the problem comes from.

@bensonbs

Here is the result. Thank you for your assistance.

(ferret) root@58dfc909b9e7:/share/ml-ferret# python -m ferret.serve.controller --host 0.0.0.0 --port 10000
[2023-12-28 00:46:15,431] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-12-28 00:46:15 | INFO | controller | args: Namespace(host='0.0.0.0', port=10000, dispatch_method='shortest_queue')
2023-12-28 00:46:15 | INFO | controller | Init controller
2023-12-28 00:46:16 | ERROR | stderr | INFO:     Started server process [412]
2023-12-28 00:46:16 | ERROR | stderr | INFO:     Waiting for application startup.
2023-12-28 00:46:16 | ERROR | stderr | INFO:     Application startup complete.
2023-12-28 00:46:16 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:10000 (Press CTRL+C to quit)
2023-12-28 00:46:20 | INFO | controller | Receive unknown heart beat. http://localhost:40000
2023-12-28 00:46:20 | INFO | stdout | INFO:     127.0.0.1:58750 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-12-28 00:46:20 | INFO | controller | Register a new worker: http://localhost:40000
2023-12-28 00:46:20 | INFO | controller | Register done: http://localhost:40000, {'model_names': ['FERRET-13B-v0'], 'speed': 1, 'queue_length': 0}
2023-12-28 00:46:20 | INFO | stdout | INFO:     127.0.0.1:58764 - "POST /register_worker HTTP/1.1" 200 OK
2023-12-28 00:46:35 | INFO | controller | Receive heart beat. http://localhost:40000
2023-12-28 00:47:23 | INFO | controller | Register a new worker: http://localhost:40000
2023-12-28 00:47:23 | INFO | controller | Register done: http://localhost:40000, {'model_names': ['FERRET-13B-v0'], 'speed': 1, 'queue_length': 0}
2023-12-28 00:47:23 | INFO | stdout | INFO:     127.0.0.1:45306 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2023-12-28 00:47:23 | INFO | stdout | INFO:     127.0.0.1:45312 - "POST /list_models HTTP/1.1" 200 OK
2023-12-28 00:47:31 | INFO | controller | Register a new worker: http://localhost:40000
2023-12-28 00:47:31 | INFO | controller | Register done: http://localhost:40000, {'model_names': ['FERRET-13B-v0'], 'speed': 1, 'queue_length': 0}
2023-12-28 00:47:31 | INFO | stdout | INFO:     127.0.0.1:45318 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2023-12-28 00:47:31 | INFO | stdout | INFO:     127.0.0.1:45324 - "POST /list_models HTTP/1.1" 200 OK
2023-12-28 00:47:35 | INFO | controller | Receive heart beat. http://localhost:40000
2023-12-28 00:47:35 | INFO | stdout | INFO:     127.0.0.1:41430 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2023-12-28 00:47:43 | INFO | controller | names: ['http://localhost:40000'], queue_lens: [0.0], ret: http://localhost:40000
(ferret) root@58dfc909b9e7:/share/ml-ferret# python -m ferret.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --add_region_feature --port 8501
[2023-12-28 00:47:22,094] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-12-28 00:47:23 | INFO | gradio_web_server | args: Namespace(host='0.0.0.0', port=8501, controller_url='http://localhost:10000', concurrency_count=8, model_list_mode='reload', share=False, moderate=False, embed=False, add_region_feature=True)
2023-12-28 00:47:23 | INFO | gradio_web_server | Models: ['FERRET-13B-v0']
2023-12-28 00:47:23 | INFO | gradio_web_server | Namespace(host='0.0.0.0', port=8501, controller_url='http://localhost:10000', concurrency_count=8, model_list_mode='reload', share=False, moderate=False, embed=False, add_region_feature=True)
2023-12-28 00:47:23 | ERROR | stderr | /root/miniconda3/envs/ferret/lib/python3.10/site-packages/gradio/deprecation.py:43: UserWarning: You have unused kwarg parameters in Textbox, please remove them: {'container': False}
2023-12-28 00:47:23 | ERROR | stderr |   warnings.warn(
2023-12-28 00:47:25 | ERROR | stderr | /root/miniconda3/envs/ferret/lib/python3.10/site-packages/gradio/deprecation.py:43: UserWarning: You have unused kwarg parameters in Dropdown, please remove them: {'container': False}
2023-12-28 00:47:25 | ERROR | stderr |   warnings.warn(
2023-12-28 00:47:25 | INFO | stdout | Running on local URL:  http://0.0.0.0:8501
2023-12-28 00:47:25 | INFO | stdout | 
2023-12-28 00:47:25 | INFO | stdout | To create a public link, set `share=True` in `launch()`.
2023-12-28 00:47:31 | INFO | gradio_web_server | load_demo. ip: 172.21.0.1
2023-12-28 00:47:31 | INFO | gradio_web_server | Models: ['FERRET-13B-v0']
2023-12-28 00:47:34 | INFO | stdout | Init Uploading Images.
2023-12-28 00:47:43 | INFO | gradio_web_server | add_text. ip: 172.21.0.1. len: 13
2023-12-28 00:47:43 | INFO | stdout | No location, copy original image in add_text
2023-12-28 00:47:43 | INFO | gradio_web_server | http_bot. ip: 172.21.0.1
2023-12-28 00:47:43 | INFO | gradio_web_server | model_name: FERRET-13B-v0, worker_addr: http://localhost:40000
2023-12-28 00:47:43 | INFO | stdout | Input Image Size:(512, 512)
2023-12-28 00:47:43 | INFO | stdout | Input Image Size:(512, 512)
2023-12-28 00:47:43 | INFO | gradio_web_server | ==== request ====
{'model': 'FERRET-13B-v0', 'prompt': 'A chat between a human and an AI that understands visuals. In images, [x, y] denotes points: top-left [0, 0], bottom-right [width-1, height-1]. Increasing x moves right; y moves down. Bounding box: [x1, y1, x2, y2]. Image size: 1000x1000. Follow instructions.  USER: <image>\nwhat is this? ASSISTANT:', 'temperature': 0.2, 'top_p': 0.7, 'max_new_tokens': 512, 'stop': '</s>', 'images': "List of 1 images: ['195205623896f712c8831c15be32a339']"}
2023-12-28 00:47:43 | INFO | gradio_web_server | ==== add region_masks_in_prompts to request ====

2023-12-28 00:47:43 | INFO | stdout | Input Image Size:(512, 512)
2023-12-28 00:47:43 | INFO | stdout | Input Prompt: A chat between a human and an AI that understands visuals. In images, [x, y] denotes points: top-left [0, 0], bottom-right [width-1, height-1]. Increasing x moves right; y moves down. Bounding box: [x1, y1, x2, y2]. Image size: 1000x1000. Follow instructions.  USER: <image>
2023-12-28 00:47:43 | INFO | stdout | what is this? ASSISTANT:
(base) root@58dfc909b9e7:/share/ml-ferret# conda activate ferret
(ferret) root@58dfc909b9e7:/share/ml-ferret# CUDA_VISIBLE_DEVICES=0 python -m ferret.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./checkpoints/FERRET-13B-v0 --add_region_feature
[2023-12-27 16:29:26,569] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-12-27 16:29:27 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='./checkpoints/FERRET-13B-v0', model_base=None, model_name=None, multi_modal=False, keep_aspect_ratio=False, num_gpus=1, limit_model_concurrency=5, stream_interval=1, no_register=False, load_8bit=False, load_4bit=False, add_region_feature=True, image_w=336, image_h=336)
2023-12-27 16:29:27 | INFO | model_worker | Loading the model FERRET-13B-v0 on worker 9be48e ...
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Loading checkpoint shards:   0%|                                                                                          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███████████████████████████▎                                                      | 1/3 [00:19<00:39, 19.54s/it]
Loading checkpoint shards:  67%|██████████████████████████████████████████████████████▋                           | 2/3 [00:35<00:17, 17.59s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 3/3 [00:45<00:00, 13.85s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 3/3 [00:45<00:00, 15.06s/it]
2023-12-27 16:30:14 | ERROR | stderr | 
2023-12-27 16:30:16 | INFO | model_worker | Register to controller
2023-12-27 16:30:16 | ERROR | stderr | INFO:     Started server process [243]
2023-12-27 16:30:16 | ERROR | stderr | INFO:     Waiting for application startup.
2023-12-27 16:30:16 | ERROR | stderr | INFO:     Application startup complete.
2023-12-27 16:30:16 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:40000 (Press CTRL+C to quit)
2023-12-27 16:31:04 | INFO | stdout | INFO:     127.0.0.1:46046 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-12-27 16:31:18 | INFO | stdout | INFO:     127.0.0.1:33794 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2023-12-27 16:31:18 | INFO | model_worker | Add region_masks to image_args.
2023-12-27 16:37:48 | INFO | stdout | INFO:     127.0.0.1:38944 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-12-27 16:37:59 | INFO | stdout | INFO:     127.0.0.1:56904 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2023-12-27 16:38:00 | INFO | model_worker | Add region_masks to image_args.
2023-12-28 00:46:20 | INFO | model_worker | Register to controller
2023-12-28 00:47:23 | INFO | stdout | INFO:     127.0.0.1:43410 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-12-28 00:47:31 | INFO | stdout | INFO:     127.0.0.1:43426 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-12-28 00:47:43 | INFO | stdout | INFO:     127.0.0.1:48710 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2023-12-28 00:47:43 | INFO | model_worker | Add region_masks to image_args.

@Hxyou commented Dec 28, 2023

@bensonbs Your logs look fine to me. It seems model_worker successfully receives the input, since "Add region_masks to image_args" is printed and the generation process doesn't raise any error. I also tried your commands on my server and it works well.
Can you try to (1) make sure the model is selected in the model list of the demo webpage (as highlighted in the figure), (2) click the regenerate button, (3) refresh the demo webpage, and (4) try different ports in case there is a conflict with other processes?
[screenshot]

@bensonbs commented Dec 28, 2023

After a whole day of trying, I think it's 'doge' causing the trouble.🫠

[screenshot]

[screenshot]

@bensonbs

root@46290910f996:~# python3 -m ferret.model.apply_delta --base ./model/vicuna-13b-v1-3 --target model/ferret-13b-v1-3 --delta model/ferret-13b-delta
[2023-12-28 10:44:30,138] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading base model
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/ferret/model/apply_delta.py", line 69, in <module>
    apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
  File "/root/ferret/model/apply_delta.py", line 36, in apply_delta
    base = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 441, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 926, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 632, in __getitem__
    raise KeyError(key)
KeyError: 'llava'

@ronnymunthe99 commented Dec 28, 2023

[screenshot]
I also got this error when I tried to use a pre-trained llava model. Changing the model_type in config.json from "llava" to "llama" works for me.
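
A minimal sketch of that workaround (the config path is assumed from the command above); it rewrites model_type so AutoConfig resolves to the LLaMA config class:

    import json

    cfg_path = "./model/vicuna-13b-v1-3/config.json"  # assumed path
    with open(cfg_path) as f:
        cfg = json.load(f)
    if cfg.get("model_type") == "llava":
        cfg["model_type"] = "llama"
        with open(cfg_path, "w") as f:
            json.dump(cfg, f, indent=2)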

@Hxyou commented Dec 28, 2023

@bensonbs It seems that your base model (./model/vicuna-13b-v1-3) actually has "model_type" set to "llava" in its config.json. Can you double-check your vicuna weights? They should use "llama" as the "model_type". When applying the delta, we don't need any weights from LLaVA; we only need the original vicuna.

@crabmon commented Jan 9, 2024

I got an error while running the command for the checkpoints:

python3 -m ferret.model.apply_delta --base /model/vicuna-7b-v1-3 --target /model/ferret-7b-v1-3 --delta /model/ferret-7b-delta

The error is: pickle.UnpicklingError: invalid load key, 'v'.

I've placed vicuna-7b-v1-3 and ferret-7b-delta both under the model folder. I couldn't find ferret-7b-v1-3, so I created an empty folder with that name. How can I get past this error?

@Hxyou commented Jan 12, 2024

@crabmon Hi, placing vicuna and ferret-delta in the same model folder and creating an empty ferret-7b-v1-3 folder works fine on my side. Can you provide more details of the error, such as screenshots of the command and the full traceback?
[screenshot: Screen Shot 2024-01-11 at 10.14.45 PM]
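
One hedged guess worth checking (not confirmed in this thread): pickle's "invalid load key, 'v'" often means a .bin weight file is actually a git-lfs pointer, a small text file starting with "version https://git-lfs...", rather than the real binary shard. A quick check, with the path assumed from the command above:

    from pathlib import Path

    for p in Path("/model/ferret-7b-delta").glob("*.bin"):
        with p.open("rb") as f:
            head = f.read(7)
        print(p.name, p.stat().st_size, head)
        if head == b"version":
            print("  -> looks like a git-lfs pointer; re-fetch the weights (e.g. git lfs pull)")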

@crabmon commented Jan 12, 2024

Hi @Hxyou, thank you for your reply. I managed to get past that step, but now I'm stuck again: the demo says NETWORK ERROR DUE TO HIGH TRAFFIC.

In the terminal, it says: caught unknown error CUDNN error: CUDNN_STATUS_INTERNAL_ERROR.

Any clue?

@Hxyou commented Jan 12, 2024

@crabmon Can you show us a screenshot of the terminal errors? Also, what GPU are you using, and which versions of PyTorch and CUDA?

@crabmon commented Jan 16, 2024

@Hxyou, please see the screen capture below.
[screenshot: MicrosoftTeams-image]

I'm using a 3090, CUDA version 12, and driver version 525.147.05.

@Hxyou commented Jan 22, 2024

@crabmon CUDNN_STATUS_INTERNAL_ERROR is typically hard to debug since it gives no real error message, but in many cases it's due to running out of GPU memory. A 3090 only has 24 GB, and a 13B model often consumes more than 20 GB. Are you trying the 7B or the 13B model?
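
As a rough sanity check for the memory hypothesis (a hedged sketch; fp16 weights take about 2 bytes per parameter, before counting activations and the vision tower):

    import torch

    free, total = torch.cuda.mem_get_info(0)  # bytes
    print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")
    print(f"fp16 13B weights alone: ~{13e9 * 2 / 2**30:.0f} GiB")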

@orcunderscore commented Jan 23, 2024

Hi there @Hxyou,

I was wondering about the dataset.

Right now I can only find a few datapoints in the eval set, which I assume at least have the same format as the rest of the dataset(?).

I can see that some loading of different data sources happens in https://github.com/apple/ml-ferret/blob/main/ferret/train/train.py, but since the repo's main README explicitly mentions that the dataset has its own license, I was expecting to be able to download it as-is somewhere.

Is there a timeline for when it will be released publicly? Or is there a link to download it somewhere, or a standalone script to reconstruct it, with download instructions for the other datasets it is based on?

Thanks!

@Hxyou commented Jan 26, 2024

@ibims1entwickler Thank you for your interest. Since we released the Ferret-Edit evaluation data, the license mainly applies to that data. As for the training data, it's still under internal review, and we can't promise an exact date. I think providing the preparation scripts is a good idea, and we will discuss it.

@orcunderscore commented Feb 6, 2024

@Hxyou, could you then provide examples of the expected input format for training (mock data is fine)?
I am specifically wondering how data are grouped.

In LLaVA, as far as I understand, the input format is a JSON file like this:

[
  {
    "id": "000000000001",
    "image": "image_name.png",
    "conversations": [
      {
        "from": "human",
        "value": "What do you see happening in this image?\n<image>"
      },
      {
        "from": "gpt",
        "value": "Answer about what happens in the image."
      },
      {
        "from": "human",
        "value": "Another question about the image?"
      },
      {
        "from": "gpt",
        "value": "Answer to the other question."
      },
...
    ]
  },
...
]

So:

  • data are grouped into conversations that belong to one image
  • the image is only mentioned in the first human prompt

From the inference code in this repo, I can already guess that in FERRET:

  • In the prompt, points should be referred to as [x, y] <region_fea>, and bounding boxes and masks as [x1, y1, x2, y2] <region_fea>.

Open questions:

  • Can you provide examples of the data format? [mock examples for all possible input modes (point, bbox, mask)]
  • How does the dataloader expect to find the binary masks that are required as input to the model?

Seeing that in the dataloader this may differ per dataset, can you provide examples for one of the datasets that supports points, masks, and bboxes?
I am trying to train on my own data, but it is really hard to figure out the training input format.

Thank you!

@Hxyou commented Feb 7, 2024

@ibims1entwickler Hi, here are some data samples you can refer to for the exact data format: https://anonymous.4open.science/r/ferret-anonymous-6773/training_data.md
Note: box_x1y1x2y2 is the box list, with coordinates in line with the original image's width and height. The masks are also global image masks with the same shape as the original image. The number of lists in box_x1y1x2y2 and masks corresponds to the number of messages in conversations (an empty list [] means the corresponding message doesn't include any location).
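
To illustrate, a hedged mock of a single training record as a Python literal; the field names follow the linked samples and the description above, and all values are invented:

    sample = {
        "id": "000000000001",
        "image": "train2017/000000000001.jpg",  # invented path
        "conversations": [
            {"from": "human",
             "value": "What is the object [120, 80, 340, 260] <region_fea> in the image?\n<image>"},
            {"from": "gpt",
             "value": "It is a dog."},
        ],
        # One entry per message; [] marks a message with no location.
        "box_x1y1x2y2": [[[120, 80, 340, 260]], []],
        "masks": [[], []],  # COCO-RLE-encoded masks when present
    }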

The binary masks are encoded with the COCO API (https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py). If the masks field is provided in the data JSON file, the dataloader should be able to load it.
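
A hedged sketch of that encoding with pycocotools (the mask shape and contents are made up):

    import numpy as np
    from pycocotools import mask as mask_utils

    # H x W binary mask in Fortran order, as pycocotools expects.
    binary_mask = np.zeros((480, 640), dtype=np.uint8, order="F")
    binary_mask[100:200, 150:300] = 1

    rle = mask_utils.encode(binary_mask)   # {'size': [480, 640], 'counts': b'...'}
    json_safe = {"size": rle["size"], "counts": rle["counts"].decode("ascii")}

    decoded = mask_utils.decode(rle)       # back to the H x W binary array
    assert (decoded == binary_mask).all()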

@orcunderscore

@Hxyou thank you for that, that was really helpful. I'll try to mock something up and start a mock training run tomorrow. A few follow-up questions:

  1. The link shows no example of grounding (i.e., no bbox in the "from": "human" message but one in the "from": "gpt" message) and also no text-only conversations. I think I can trivially extend to the grounding case. Should I set "location_instruction": false for a pure-text example?
  2. In the example, you show masks and boxes. What about points? Are they represented as a mask with a single non-zero pixel, or are they supposed to be circles with a small radius, as mentioned in the paper? [The latter does not happen during inference.]
  3. When I use my own dataset, I'll have to add a loading function for it to LazySupervisedDataset, right?

@Hxyou commented Feb 10, 2024

@ibims1entwickler

  1. Here is an example of a grounding dataset. For a pure-text example, I haven't tried it, but it seems you should just omit the image key from the data dictionary. [screenshot]
  2. It's a circle with a span of 5 for point input, in both training and inference. Please see:

    if len(coor) == 2:
        # Define window size
        span = 5
        # Make sure the window does not exceed array bounds
        x_min = max(0, coor[0] - span)
        x_max = min(raw_w, coor[0] + span + 1)
        y_min = max(0, coor[1] - span)
        y_max = min(raw_h, coor[1] + span + 1)
        coor_mask[int(x_min):int(x_max), int(y_min):int(y_max)] = 1
    assert (coor_mask == 1).any(), f"coor: {coor}, raw_w: {raw_w}, raw_h: {raw_h}"

  3. Yes, such a design allows easier customization.

@orcunderscore

@Hxyou thank you.
A question about consistency:

  • In the openscience repo you linked, bounding box coordinates were given as int; in your example here, as float. So I guess float is fine.
  • In the example you just provided, bounding boxes are shown as [<bbox_location0>], whereas in the openscience link they were given without the brackets, i.e. just <bbox_location0>. In the training code, you do

    coor_i = f'[{int(raw_coor_i[0])}, {int(raw_coor_i[1])}]'

    (i.e., replace the placeholder with coordinates), so did you actually use the example from your previous post in training? That is, would the model learn to predict [[x1, y1, x2, y2]] (with double brackets)?

@Hxyou commented Feb 13, 2024

@ibims1entwickler Sorry, the previous screenshot I provided is not the version we used for training. The following screenshot is correct; it is in the same style as the one in the openscience link. Please ignore token_positive, it's just some unused metadata.
Also, float or int are both fine.
[screenshot]
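
To illustrate the point about brackets (a hedged sketch; the strings below are invented, not taken from the dataloader): the raw sample stores the placeholder without surrounding brackets, so serializing the box with single brackets yields [x1, y1, x2, y2] in the final text, not double brackets.

    raw = "What is the object <bbox_location0> in the image?"
    box = [120.0, 80.0, 340.0, 260.0]
    coor = "[" + ", ".join(str(int(v)) for v in box) + "]"  # "[120, 80, 340, 260]"
    print(raw.replace("<bbox_location0>", coor))            # single brackets in the prompt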

@dddraxxx

Again, is there any plan to release the training dataset?

@sduwhly commented Mar 25, 2024

Is there any plan to release the training dataset?

@rajneesh-18

One question: why is this not open source?

@kuaileqipaoshui

(Quoting, in Chinese, the earlier exchange between @peiwang062 and @Haotian-Zhang about the refcocog reproduction and the missing misc import in eval_refexp.py.)

Has this box_iou issue been resolved? I don't seem to see a fix for the missing file.

@killah-t-cell

Any updates on the dataset?

@Xiao-wen-Sun

Hi, I am looking forward to the missing file, too. :)
Best,
Xiaowen
