Fix typo in CONTRIBUTING.md #1
base: main
Conversation
reproducability -> reproducibility
So why not open issues?
Any plans to release the dataset?
How to access the dataset?
The training/inference/eval code has been released. FerretBench Evaluation is also included here, but the training data and checkpoints don't seem to be ready yet. Let's stay tuned!
@Hxyou Hi, great work. Would you like to release the checkpoint files soon? I can't wait to try it~
Why can this method boost the grounding ability? I think the "hybrid region representation" can only enhance referring abilities... Who can explain it? Thanks.
Why not open the issues?
Thank you for your patience. Ckpts were released last week! Feel free to try it~
I think we didn't claim that "hybrid region representation" can help grounding. Instead, in experiments, we ablate whether referring data/tasks can help grounding when jointly trained, and the answer is yes. I hypothesize that it's because both tasks require fine-grained spatial understanding: by training on one task, the LLM implicitly learns the projection of coordinates and region features to the real location in the image, and thus the other task also gets boosted.
I am sorry for that. The repo was set up by the company. Feel free to email us or leave comments in this pull request as if raising issues. We will try our best to answer.
Hi, thanks for sharing the great work! But why not open issues? Anyway, I tried to reproduce the ferret-7b evaluation results on refcocog, but my reproduced results are pretty bad: 5% vs 84%. I exactly followed the instruction steps. The vicuna-7b model I used is https://huggingface.co/lmsys/vicuna-7b-v1.3, and the annotation json is from https://huggingface.co/GLIPModel/GLIP/tree/66ee3ae9a3b8cee0cf78f10ef5fc9a3725db02a1/mdetr_annotations. I manually checked the bboxes from the produced 0_of_1.jsonl; most of the boxes are wrong, so it is not a problem in eval_refexp.py (by the way, the misc file imported at line 23 is missing there). I didn't encounter any issue or error during the installation or checkpoint-generation steps, so any suggestion on what might be causing the mismatch? Looking forward to the reply. Thanks for sharing this excellent work again!
Hi @peiwang062, thanks for the questions. The visualization of the boxes from the .jsonl file is incorrect because the raw prediction file does not exactly match the image size: the coordinates there are in the range between 0 and 999. In our eval_refexp.py, we do that resize (ml-ferret/ferret/eval/eval_refexp.py, line 59 in 262a943).
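A minimal sketch of that mapping, assuming predictions use a 1000-unit coordinate grid as described above (the function name and example values are illustrative, not the repo's exact code):

```python
# Map a box predicted on a [0, 999] grid back to pixel coordinates.
# This mirrors the resize done in eval_refexp.py, but is only a sketch.
def denormalize_box(box, img_w, img_h, grid=1000):
    x1, y1, x2, y2 = box
    return (x1 / grid * img_w, y1 / grid * img_h,
            x2 / grid * img_w, y2 / grid * img_h)

# e.g. a predicted box drawn on a 640x480 image
print(denormalize_box((100, 250, 900, 750), 640, 480))
# -> (64.0, 120.0, 576.0, 360.0)
```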
Hi Haotian, thanks for the quick reply!
Other than that, I didn't do anything. Not sure if the problem is here.
Thanks for the findings! As the company is on holiday these days, I will ask folks to recover the missing files once they are back. In the meantime, I will follow up with you about this file over email. Thanks a lot for the help!
Thank you so much Haotian! Merry Christmas! |
Sorry to ask here, but I need some help. I've followed the issues but am still encountering the same error:
Hi @bensonbs, the error might have several possible causes.
Here is the result. Thank you for your assistance.
@bensonbs Your logs look fine to me. It seems model_worker can successfully receive input since …
After a whole day of trying, I think it's 'doge' causing the trouble. 🫠
@bensonbs It seems that your base model (./model/vicuna-13b-v1-3) actually has its "model_type" in config.json as 'llava'. Can you double-check your vicuna weights? It should use "llama" as its "model_type". When applying delta, we don't need any weights from LLaVA and only need the original vicuna. |
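A quick sanity check along those lines (the path is the one from this thread; the check itself is just a sketch):

```python
import json

# The base weights used with apply_delta should be plain vicuna/llama,
# i.e. config.json should say "model_type": "llama", not "llava".
with open("./model/vicuna-13b-v1-3/config.json") as f:
    cfg = json.load(f)

assert cfg.get("model_type") == "llama", (
    f"unexpected model_type {cfg.get('model_type')!r}; "
    "these do not look like plain vicuna weights"
)
```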
I got an error while running the checkpoint step: python3 -m ferret.model.apply_delta. The error is: pickle.UnpicklingError: invalid load key, 'v'. I've placed vicuna-7b-v1-3 and ferret-7b-delta both under the model folder. I couldn't find ferret-7b-v1-3, so I created an empty folder with the same name. How can I get past this error?
@crabmon Hi, |
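One common cause of `invalid load key, 'v'` (an assumption, not confirmed in this thread) is that the weight files were downloaded without git-lfs, leaving small text pointer files that start with the word `version` instead of real checkpoints. A quick check, with illustrative paths:

```python
from pathlib import Path

# git-lfs pointer files are tiny text files beginning with
# "version https://git-lfs.github.com/spec/v1"; torch.load on one
# of them fails with exactly this pickle error.
for p in Path("./model/vicuna-7b-v1-3").glob("*.bin"):
    if p.read_bytes()[:7] == b"version":
        print(f"{p} looks like a git-lfs pointer; re-fetch with `git lfs pull`")
```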
Hi @Hxyou, thank you for your reply. I managed to get past that step, but now I'm stuck again. It says NETWORK error due to HIGH traffic..... and the terminal says: caught unknown error CUDNN error: cudnn_status_internal_error. Any clue?
@crabmon Can you show us a screenshot of the terminal errors? What GPU are you using, and what versions of PyTorch and CUDA?
@Hxyou, please refer to the screen capture above. I'm using a 3090, CUDA version 12, and driver version 525.147.05.
@crabmon CUDNN_STATUS_INTERNAL_ERROR is typically hard to debug since it gives no real error message, but in many cases it's due to OOM. A 3090 only has 24GB of memory, and a 13B model often consumes more than 20GB. Are you trying the 7B or the 13B model?
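A back-of-the-envelope estimate of why the 13B model is tight on a 3090 (fp16 weights only; activations and the KV cache come on top):

```python
# 13B params at 2 bytes each (fp16) ~ 24.2 GiB, already at a 3090's 24 GB;
# 7B params ~ 13.0 GiB, leaving headroom for activations.
for n_params in (13e9, 7e9):
    print(f"{n_params / 1e9:.0f}B -> {n_params * 2 / 2**30:.1f} GiB")
```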
Hi there @Hxyou, I was wondering about the dataset. Right now I can only find a few datapoints in the eval set, which I assume at least have the same format as the rest of the dataset (?). I can see that some loading of different data sources happens in https://github.com/apple/ml-ferret/blob/main/ferret/train/train.py, but since the repo's main README explicitly mentions that the dataset has its own license, I was expecting to be able to download it as-is somewhere. Is there a timeline for when it will be released publicly? Or is there a link to download it somewhere, or a script to reconstruct it, with download instructions for the other datasets it is based on? Thanks!
@ibims1entwickler Thank you for your interest. Since we released Ferret-Edit evaluation data, the license mainly applies to that data. As for training data, it's still under internal review and we're not able to promise an exact time. I think providing the preparation scripts is a good idea and we will discuss it. |
@Hxyou, could you provide examples of the expected input format for training then (mock data is fine)? In LLaVA, as far as I understand, the input format is a json file like this:
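For reference, roughly what a LLaVA-style record looks like (reconstructed from memory as mock data; field values are made up and details may differ from LLaVA's actual files):

```python
# One training sample: an image path plus a multi-turn conversation,
# where "<image>" marks where the image is injected into the prompt.
llava_sample = {
    "id": "000001",
    "image": "coco/train2017/000000000001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is shown in this image?"},
        {"from": "gpt", "value": "A dog lying on a couch."},
    ],
}
```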
So
From the inference code in this repo, I can already guess that in FERRET:
Open questions:
Seeing that in the dataloader this might differ depending on the dataset, can you provide examples for one of the datasets that supports points, masks, and bboxes? Thank you!
@ibims1entwickler Hi, here are some data samples you can refer to for the exact data format: https://anonymous.4open.science/r/ferret-anonymous-6773/training_data.md The binary masks are encoded by cocoapi (https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py). If the field …
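A minimal sketch of the cocoapi encoding mentioned above (the mask contents are made up; only the `encode`/`decode` calls are the real pycocotools API):

```python
import numpy as np
from pycocotools import mask as mask_utils

# A toy 480x640 binary mask with one rectangular region set to 1.
binary_mask = np.zeros((480, 640), dtype=np.uint8)
binary_mask[100:200, 150:300] = 1

# encode() needs a Fortran-ordered uint8 array and returns an RLE dict
# with "size" and "counts" fields; decode() inverts it.
rle = mask_utils.encode(np.asfortranarray(binary_mask))
assert (mask_utils.decode(rle) == binary_mask).all()
```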
@Hxyou thank you for that, that was really helpful. I'll try to mock something up and start a mock training with it tomorrow. A few follow-up questions:
@ibims1entwickler
@Hxyou thank you.
Again, is there any plan to release the training dataset?
Is there any plan to release the training dataset?
One question: why is this not open source?
Has the box_iou issue been resolved? I don't see any fix for the missing file yet.
Any updates on the dataset? |
Hi, I am looking forward to the missing file, too. :)