Add support for Apple's Depth-Pro #34583
base: main
Conversation
I have implemented the foundational components of the model and manually loaded the weights to verify that the architecture matches the original design and produces consistent output. Below is a concise sketch of the class hierarchy; I would greatly appreciate your feedback or any suggestions for improvements.
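As a hedged sketch, using the class names exported by the merged transformers implementation (the PR-time draft may have differed), the top level looks like:

```python
# Hedged sketch: class names as exported by the merged transformers
# implementation; internal module names may differ from this PR's draft.
from transformers import (
    DepthProConfig,              # hyperparameters, incl. multi-scale settings
    DepthProImageProcessor,      # resizing / rescaling / normalization
    DepthProModel,               # multi-scale DINOv2-based encoder backbone
    DepthProForDepthEstimation,  # DPT-style fusion decoder + depth/FOV heads
)
```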
I have a couple of questions:
cc @pcuenca as well!
Hi @geetu040! Thanks for working on this model! Regarding model outputs: they should be rewritten if you want to add a new argument or write better docs. In case of intermediate outputs, you can store them in `hidden_states`.
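For illustration, a minimal toy sketch of that convention — collecting intermediate features into the standard `hidden_states` tuple instead of inventing new output fields (this is not the actual DepthPro code):

```python
import torch
from torch import nn

class TinyEncoder(nn.Module):
    """Toy encoder showing the hidden_states collection pattern."""

    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(8, 8) for _ in range(3))

    def forward(self, x, output_hidden_states=False):
        all_hidden_states = () if output_hidden_states else None
        for layer in self.layers:
            x = layer(x)
            if output_hidden_states:
                all_hidden_states = all_hidden_states + (x,)
        return x, all_hidden_states

encoder = TinyEncoder()
out, hidden = encoder(torch.randn(2, 8), output_hidden_states=True)
print(len(hidden))  # 3 intermediate feature tensors
```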
@qubvel @pcuenca Thanks, I have updated the code for `hidden_states`. I still need an opinion on the field-of-view output. The existing output class is:

```python
class DepthEstimatorOutput(ModelOutput):
    loss: Optional[torch.FloatTensor] = None
    predicted_depth: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor, ...]] = None
    attentions: Optional[Tuple[torch.FloatTensor, ...]] = None
```

Q1: Do I create a new class that extends `DepthEstimatorOutput` with a `fov` field?
Thanks @geetu040!

Q1:

```python
class DepthProDepthEstimatorOutput(DepthEstimatorOutput):
    fov: Optional[torch.FloatTensor] = None
```

This output can be returned in both cases.

Q2: Yeah, this can be a parameter of the config, but it should also be an argument when initializing the model.

Please, let me know if you have more questions!
This needs to be done during
OK, got it! Then it should be done with the config, and anyone can load a model as follows:

```python
model = DepthProForDepthEstimation(checkpoint, fov_model=True)
# or
model = DepthProForDepthEstimation(checkpoint, fov_model=False)
```

With such initialization
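For what it's worth, the config-driven variant might look like the sketch below; `use_fov_model` is the flag name in the merged implementation and may not match the `fov_model` argument sketched above:

```python
from transformers import DepthProConfig, DepthProForDepthEstimation

# Assumed flag name (use_fov_model); toggles construction of the FOV head.
config = DepthProConfig(use_fov_model=True)
model = DepthProForDepthEstimation(config)
```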
I was wondering: can we also give users the option to decide which scales to use? For example, a user could specify custom scales in the config.

@qubvel I have looked into how this can be implemented; it is doable, and I can easily make this option available, which I would prefer. But I have to ask you as well: do you think this option should be given to users?
Hi @geetu040, we try to avoid overcomplicating the code with lots of parameters; the general rule is to get rid of different code paths and unused params that do not differ across pretrained checkpoints. For this particular case, feel free to add it, but only if it does not introduce extra complexity to the modeling code.
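For concreteness, a hedged sketch of what that user-facing option could look like (the `scaled_images_ratios` field name is assumed from the merged config):

```python
from transformers import DepthProConfig, DepthProForDepthEstimation

# Assumed config field: ratios at which the input image is downsampled
# before patching (defaults in the merged model are [0.25, 0.5, 1]).
config = DepthProConfig(scaled_images_ratios=[0.25, 0.5, 1.0])
model = DepthProForDepthEstimation(config)
```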
Hi @qubvel, I have a question about the image processor. Following the source code here causes the two outputs to be slightly different from each other. Do you suggest I stay with the convention and ignore the minor difference in output, or should I make the implementation exactly like the source code? I am not very sure how to do the latter, because of how the original pipeline pre-processes the image.

Here are the outputs.

Difference in outputs: there is a slight difference, which happens because of how the image is pre-processed before being given to the model.

- Source code results: (image)
- HF code results: (image)
- Difference in output image: visually, there is no difference between the two images.
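As an aside, one hedged way to quantify such a "slight difference" (the tensor names here are hypothetical stand-ins for the two depth maps):

```python
import torch

# Hypothetical stand-ins for the two depth maps being compared.
original_depth = torch.rand(1, 768, 768)
hf_depth = original_depth + 1e-5 * torch.randn_like(original_depth)

diff = (original_depth - hf_depth).abs()
print(f"max abs diff:  {diff.max().item():.2e}")
print(f"mean abs diff: {diff.mean().item():.2e}")
print("allclose at atol=1e-4:", torch.allclose(original_depth, hf_depth, atol=1e-4))
```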
Also, how does the weight conversion work? I have created the script for weight conversion, but when and by whom do the converted weights get uploaded to Hugging Face? I would need these converted weights for the examples in the docstrings.
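Typically the contributor runs the conversion script locally and uploads the result to their own account first; a hedged sketch of that upload step (the local path is hypothetical, and the repo id is the one used later in this thread):

```python
from transformers import DepthProForDepthEstimation, DepthProImageProcessor

# "./converted-depth-pro" is a hypothetical local output dir of the
# conversion script; the repo id is the one used later in this thread.
model = DepthProForDepthEstimation.from_pretrained("./converted-depth-pro")
processor = DepthProImageProcessor.from_pretrained("./converted-depth-pro")
model.push_to_hub("geetu040/depth-pro-hf")
processor.push_to_hub("geetu040/depth-pro-hf")
```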
Thanks for applying the changes! 🤗 We're all set to proceed with the model. Before merging, there's just one thing left: we usually transfer the checkpoint to the organization that released the original model, in this case Apple, but only if you're okay with that. If you're fine with it, we can go ahead with the transfer. However, before that we'll need to:

Let us know how you'd like to proceed! 😊
I am okay with that.
I have updated the hub repository name: https://huggingface.co/geetu040/depth-pro-hf |
And thank you, @qubvel, for all the detailed reviews.
Thanks, can you please also update `repo_id` to "apple" everywhere? Like "apple/depth-pro-hf".
updated
@qubvel,

Yes, let's add
I've updated the test and it's working fine now.
Thanks! Working on checkpoint transfer.
run-slow: depth_pro |
This comment contains run-slow, running the specified jobs: models: ['models/depth_pro']
@qubvel Shouldn't the checkpoint be

@geetu040 I followed the same pattern for the model family. Let me check if there's anything we can do.

Otherwise I can change the checkpoint in the code, and it would also need to be changed in the model card.
https://huggingface.co/apple/depth-pro-hf now redirects to

But good point about the model card; I'll change it so it's less confusing. Edit: updated.
sure, updated!
run-slow: depth_pro |
This comment contains run-slow, running the specified jobs: models: ['models/depth_pro']
@qubvel, the tests did not run in the last two attempts. Error log:

Looks like something is wrong with the workflow itself.
@geetu040 yes, something is wrong with CI; waiting for the team to fix it.
What does this PR do?
Fixes #34020
This PR adds Apple's Depth Pro model to Hugging Face Transformers. Depth Pro is a foundation model for zero-shot metric monocular depth estimation, built on a multi-scale vision transformer optimized for dense prediction. The input image is downsampled at several scales; at each scale it is split into patches, which are processed by a ViT-based (DINOv2) patch encoder with weights shared across scales. The patch features are merged into feature maps, upsampled, and fused via a DPT decoder.
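A hedged usage sketch (checkpoint id as discussed in this thread; `example.jpg` is a placeholder image):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

processor = AutoImageProcessor.from_pretrained("apple/depth-pro-hf")
model = AutoModelForDepthEstimation.from_pretrained("apple/depth-pro-hf")

image = Image.open("example.jpg")  # placeholder input
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

depth = outputs.predicted_depth  # metric depth map
# Per the discussion above, the DepthPro-specific output class also carries
# a field-of-view estimate alongside the standard depth fields.
```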
Relevant Links
Before submitting
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@amyeroberts, @qubvel