Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Updated BridgeTower Image processor #32384

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

Sai-Suraj-27
Copy link
Contributor

@Sai-Suraj-27 Sai-Suraj-27 commented Aug 1, 2024

What does this PR do?

Updated BridgeTower Image processor.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts @ArthurZucker

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@amyeroberts
Copy link
Collaborator

@Sai-Suraj-27 This affects the image processing tests for bridgetower, which will need to be updated to reflect this fix

@Sai-Suraj-27
Copy link
Contributor Author

Sai-Suraj-27 commented Aug 3, 2024

@Sai-Suraj-27 This affects the image processing tests for bridgetower, which will need to be updated to reflect this fix

Thanks for the review. Yes, I am looking into it. Will try to fix and update the PR 👍🏻
Edit: I updated the tests and now they are all passing. But I am not 100% sure, if I did it the correct way. Let me know if there is any problem.

@Sai-Suraj-27 Sai-Suraj-27 changed the title fix: Fixed an incorrect variable assignment fix: Updated image_processing_bridgetower tests Aug 5, 2024
@amyeroberts
Copy link
Collaborator

Hi @Sai-Suraj-27, thanks for iterating on this!

Looking at the changes in the test, I realized that this fix (which is addressing a bug in the code) would be introducing a change in the default behaviour of the image processor, and therefore would produce different outputs for the user. Previously, because do_center_crop was never defaulting to self.do_center_crop if unset, we were never cropping the image, and the flag in the preprocessor config had no effect.

What we should do is:

  • remove the do_center_crop flag from the image processor
  • remove crop_size logic from the image processor
  • Add a deprecation warning in the crop method to say that it's going to be removed in two releases time

@Sai-Suraj-27
Copy link
Contributor Author

Sai-Suraj-27 commented Aug 6, 2024

Thanks for the review and explanation @amyeroberts. I've made the required changes. Please look into it and let me know if I missed anything.

@Sai-Suraj-27 Sai-Suraj-27 changed the title fix: Updated image_processing_bridgetower tests fix: Updated bridgetower Image processor Aug 6, 2024
@Sai-Suraj-27 Sai-Suraj-27 changed the title fix: Updated bridgetower Image processor fix: Updated BridgeTower Image processor Aug 6, 2024
Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this! Just a few places we need to be careful to safely deprecate to avoid surprises for the user

@@ -119,6 +122,8 @@ def get_expected_values(self, image_inputs, batched=False):
return expected_height, expected_width

def expected_output_image_shape(self, images):
if self.do_center_crop:
return self.num_channels, self.crop_size["height"], self.crop_size["width"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reflect the crop behaviour of the image processor

Suggested change
return self.num_channels, self.crop_size["height"], self.crop_size["width"]
return self.num_channels, self.size["shortest_edge"], self.size["shortest_edge"]

@@ -58,6 +59,7 @@ def __init__(
self.rescale_factor = rescale_factor
self.do_normalize = do_normalize
self.do_center_crop = do_center_crop
self.crop_size = crop_size if crop_size is not None else {"height": 288, "width": 288}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.crop_size = crop_size if crop_size is not None else {"height": 288, "width": 288}

@@ -73,6 +75,7 @@ def prepare_image_processor_dict(self):
"do_normalize": self.do_normalize,
"do_resize": self.do_resize,
"size": self.size,
"crop_size": self.crop_size,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"crop_size": self.crop_size,

@@ -296,10 +296,9 @@ def center_crop(
The channel dimension format of the input image. If not provided, it will be inferred from the input
image.
"""
output_size = size["shortest_edge"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, what we want to do is update the docstring to reflect the true behaviour of the method, but still take the same size["shortest_edge"] argument. Otherwise this would be a breaking change and previous configs for the model would have a different behaviour. In fact - I think existing configs would break as they don't have "height" or "width" keys for size

@@ -42,6 +42,7 @@ def __init__(
rescale_factor: Union[int, float] = 1 / 255,
do_normalize: bool = True,
do_center_crop: bool = True,
crop_size: Dict[str, int] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
crop_size: Dict[str, int] = None,

@@ -277,6 +268,7 @@ def center_crop(
The channel dimension format of the input image. If not provided, it will be inferred from the input
image.
"""
warnings.warn("The center_crop method is deprecated and will be removed in two releases time.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should specify the exact release version.

Comment on lines 387 to 388
do_center_crop: Optional[bool] = None,
crop_size: Dict[str, int] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't just remove these like this as this will break things for users. We need to still accept them and emit a warning if they are set

@@ -126,7 +126,6 @@ class PaliGemmaPreTrainedModel(PreTrainedModel):
_no_split_modules = ["PaliGemmaMultiModalProjector"]
_skip_keys_device_placement = "past_key_values"
_supports_flash_attn_2 = False
_supports_cache_class = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't remove this here

@Rocketknight1
Copy link
Member

cc @qubvel can you take a look at this while Amy is out?

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, btw I am not sure I understand from reading the interactions what this fixes? It's also kinda breaking as you remove something but more importantly could you explain the motivations?

@Sai-Suraj-27
Copy link
Contributor Author

Sai-Suraj-27 commented Jan 22, 2025

Thanks for the PR, btw I am not sure I understand from reading the interactions what this fixes? It's also kinda breaking as you remove something but more importantly could you explain the motivations?

Thanks for looking into the PR. Line 458, here is the initial reason I made the PR.

image_mean = image_mean if image_mean is not None else self.image_mean
image_std = image_std if image_std is not None else self.image_std
do_pad = do_pad if do_pad is not None else self.do_pad
do_center_crop if do_center_crop is not None else self.do_center_crop

Initially i just changed the line to:

do_center_crop = do_center_crop if do_center_crop is not None else self.do_center_crop

But tht affected the image processing tests for bridgetower, which lead to these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants