Update docs and docstrings related to Llama3VisionTransform #2382
base: main
Conversation
@pbontrager Can you take a look? Seems right to me, but need an extra set of eyes.
Thanks for catching all these inconsistencies 🫡
@@ -52,7 +52,7 @@ These are intended to be drop-in replacements for tokenizers in multimodal datasets

     print(transform.decode(tokenized_dict["tokens"], skip_special_tokens=False))
     # '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n<|image|><|image|>What is common in these two images?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nA robot is in both images.<|eot_id|>'
     print(tokenized_dict["encoder_input"]["images"][0].shape) # (num_tiles, num_channels, tile_height, tile_width)
-    # torch.Size([4, 3, 224, 224])
+    # torch.Size([1, 3, 224, 224])
What's wrong with 4 here? The default for max_num_tiles is 4, and this example doesn't specify that the image is small. It might be worth adding max_num_tiles=4 to the `Llama3VisionTransform` call on line 46.
The example uses 224x224 images in its Messages (lines 35 & 36), so when I tried running it, I got torch.Size([1, 3, 224, 224]).
Oh, I missed that. Yes, 1 would be correct here, though it's less instructive than if we used a larger image.
I'll update the image size to be 560x560
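(For reference, here is a rough sketch of why the tile count depends on the image size. This is my own simplification, not torchtune's actual resolution-matching code; it assumes tile_size=224 and max_num_tiles=4 as in the docs example.)

```python
import math

# Illustrative sketch (not torchtune's actual algorithm): count the tiles
# needed to cover each dimension, then shrink to fit max_num_tiles, which
# mimics the resize-to-fit step applied to oversized images.
def num_tiles(height: int, width: int, tile_size: int = 224, max_num_tiles: int = 4) -> int:
    tiles_h = math.ceil(height / tile_size)
    tiles_w = math.ceil(width / tile_size)
    while tiles_h * tiles_w > max_num_tiles:
        # Shrink the larger dimension first, standing in for downscaling.
        if tiles_h >= tiles_w:
            tiles_h -= 1
        else:
            tiles_w -= 1
    return tiles_h * tiles_w

print(num_tiles(224, 224))  # 1 -> torch.Size([1, 3, 224, 224])
print(num_tiles(560, 560))  # 4 -> torch.Size([4, 3, 224, 224])
```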
@@ -42,14 +42,15 @@ in the text, ``"<image>"`` for where to place the image tokens. This will get replaced

     .. code-block:: python

-        from torchtune.models.llama3_2_vision import llama3_2_vision_transform
+        from torchtune.models.llama3_2_vision import Llama3VisionTransform
         from torchtune.datasets.multimodal import multimodal_chat_dataset

         model_transform = Llama3VisionTransform(
I think the error was actually the other way: `Llama3VisionTransform` is supposed to be `llama3_2_vision_transform`, and then the other changes aren't necessary.
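(For context, a minimal sketch of the builder-based version being suggested here. The builder's parameters and the `multimodal_chat_dataset` kwargs follow my reading of torchtune's docs and may differ across versions; all paths are placeholders.)

```python
# Sketch of the reviewer's suggestion: keep the llama3_2_vision_transform
# builder rather than instantiating Llama3VisionTransform directly.
from torchtune.models.llama3_2_vision import llama3_2_vision_transform
from torchtune.datasets.multimodal import multimodal_chat_dataset

model_transform = llama3_2_vision_transform(
    path="/path/to/tokenizer.model",  # placeholder tokenizer path
    max_seq_len=8192,
)
ds = multimodal_chat_dataset(
    model_transform=model_transform,
    source="json",
    data_files="data/my_data.json",  # placeholder dataset file
    column_map={"dialogue": "conversations", "image_path": "image"},
    image_dir="/home/user/dataset/",  # placeholder image directory
    image_tag="<image>",
    split="train",
)
```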
@@ -120,14 +120,14 @@ def multimodal_chat_dataset(

     ::

-        >>> from torchtune.datasets.multimodal import multimodal_chat_dataset
-        >>> from torchtune.models.llama3_2_vision import llama3_2_vision_transform
+        >>> from torchtune.models.llama3_2_vision import Llama3VisionTransform
         >>> from torchtune.datasets.multimodal import multimodal_chat_dataset
         >>> model_transform = Llama3VisionTransform(
Same comment from above.
@@ -120,14 +120,14 @@ def multimodal_chat_dataset(

     ::

-        >>> from torchtune.datasets.multimodal import multimodal_chat_dataset
-        >>> from torchtune.models.llama3_2_vision import llama3_2_vision_transform
+        >>> from torchtune.models.llama3_2_vision import Llama3VisionTransform
Why isn't `multimodal_chat_dataset` needed anymore?
It was imported twice.
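(As I read the diff, the fix just drops the duplicate so each name is imported once; the constructor arguments are elided here as in the original doctest.)

```python
>>> from torchtune.models.llama3_2_vision import Llama3VisionTransform
>>> from torchtune.datasets.multimodal import multimodal_chat_dataset
>>> model_transform = Llama3VisionTransform(...)  # args as in the surrounding example
```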
Force-pushed from 8fd5598 to d495f59.
@pbontrager I've made the requested changes. Please let me know if there are any more.
Context

What is the purpose of this PR? Is it to add a new feature, fix a bug, or update tests and/or documentation? This PR updates documentation.

Please link to any issues this PR addresses: N/A
Changelog

What are the changes made in this PR?

- Update docs and docstrings related to `Llama3VisionTransform`
Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

- run pre-commit hooks and linters (make sure you've first installed via `pre-commit install`)
- run unit tests via `pytest tests`
- run recipe tests via `pytest tests -m integration_test`
UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it. Here is a docstring example and a tutorial example.