
Update model builders #2282

Open
Ankur-singh wants to merge 11 commits into main from update-model-builders
Conversation

Ankur-singh (Contributor)

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (Refactor)

Please link to any issues this PR addresses: #2270

Changelog

What are the changes made in this PR?

  • Updated all component builders to pass an nn.ModuleList to TransformerDecoder instead of a single layer plus num_layers (see the sketch after this list).
  • Updated the tests for the T5 model.
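
For concreteness, a minimal sketch of the before/after pattern. TinyDecoder and the nn.Linear layers below are illustrative stand-ins, not the actual torchtune TransformerDecoder or builder code:

import copy

import torch
from torch import nn


class TinyDecoder(nn.Module):
    """Stand-in for a decoder that stores its layers as an nn.ModuleList."""

    def __init__(self, layers: nn.ModuleList) -> None:
        super().__init__()
        self.layers = layers

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


# Old pattern: clone a single template layer num_layers times (what _get_clones did),
# so every layer starts from identical weights.
template = nn.Linear(16, 16)
old_style = TinyDecoder(nn.ModuleList([copy.deepcopy(template) for _ in range(4)]))

# New pattern: construct each layer independently in the builder and pass the
# nn.ModuleList straight to the decoder.
new_style = TinyDecoder(nn.ModuleList([nn.Linear(16, 16) for _ in range(4)]))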

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these, just ask and we will happily help. We also have a contributing page for some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

  • I did not change any public API
  • I have added an example to docs or docstrings


pytorch-bot bot commented Jan 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2282

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 19, 2025
@Ankur-singh (Contributor Author)

@ebsmothers for now, I have only updated the component_builders for all the models. In my local testing, removing this else block passes all the tests. If this PR looks good, I will:

  1. Update VisionTransformer class and component builders for vision models
  2. Delete the _get_clones function (sketched below for reference)
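
For context, _get_clones is essentially a deep-copy helper; a paraphrased sketch of the typical implementation (not necessarily the exact torchtune source):

import copy

from torch import nn


def _get_clones(module: nn.Module, n: int) -> nn.ModuleList:
    # Deep-copy a single module n times; every clone starts from identical weights,
    # unlike building each layer independently.
    return nn.ModuleList([copy.deepcopy(module) for _ in range(n)])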

@RdoubleA (Contributor)

Just one comment about making sure RoPE is instantiated correctly, but the rest looks good.

Comment on lines 48 to 50:

self.layers = (
    layers if isinstance(layers, nn.ModuleList) else nn.ModuleList(layers)
)
Contributor:

In this case it's not a big deal since T5 is not yet widely used in the repo, but FYI we do have to be careful about deprecating num_layers as it can break people. The proper thing to do is to continue supporting it for one release and mark it as deprecated (similar to what we have here for functions/classes).

Contributor Author:

Thanks for pointing it out. I will re-introduce support for num_layers and add a deprecated decorator with the message "The num_layers argument will be deprecated in an upcoming release."

Contributor Author:

@ebsmothers Do you want me to add the deprecated decorator to the TransformerDecoder class as well?

Contributor:

@Ankur-singh actually I wouldn’t worry about adding the decorator to the class. I think we need a separate utility to log only when the to-be-deprecated argument is passed. Otherwise everyone will see the warning about deprecation of num_layers even if they aren’t using it
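
As an illustration of that kind of utility, here is a hypothetical sketch; warn_if_arg_passed is invented for this example and is not an existing torchtune helper:

import warnings
from functools import wraps


def warn_if_arg_passed(arg_name: str, msg: str):
    """Warn only when the to-be-deprecated keyword argument is actually passed."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if kwargs.get(arg_name) is not None:
                warnings.warn(msg, FutureWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


class Decoder:
    @warn_if_arg_passed(
        "num_layers",
        "The num_layers argument will be deprecated in an upcoming release; "
        "pass an nn.ModuleList of layers instead.",
    )
    def __init__(self, layers, num_layers=None):
        self.layers = layers


Decoder(layers=[])                 # no warning
Decoder(layers=[], num_layers=4)   # emits the FutureWarning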

[0.3383, 0.3150],
[0.3727, 0.2892],
[0.3996, 0.2653],
[0.4958, 0.4845],
Contributor:

I'm a bit surprised that the expected values need to change here. I would have thought that uniform initialization iterating over model.parameters() shouldn't be affected by whether we use _get_clones or not

Contributor Author:

After making all the changes, I ran pytest tests:

  • with _get_clones: success
  • without _get_clones: failed

Hence, I thought the difference was because of _get_clones. I ran the following script to test the hypothesis:

import torch
from torchtune.modules.transformer import _get_clones
from torchtune.modules.peft import LoRALinear

def main():
    # Four independently constructed (and therefore independently initialized) layers
    loras_loop = [
        LoRALinear(in_dim=16, out_dim=16, rank=4, alpha=1.0) for _ in range(4)
    ]

    # One layer deep-copied four times, so all clones share identical initial weights
    loras_cloned = _get_clones(
        LoRALinear(in_dim=16, out_dim=16, rank=4, alpha=1.0), 4
    )

    loop_max_diff = torch.max(torch.abs(loras_loop[0].lora_a.weight - loras_loop[3].lora_a.weight))
    cloned_max_diff = torch.max(torch.abs(loras_cloned[0].lora_a.weight - loras_cloned[3].lora_a.weight))

    print(f"Max diff between layers using for-loop: {loop_max_diff}")
    print(f"Max diff between layers using _get_clones: {cloned_max_diff}")

    input = torch.randn(1, 16)

    output1 = input.clone()
    for layer in loras_loop:
        output1 = layer(output1)

    output2 = input.clone()
    for layer in loras_cloned:
        output2 = layer(output2)

    cloned_max_diff = torch.max(torch.abs(output1 - output2))
    print(f"Max diff between outputs from the two approaches: {cloned_max_diff}")


if __name__ == "__main__":
    main()

# ----
# Max diff between layers using for-loop: 0.4660979211330414
# Max diff between layers using _get_clones: 0.0
# Max diff between outputs from the two approaches: 0.3515825569629669

Contributor:

Yeah I suspect this is due to something with the random seed. Let me take a closer look to confirm but otherwise I think updating the expected values is fine

Contributor:

Also dug into this one a bit more: the changes to T5 change the order in which modules are registered, so when we do

for param in model.parameters():
    param.data.uniform_(0, 1)

we again wind up with a different RNG state. So again it's fine to change the expected values here.
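
As an editorial illustration of that effect (the toy modules below are not the T5 code): with the same seed and the same parameters, a different registration order makes the uniform fill consume the RNG stream differently.

import torch
from torch import nn

torch.manual_seed(0)
# Registration order A: embedding first, then projection.
model_a = nn.ModuleDict({"emb": nn.Embedding(4, 8), "proj": nn.Linear(8, 8)})
for param in model_a.parameters():
    param.data.uniform_(0, 1)

torch.manual_seed(0)
# Registration order B: projection first, then embedding.
model_b = nn.ModuleDict({"proj": nn.Linear(8, 8), "emb": nn.Embedding(4, 8)})
for param in model_b.parameters():
    param.data.uniform_(0, 1)

# Same seed and same modules, but the fill hits the parameters in a different
# order, so corresponding weights (almost surely) differ.
print(torch.allclose(model_a["proj"].weight, model_b["proj"].weight))  # False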

@ebsmothers (Contributor) left a comment:

Thanks for making these changes! I left a few comments (mainly in response to some of @RdoubleA's points about RoPE -- TLDR is don't worry too much for this PR provided everything works). Otherwise no major concerns though

@Ankur-singh (Contributor Author)

@ebsmothers & @RdoubleA I have made the requested changes.

GPU test fails with the error message:

FAILED tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_generate_results - AssertionError: assert 'Country maior Connection Kohćutsójcustomulas Sometimes Security' in 'INFO     torchtune.utils._logging:_utils.py:28 Running InferenceRecipe with resolved config:\n\ncheckpointer:\n  _component_: torchtune.training.FullModelTorchTuneCheckpointer\n  checkpoint_dir: /tmp/test-artifacts\n  checkpoint_files:\n  - /tmp/test-artifacts/small-ckpt-tune-03082024.pt\n  model_type: LLAMA2\n  output_dir: /tmp/pytest-of-ec2-user/pytest-0/test_llama2_generate_results0\ndevice: cpu\ndtype: fp32\nlog_level: INFO\nmax_new_tokens: 10\nmodel:\n  _component_: torchtune.models.llama2.llama2\n  embed_dim: 256\n  max_seq_len: 2048\n  norm_eps: 1.0e-05\n  num_heads: 16\n  num_kv_heads: 8\n  num_layers: 4\n  vocab_size: 32000\noutput_dir: /tmp/pytest-of-ec2-user/pytest-0/test_llama2_generate_results0\nprompt:\n  system: You are a helpful and creative AI assistant.\n  user: What is the capital of France?\nseed: 123\ntemperature: 0.6\ntokenizer:\n  _component_: torchtune.models.llama2.llama2_tokenizer\n  max_seq_len: 2048\n  path: /tmp/test-artifacts/tokenizer.model\ntop_k: 300\n\nINFO     torchtune.utils._logging:generate_v2.py:94 Model was initialized with precision torch.float32.\nINFO     torchtune.utils._logging:generate_v2.py:208 \n\nPietroместkap щotimes rivers cache НиtringindexPathNAME\n\nINFO     torchtune.utils._logging:generate_v2.py:112 Time for inference: 0.08 sec total, 135.68 tokens/sec\nINFO     torchtune.utils._logging:generate_v2.py:115 Bandwidth achieved: 9.92 GiB/s\n'
====== 1 failed, 761 passed, 7 skipped, 14 warnings in 2548.47s (0:42:28) ======

I'm assuming it's because of the initialization change (refer to #2282 (comment)). I had to update the expected output tensors for the T5 encoder as well. I think we will have to take a closer look sometime in the future.

@RdoubleA RdoubleA mentioned this pull request Jan 21, 2025
@ebsmothers (Contributor)

Hi @Ankur-singh, thanks for your patience. I wanted to sanity-check the failing test to make sure the failure is expected. In our generate_v2 recipe we set the seed during init here. This means we construct the model after setting the seed, and so when you change how we initialize the model (i.e. we now do some random initialization for every single layer, instead of just one layer that we then copy), the RNG state will be in a different place by the time we get to our call to generate, hence our sampling for generation will yield different results. You can confirm this by moving the call to training.set_seed, e.g. here, to force model initialization to happen first -- you will see that whether you run on main or on this PR you will get the same result.

Anyways TLDR for this PR is that you are not breaking anything, you can safely just update the expected value for this test. Separately we can think about whether it's clearer to call set_seed at the beginning of generate to ensure that we get deterministic behavior irrespective of model initialization. cc @joecummings in case he has any thoughts on this.
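
A minimal, hypothetical illustration of that ordering. generate_token, the toy nn.Linear, and the hand-built state dict below stand in for the recipe, the model, and the checkpoint; none of this is the actual recipe code:

import torch
from torch import nn


def generate_token(seed_before_init: bool) -> int:
    if seed_before_init:
        torch.manual_seed(123)   # (1) set the seed first ...
    model = nn.Linear(8, 32)     # (2) ... then model init consumes the RNG stream
    if not seed_before_init:
        torch.manual_seed(123)   # seeding *after* init leaves a clean RNG state
    # "Checkpoint" load: the weights end up identical in both cases.
    model.load_state_dict({"weight": torch.ones(32, 8), "bias": torch.zeros(32)})
    probs = torch.softmax(model(torch.ones(1, 8)), dim=-1)
    return torch.multinomial(probs, num_samples=1).item()  # (3) sampling


# Identical weights either way, but the sampled token can differ because model
# init left the RNG in a different state when the seed was set first.
print(generate_token(seed_before_init=True), generate_token(seed_before_init=False))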

@Ankur-singh (Contributor Author)

Ankur-singh commented Jan 23, 2025

Hi @ebsmothers, thanks for clarifying. I'm a bit confused: after initializing the model, we load the state_dict from the checkpoint. This should basically overwrite any previous initialization, right? Furthermore, during testing, we are loading some dummy model weights. This should make the model weights deterministic, irrespective of the initialization method and random seed. Is my understanding correct so far?

The random seed only comes into the picture in the generate method when sampling the next token. So at least during testing, as long as we use the same random seed and set_seed is called before generation starts, we should be getting the same output. The only thing that could lead to a different generation would be a change in the checkpoint weights.

I also tried this small script to see whether calling a random operation before setting the seed affects the outcome (it looks like it does not):

import torch

# Case 1: No random operations before setting seed
torch.manual_seed(42)
x1 = torch.randn(3)
print("Case 1:", x1)

_ = torch.randn(5)  # Consumes RNG state
_ = torch.randn(5)

# Case 2: Random operation before setting seed
torch.manual_seed(42)
x2 = torch.randn(3)
print("Case 2:", x2)
print(f"Case 1 and Case 2 are equal: {torch.allclose(x1, x2)}")

# -----
# Output:
# Case 1: tensor([0.3367, 0.1288, 0.2345])
# Case 2: tensor([0.3367, 0.1288, 0.2345])
# Case 1 and Case 2 are equal: True

TL;DR: as long as we are loading the model weights from the same checkpoint and calling set_seed before the start of generation (either in __init__ or the generate method), we should be getting the same output. Am I missing something here?

PS: replacing the expected text "Country maior Connection Kohćutsójcustomulas Sometimes Security" with "Pietroместkap щotimes rivers cache НиtringindexPathNAME" gets a green light. Tested by running pytest tests/recipes/dev/test_generate_v2.py --ignore tests/torchtune/modules/_export --with-integration --durations=20 -vv

@ebsmothers (Contributor)

@Ankur-singh yeah it's a bit tricky, let me try to address a couple of these points.

> I'm a bit confused: after initializing the model, we load the state_dict from the checkpoint. This should basically overwrite any previous initialization, right? Furthermore, during testing, we are loading some dummy model weights. This should make the model weights deterministic, irrespective of the initialization method and random seed. Is my understanding correct so far?

This is the correct understanding. The model weights will wind up being the same regardless of the random seed because the state dict weights override any random initialization. The one caveat is that if the initialization differs, the random state will not be the same. This is why the generate_v2 test results change. We currently do things in the following order: (1) set the seed, (2) initialize the model, (3) perform some other operation requiring randomness (in this case sampling from logits).

Looking at your code snippet, I think things are not quite in the right order -- hence why you're getting the same results both times. Instead it should look like this:

import torch

# Case 1: No random operations *AFTER* setting seed
torch.manual_seed(42)
x1 = torch.randn(3)
print("Case 1:", x1)


# Case 2: Random operation *AFTER* setting seed
torch.manual_seed(42)
_ = torch.randn(5)  # Consumes RNG state
_ = torch.randn(5)  # *THESE NOW OCCUR AFTER TORCH.MANUAL_SEED*
x2 = torch.randn(3)
print("Case 2:", x2)
print(f"Case 1 and Case 2 are equal: {torch.allclose(x1, x2)}")

# -----
# Output:
# Case 1: tensor([0.3367, 0.1288, 0.2345])
# Case 2: tensor([0.5349, 0.8094, 1.1103])
# Case 1 and Case 2 are equal: False

Anytime you call torch.manual_seed you are basically resetting the RNG state. So if you generate random numbers, then call torch.manual_seed, the RNG state will then be as though you had never generated said numbers. That's why your snippet shows the same results, whereas if I call randn after manual_seed, the results differ.

Going back to the order of (1), (2), (3) for generate_v2.py I described above: the implications of this are that any random operation after calling set_seed (even if its results are later overwritten) will impact the RNG state, and without a subsequent call to set_seed to reset things we will get different results for any subsequent random operations. This is why I mentioned moving the set_seed call here -- this basically swaps (1) and (2), so that there is no longer any model initialization (i.e. there are no longer any random operations) after the call to set_seed. By doing this, you will see that the generation results look the same regardless of how the model is initialized.

Hope this helps clarify things a bit! Please let me know if anything remains unclear.

@Ankur-singh (Contributor Author)

@ebsmothers thank you so much. My mental model of random state was completely shattered. I can see how this seemingly minor change (swapping 1 and 2) can lead to different results. I really appreciate you taking the time to share these insightful nuggets.

Regarding the PR at hand, do you want me to change the generate_v2 output in the test file, or should I wait until we finalize the approach?

@ebsmothers (Contributor)

@Ankur-singh you can go ahead and change the output, we can decide whether it’s worth moving our call to set_seed at a later date (since it’s not a bug, more of a quality-of-life thing)

@Ankur-singh (Contributor Author)

@ebsmothers I have updated the value of expected_output to "Pietroместkap щotimes rivers cache НиtringindexPathNAME". Tested it by running pytest tests/recipes/dev/test_generate_v2.py --ignore tests/torchtune/modules/_export --with-integration --durations=20 -vv:

(tune) ankur@nuc:~/github/torchtune$ pytest tests/recipes/dev/test_generate_v2.py --ignore tests/torchtune/modules/_export --with-integration --durations=20 -vv
Expected artifacts for test run are:
small-ckpt-tune-03082024.pt
small-ckpt-meta-03082024.pt
small-ckpt-hf-03082024.pt
small-ckpt-tune-llama3-05052024.pt
small-ckpt-hf-reward-07122024.pt
small-ckpt-meta-vision-10172024.pt
small-ckpt-hf-vision-10172024.pt
tokenizer.model
tokenizer_llama3.model
File already exists locally: /tmp/test-artifacts/small-ckpt-tune-03082024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-meta-03082024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-hf-03082024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-tune-llama3-05052024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-hf-reward-07122024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-meta-vision-10172024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-hf-vision-10172024.pt
File already exists locally: /tmp/test-artifacts/tokenizer.model
File already exists locally: /tmp/test-artifacts/tokenizer_llama3.model
================================================================= test session starts =================================================================
platform linux -- Python 3.10.16, pytest-7.4.0, pluggy-1.5.0 -- /home/ankur/miniforge3/envs/tune/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/ankur/github/torchtune
configfile: pyproject.toml
plugins: cov-6.0.0, mock-3.14.0, integration-0.2.3
collected 2 items                                                                                                                                     

tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_generate_results PASSED                                                      [ 50%]
tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_fail_on_bad_input PASSED                                                     [100%]

================================================================ slowest 20 durations =================================================================
0.55s call     tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_generate_results
0.00s setup    tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_generate_results
0.00s setup    tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_fail_on_bad_input
0.00s teardown tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_fail_on_bad_input
0.00s teardown tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_generate_results
0.00s call     tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_fail_on_bad_input
================================================================== 2 passed in 0.58s ==================================================================

@codecov-commenter

Codecov Report

Attention: Patch coverage is 62.67606% with 53 lines in your changes missing coverage. Please review.

Project coverage is 64.09%. Comparing base (779569e) to head (291a681).
Report is 6 commits behind head on main.

Files with missing lines                              Patch %   Lines
torchtune/models/mistral/_component_builders.py       42.85%    16 Missing ⚠️
torchtune/models/gemma/_component_builders.py         42.85%     8 Missing ⚠️
torchtune/models/llama2/_component_builders.py        73.33%     8 Missing ⚠️
torchtune/models/llama3/_component_builders.py        46.66%     8 Missing ⚠️
torchtune/models/t5/_encoder.py                       41.66%     7 Missing ⚠️
torchtune/models/gemma2/_component_builders.py         0.00%     2 Missing ⚠️
torchtune/models/llama3_1/_component_builders.py        0.00%    2 Missing ⚠️
tests/recipes/dev/test_generate_v2.py                   0.00%    1 Missing ⚠️
torchtune/models/llama3_2/_component_builders.py       50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2282      +/-   ##
==========================================
- Coverage   65.34%   64.09%   -1.25%     
==========================================
  Files         358      353       -5     
  Lines       21207    20726     -481     
==========================================
- Hits        13857    13285     -572     
- Misses       7350     7441      +91     


@Ankur-singh Ankur-singh force-pushed the update-model-builders branch from 291a681 to b9a8284 Compare January 26, 2025 06:57