-
Notifications
You must be signed in to change notification settings - Fork 499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactored modules/tokenizers to be a subdir of modules/transforms #2231
base: main
Are you sure you want to change the base?
Refactored modules/tokenizers to be a subdir of modules/transforms #2231
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2231
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit aaf416f with merge base 75965d4 (): This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2231 +/- ##
==========================================
- Coverage 65.34% 64.10% -1.24%
==========================================
Files 358 354 -4
Lines 21207 20674 -533
==========================================
- Hits 13857 13254 -603
- Misses 7350 7420 +70 ☔ View full report in Codecov by Sentry. |
@joecummings looks like there is a merge conflict but I am unable to see it. How do I fix it? |
@Ankur-singh sorry we missed this comment until now. In my Github UI I see the following just above the "Add a comment" box (might be slightly different depending on your permissions). So seems like the merge conflict is in |
e6fff5f
to
779569e
Compare
@ebsmothers fixed merge conflict and successfully ran |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Ankur-singh for the PR! Overall I think the changes make sense, main comment is around backwards compatibility. This PR will definitely break public APIs (e.g. from torchtune.modules.tokenizers import TikTokenBaseTokenizer
will break). I think the fix is to keep torchtune/modules/tokenizers/__init__.py
with e.g. from torchtune.modules.transforms.tokenizers import TikTokenBaseTokenizer
. Then add a deprecation flag announcing the change from torchtune.modules.tokenizers
to torchtune.modules.tokenizers.transforms
import path. And you can add a comment to torchtune/modules/tokenizers/__init__.py
explaining that these imports are for backwards compatibility purposes and will be removed in v0.7. This way people can still import from torchtune.modules.tokenizers in the short term.
thanks for the feedback @ebsmothers. I'll make the required changes. This is my first time contributing to a big project (like Torchtune), so I’m still getting the hang of things. In the smaller projects I contributed to before, backward compatibility and versioning weren’t as big of a concern. Please bear with me as I work through any rookie mistakes—I’m already learning a lot and appreciate your guidance! |
356cacc
to
e451834
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Ankur-singh for sticking with this one! I know big horizontal refactors like this can be a bit of a pain so definitely appreciate your diligence. Just a couple more small comments and I think we're good here. Can you also check docs/source/basics/custom_components.rst
and docs/source/basics/model_transforms.rst
? I think there's still a modules.tokenizers
import path in each of those files.
@ebsmothers I’m having a blast contributing to Torchtune and learning a ton about engineering best practices along the way. I’ve made the necessary changes! I do tend to miss the occasional small detail, but I’ll be more mindful going forward. Thanks for your patience—I really appreciate it! |
Context
What is the purpose of this PR? Is it to
Please link to any issues this PR addresses. #1301
Changelog
What are the changes made in this PR?
modules/tokenizers
to be a subdir ofmodules/transforms
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
pre-commit install
)pytest tests
pytest tests -m integration_test
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example