#14080: Preprocess weights for Conv2D on Device #16750

Open
wants to merge 17 commits into main from smanoj/conv_device_weights


sankarmanoj-tt
Contributor

Ticket

#14080

Problem description

Currently, weight preprocessing runs on the host on a single thread. This is slow, especially when the weight matrix is large and Debug mode is enabled.

What's changed

The weights are loaded onto the device in the same format as PyTorch. All other processing, including permute and padding, is done on the device.
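
For reviewers unfamiliar with the preprocessing, here is a rough host-side NumPy sketch of the kind of permute and pad steps involved. It is not the code in this PR: the function names, the target 2D layout, and the tile size of 32 are assumptions made for illustration only.

```python
import numpy as np

TILE = 32  # assumed tile dimension; hypothetical value for this sketch


def pad_to_multiple(n: int, m: int) -> int:
    """Round n up to the next multiple of m."""
    return (n + m - 1) // m * m


def preprocess_conv_weights(w: np.ndarray, tile: int = TILE) -> np.ndarray:
    """Illustrative only: turn PyTorch-layout conv weights into a padded 2D matrix.

    w has shape (out_channels, in_channels, kernel_h, kernel_w).
    The result has shape (rows, cols), where rows >= kernel_h * kernel_w * in_channels
    and cols >= out_channels, both rounded up to a multiple of `tile`.
    """
    out_ch, in_ch, kh, kw = w.shape
    # Permute so that out_channels becomes the innermost (column) dimension.
    permuted = np.transpose(w, (2, 3, 1, 0))  # (kh, kw, in_ch, out_ch)
    # Flatten the kernel window and input channels into matrix rows.
    mat = permuted.reshape(kh * kw * in_ch, out_ch)
    # Zero-pad both dimensions up to a multiple of the tile size.
    rows = pad_to_multiple(mat.shape[0], tile)
    cols = pad_to_multiple(mat.shape[1], tile)
    return np.pad(mat, ((0, rows - mat.shape[0]), (0, cols - mat.shape[1])))
```

Every step here is a plain tensor operation (transpose, reshape, pad), which is the sort of work that can move from a single host thread to the device.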

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • (For models and ops writers) Full new models tests pass
  • New/Existing tests provide coverage for changes

sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch from fe919e2 to e00b7ec on January 15, 2025 10:53
sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch 2 times, most recently from 6351705 to 7662eba, on January 29, 2025 13:34
sankarmanoj-tt marked this pull request as ready for review on January 30, 2025 12:05
sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch from 7f3a9c0 to c5c4540 on January 31, 2025 08:38
sankarmanoj-tt (Contributor, Author) commented:

TODO: Re-enable transpose cast

sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch from c5c4540 to 6c6da4d on February 3, 2025 10:12
2 participants