Declare an empty accumulator tensor outside the for loop. The old approach of keeping both out and out1 holds two copies of the array, which uses more memory. At 9km this added 6 GB to peak memory usage.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
##           develop   #84   +/-   ##
========================================
  Coverage    99.85%   99.85%
========================================
  Files           23       23
  Lines         1374     1374
========================================
  Hits          1372     1372
  Misses           2        2
Great work. Is this from a training run or an inference run?
Inference. I haven't tried it in training because this only happens when num_chunks > 1.
Absolutely, it would just be interesting to check.
I didn't see any difference during training. That makes sense, since we don't use chunking there.
Thanks for this improvement @cathalobrien, please sign the CLA (see above...)
Done, thanks @floriankrb
Looks good, nice catch :)
This change increases the memory saved by chunking in the mapper. At the moment we use two arrays to accumulate chunks; this replaces them with a single array. At 9km this reduces peak memory usage by 6 GB.
Below I have pictures of memory usage during the chunking part of the decoder at 9km
Before
Notice the zig-zag pattern. This comes from the 'out1' tensor being created and freed again for every chunk.
After
Now the zig-zag pattern is gone and peak memory usage has decreased by 6 GB.
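
For readers without the diff open, here is a minimal sketch of the two accumulation patterns described above. It is an illustration rather than the code from this PR: `chunk_contribution` is a hypothetical stand-in for the per-chunk conv call, and the tensor shapes are assumptions. It only checks that both patterns give the same result; it does not measure memory.

```python
import torch


def chunk_contribution(x, edge_index, edge_attr):
    """Hypothetical stand-in for the per-chunk conv: sum edge messages per destination node."""
    src, dst = edge_index
    messages = x[src] * edge_attr          # (num_edges_in_chunk, features)
    out = torch.zeros_like(x)
    out.index_add_(0, dst, messages)       # scatter-add messages onto their destination rows
    return out


def split_edges(edge_index, edge_attr, num_chunks):
    # Split the edges (not the nodes) into num_chunks pieces
    return zip(
        torch.tensor_split(edge_index, num_chunks, dim=1),
        torch.tensor_split(edge_attr, num_chunks, dim=0),
    )


def accumulate_two_tensors(x, edge_index, edge_attr, num_chunks):
    # Old pattern: the chunk result is kept in `out1`, and `out + out1`
    # allocates a fresh tensor for the sum on every iteration.
    out = None
    for ei, ea in split_edges(edge_index, edge_attr, num_chunks):
        out1 = chunk_contribution(x, ei, ea)
        if out is None:
            out = torch.zeros_like(out1)
        out = out + out1
    return out


def accumulate_single_tensor(x, edge_index, edge_attr, num_chunks):
    # New pattern: allocate the accumulator once, outside the loop,
    # and add each chunk into it in place.
    out = torch.zeros_like(x)
    for ei, ea in split_edges(edge_index, edge_attr, num_chunks):
        out += chunk_contribution(x, ei, ea)
    return out
```

Both functions return the same values up to floating-point rounding; the difference is only in how many full-size tensors are alive at once during the loop.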