This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

Feature/reduce-decoder-mem-usage #84

Merged
merged 5 commits into develop from feature/reduce-decoder-mem-usage on Dec 13, 2024
Conversation

cathalobrien
Contributor

This change increases the memory saved by using chunking in the mapper. At the moment we use two arrays to accumulate chunks; this change replaces them with a single array. At 9 km this reduces peak memory usage by 6 GB.

Below are pictures of memory usage during the chunking part of the decoder at 9 km.

Before

[screenshot: memory usage before the change]

Notice the zig-zag pattern. This comes from the 'out1' tensor being repeatedly created and freed for each chunk.
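For reference, the old accumulation pattern looked roughly like the sketch below. This is a minimal illustration with hypothetical names (`decode_chunk`, `decode_chunked_old`), not the actual mapper code: each iteration allocates a fresh `out1` and then a new, larger `out` via `torch.cat`, so two copies of the accumulated output briefly coexist before the old one is freed.

```python
import torch

# Minimal sketch of the OLD pattern (hypothetical names, not the real mapper code).
# Each iteration allocates the chunk result 'out1' and then a new, larger 'out'
# via torch.cat, so two copies of the accumulated output coexist until the old
# 'out' is freed -- this is what produces the zig-zag memory profile.
def decode_chunked_old(x, decode_chunk, num_chunks):
    out = None
    for chunk in torch.tensor_split(x, num_chunks, dim=0):
        out1 = decode_chunk(chunk)  # per-chunk decoder output
        out = out1 if out is None else torch.cat((out, out1), dim=0)
    return out
```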

After

[screenshot: memory usage after the change]

Now the zig-zag pattern is gone and peak memory usage has decreased by 6 GB.
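The fix pre-allocates a single accumulation tensor before the loop and writes each chunk's result directly into its slice, so only one copy of the full output ever exists. Again a minimal sketch with hypothetical names (`decode_chunk`, `out_dim`), not the literal code in this PR:

```python
import torch

# Minimal sketch of the NEW pattern (hypothetical names, not the real mapper code).
# The accumulation tensor is allocated once, outside the for loop, and each
# chunk's result is written in place into its slice -- no second copy and no
# repeated allocate/free, hence the flat memory profile and lower peak usage.
def decode_chunked_new(x, decode_chunk, num_chunks, out_dim):
    out = x.new_empty((x.shape[0], out_dim))  # single accumulation tensor
    start = 0
    for chunk in torch.tensor_split(x, num_chunks, dim=0):
        end = start + chunk.shape[0]
        out[start:end] = decode_chunk(chunk)  # write directly into the slice
        start = end
    return out
```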

Declare an empty accumulation tensor outside the for loop. The old approach of keeping both out and out1 results in two copies of the array, which uses more memory; at 9 km this added 6 GB to peak memory usage.
@FussyDuck

FussyDuck commented Nov 21, 2024

CLA assistant check
All committers have signed the CLA.

@codecov-commenter

codecov-commenter commented Nov 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.85%. Comparing base (fd2bcf1) to head (0fb033a).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop      #84   +/-   ##
========================================
  Coverage    99.85%   99.85%           
========================================
  Files           23       23           
  Lines         1374     1374           
========================================
  Hits          1372     1372           
  Misses           2        2           

☔ View full report in Codecov by Sentry.

@cathalobrien cathalobrien self-assigned this Nov 21, 2024
@ssmmnn11
Member

Great work. Is this from a training run or an inference run?

@cathalobrien
Contributor Author

Great work. Is this from a training run or an inference run?

Inference. I haven't tried it in training because this only happens when num_chunks > 1.

@ssmmnn11
Member

Absolutely, it would just be interesting to check.

@cathalobrien
Contributor Author

I didn't see any difference during training. That makes sense, since we don't use chunking there.

@cathalobrien cathalobrien requested a review from japols November 28, 2024 12:42
@cathalobrien cathalobrien marked this pull request as ready for review November 28, 2024 12:42
@floriankrb
Member

Thanks for this improvement @cathalobrien, please sign the CLA (see above).


@cathalobrien
Contributor Author

Thanks for this improvement @cathalobrien, please sign the CLA (see above).

Done, thanks @floriankrb

Member

@japols japols left a comment

Looks good, nice catch :)

@mchantry mchantry merged commit 90ac4df into develop Dec 13, 2024
123 checks passed
@mchantry mchantry deleted the feature/reduce-decoder-mem-usage branch December 13, 2024 08:33