Conformer OOM fix #549
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
torch.backends.cudnn.benchmark = False
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(False)
Curious that the math SDP backend was more memory-efficient than the memory-efficient attention backend.
It is likely more memory-efficient, but it seems to require a different memory allocator configuration to sidestep the need for torch.cuda.empty_cache() after every update step.
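For context, a minimal sketch of the two options being weighed here. The allocator option shown (expandable_segments) is an assumption on my part, not something named in this thread; the backend toggles are the ones from the diff above.

```python
import os

# Assumption: the "different memory configuration setting" refers to the CUDA
# caching allocator's expandable_segments option; it must be set before the
# first CUDA allocation to take effect.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch

# Backend toggles from this PR: disable flash and mem-efficient SDP so
# scaled_dot_product_attention falls back to the math kernel.
torch.backends.cudnn.benchmark = False
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(False)

# With the allocator and backends configured this way, calling
# torch.cuda.empty_cache() after every update step should not be needed.
```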
Yeah, I feel like this is some corner-case SDPA bug, but it should still be fine to merge.
@@ -81,7 +84,10 @@ def __init__(self, encoder_dim: int = 0, input_dropout_rate: float = 0.0):
     self.conv2 = Conv2dSubsampling(
         input_channels=encoder_dim, output_channels=encoder_dim)

-    self.linear = nn.LazyLinear(out_features=self.encoder_dim, bias=True)
+    self.linear = nn.Linear(
+        in_features=self.encoder_dim * num_bins // 4,
what's the reasoning for this?
Each of the two subsampling layers reduces the number of mel-spectrogram features by half, so the flattened input to the linear layer has encoder_dim * (num_bins // 4) features.
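A quick sketch of the shape arithmetic behind the new in_features value. The example sizes and the out_features/tensor layout are assumptions based on this explanation and the visible diff, not copied from the PR:

```python
import torch
import torch.nn as nn

encoder_dim, num_bins = 512, 80  # example values; num_bins = mel-spectrogram features

# Each Conv2dSubsampling halves the feature axis, so after two layers it is
# num_bins // 4, and the flattened per-frame input to the linear layer has
# encoder_dim * (num_bins // 4) features.
subsampled_bins = num_bins // 2 // 2
assert encoder_dim * num_bins // 4 == encoder_dim * subsampled_bins

# The explicit nn.Linear from the diff, replacing nn.LazyLinear.
linear = nn.Linear(in_features=encoder_dim * num_bins // 4,
                   out_features=encoder_dim, bias=True)

# Shape check on a dummy flattened subsampler output: (batch, time', features).
x = torch.randn(2, 37, encoder_dim * subsampled_bins)
print(linear(x).shape)  # torch.Size([2, 37, 512])
```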
self.qs = QueryScaler(dim=config.encoder_dim // config.num_attention_heads)

def _scaled_in_proj_weight(self):
OK, I assume the new implementation is numerically equivalent; SDPA is probably the right bet.
I also tried the default multi-head self-attention without the query scaler, and it still couldn't run successfully without adjusting the attention backends. If this is useful for your debugging, I can create a separate branch with this setup.
So the previous implementation was quite long; SDPA is maintained in PyTorch, so this actually makes things better.
The previous implementation was long because it extended nn.MultiheadAttention by modifying the in-projection weights/biases in the forward pass, but the attention itself was still handled entirely by PyTorch.
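For anyone following along, a rough sketch of what SDPA-based self-attention with a learned query scale looks like. This is a stand-in for the QueryScaler/attention code being discussed, not the PR's actual implementation; the shape handling and the exp-parameterized scale are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SDPASelfAttention(nn.Module):
  """Minimal self-attention via F.scaled_dot_product_attention with a learned
  per-dimension query scale (hypothetical stand-in for QueryScaler)."""

  def __init__(self, embed_dim: int, num_heads: int):
    super().__init__()
    self.num_heads = num_heads
    self.head_dim = embed_dim // num_heads
    self.in_proj = nn.Linear(embed_dim, 3 * embed_dim)
    self.out_proj = nn.Linear(embed_dim, embed_dim)
    # Learned query scale, one value per head dimension.
    self.q_scale = nn.Parameter(torch.zeros(self.head_dim))

  def forward(self, x, attn_mask=None):
    b, t, d = x.shape
    q, k, v = self.in_proj(x).chunk(3, dim=-1)
    # (b, t, d) -> (b, num_heads, t, head_dim)
    q, k, v = (y.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
               for y in (q, k, v))
    q = q * torch.exp(self.q_scale)  # query scaling before attention
    # Dispatches to whichever SDP backend is enabled (math-only in this PR).
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
    return self.out_proj(out.transpose(1, 2).reshape(b, t, d))
```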
Confirmed that this works for the 60K run.
Contains changes to fix #497