
Trace failing for PyTorch model - [TEN404] Internal tensorizer error: SundaSizeTiling:tuple index out of range #1101

mansourkheffache opened this issue Jan 30, 2025 · 6 comments


Hello,

I am trying to compile a PyTorch model for AWS Inferentia, but compilation is currently failing with exit status 70. The main error is:

[TEN404] Internal tensorizer error: SundaSizeTiling:tuple index out of range 

Running on an AWS inf2 instance with Ubuntu 22; dependencies were installed as described in the docs.

The model I am trying to compile is Kokoro-82M.

I found some similar threads but no concrete pointers as to whether this is a bug or a setup issue.

Below is the data I gathered, along with the steps to reproduce.

Model Sample Testing:

>>> import torch
>>> from kokoro import KModel # https://github.com/hexgrad/kokoro -- pip install kokoro
>>> model = KModel() # this automatically loads the model weights
>>> tokens = 'hˈɛl ðˈɛɹ!' # this is a tokenization of "hello there!"
>>> style = torch.tensor([[-2.8992e-01,  1.7215e-01, -1.7180e-01,  8.7975e-02, -2.5824e-01,
         -9.6632e-03, -1.3282e-01, -2.1664e-01, -1.8300e-01, -2.6747e-01,
         -1.5194e-03, -2.0507e-01,  1.8936e-01,  3.8610e-01,  4.9778e-01,
         -3.4260e-01,  9.4393e-02, -9.3796e-02,  1.6613e-01, -4.1674e-01,
         -3.7484e-01, -1.9175e-01,  1.4054e-01, -2.1984e-02,  5.9730e-02,
          4.2467e-01, -3.0832e-01,  1.7591e-01,  8.6656e-02,  2.2660e-01,
          1.6077e-01, -2.9118e-01,  4.1010e-02, -1.6368e-01,  1.4374e-01,
          2.8426e-01, -1.3034e-01,  1.1111e-01,  8.1979e-02,  2.8049e-01,
         -8.9360e-02, -3.0852e-01, -3.7002e-02,  2.0198e-01,  2.1839e-02,
          1.1471e-02,  4.7733e-02, -3.6180e-02, -4.1632e-04,  2.8486e-01,
          7.4442e-02,  9.5243e-03,  3.3247e-01, -2.0037e-01,  6.1736e-02,
          1.5094e-01,  1.2034e-01, -3.6872e-02, -1.4080e-02,  3.4697e-02,
          1.3188e-01, -1.7103e-01, -1.4973e-01, -1.4219e-01,  4.9256e-02,
          2.4330e-01, -9.7371e-02,  2.3016e-01,  4.8954e-02,  1.2720e-01,
          1.7822e-01,  9.9102e-02,  3.0066e-01, -3.9374e-01, -1.7884e-01,
         -1.5500e-01,  5.7391e-02,  2.6462e-01,  5.1546e-03, -1.1642e-01,
          1.0069e-02,  1.1563e-01, -2.0279e-01, -2.8824e-02,  5.4539e-02,
          2.2006e-01, -3.2412e-01, -1.7539e-01, -5.5234e-02, -1.4311e-01,
          7.0508e-02, -3.4502e-01,  1.8749e-01, -2.3652e-02, -2.3478e-01,
         -1.6032e-02, -4.4279e-02, -5.4287e-02,  8.2940e-02, -8.2286e-02,
         -1.4792e-01, -2.3400e-01,  4.3200e-02, -9.2596e-02,  1.6887e-01,
          3.3251e-01,  2.4134e-01,  1.8307e-01,  1.5571e-01, -2.4938e-01,
         -6.3343e-02, -5.0948e-03,  1.7809e-01, -1.3545e-01, -1.5980e-01,
          1.2566e-01,  4.6417e-02,  5.2512e-03,  1.8019e-02, -3.2414e-02,
         -1.9864e-01,  2.2142e-01, -8.2012e-02,  2.1262e-01, -2.4275e-02,
         -2.2881e-01,  2.2020e-01, -2.8108e-01, -1.3484e-01,  4.5363e-01,
          8.7013e-01,  2.1801e-01, -2.7301e-02, -5.4362e-02,  2.9353e-01,
          3.9879e-01, -2.5110e-01, -1.6035e-01,  4.1697e-01, -2.4482e-01,
         -1.3425e-01, -1.2676e-02,  6.6520e-02, -4.6501e-01,  2.0488e-01,
          3.9926e-01, -4.7691e-01, -3.4637e-01,  9.0265e-02,  9.7278e-02,
          7.2439e-02,  1.6098e-01,  1.0920e-01,  3.5271e-01, -1.2691e-01,
         -2.0448e-01, -3.9414e-01,  3.1441e-01,  5.6079e-01, -5.7375e-02,
          4.7578e-01,  3.7382e-01,  3.5392e-01,  4.1442e-01,  5.2495e-01,
          6.9648e-01,  1.6890e-01, -8.8573e-03, -4.9823e-02, -3.6581e-01,
         -2.7975e-02,  5.5050e-01, -2.4615e-01,  4.5922e-01, -4.5916e-01,
         -8.4627e-01, -1.2762e-01,  1.2965e-01,  2.7245e-01, -4.0454e-01,
          7.8306e-01,  2.9571e-01,  1.5830e-01,  1.3328e-01,  6.2693e-01,
         -1.0019e-01,  2.6467e-01, -6.2279e-01, -4.0670e-01, -3.1145e-01,
         -4.0506e-02,  1.1507e-01, -4.4824e-02,  7.5859e-01, -5.3282e-02,
         -4.3168e-01, -7.7676e-02,  6.5377e-01,  1.9338e-01,  2.4988e-01,
         -2.9817e-01, -6.5683e-01, -4.6994e-01, -1.1368e-01,  8.5291e-01,
          8.3759e-02,  2.6421e-01,  5.7293e-01, -2.2111e-01, -2.8017e-01,
         -1.2888e-03, -1.4551e-01, -9.2455e-02, -3.7464e-02,  2.4965e-01,
         -2.7919e-01,  1.1545e-01, -5.8069e-01, -6.3421e-01,  6.4000e-01,
         -9.7455e-02,  2.6253e-01, -7.9687e-01, -2.2683e-01, -3.0717e-01,
          1.6216e-01,  3.4397e-02, -1.8823e-01,  1.9812e-01, -9.4713e-02,
          4.8587e-01,  2.5306e-01, -2.4051e-01,  5.3429e-01, -3.7170e-02,
         -4.5766e-01,  2.8827e-01, -5.6328e-01, -1.5588e-01,  3.3532e-01,
         -6.0780e-02,  4.2470e-01,  3.1437e-01,  8.0616e-01, -3.3934e-02,
          2.6732e-01, -4.7927e-01,  6.5193e-01, -3.2750e-01, -1.9029e-01,
         -4.7950e-01,  3.2726e-01, -7.5715e-02, -4.1908e-01, -4.1493e-01,
          8.2071e-02]])
>>> speed = 1
>>> sample_input = (tokens, style, speed)
>>> output = model(*sample_input) # this works fine and the output is correct
>>> output.shape
torch.Size([35400])
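For scripting the repro outside the REPL, the long style literal above could be replaced by a placeholder of the same shape. This is an assumption on my part: a real Kokoro voice style is needed for correct audio, but any (1, 256) tensor may be enough to exercise the same compiler path.

```python
import torch

# Hypothetical stand-in for the long style literal above: a random vector
# with the same shape, (1, 256). Reproducing the *compiler* error with a
# placeholder is an assumption, not verified against neuronx-cc.
style = torch.randn(1, 256)
speed = 1
tokens = 'hˈɛl ðˈɛɹ!'  # tokenization of "hello there!", as above
sample_input = (tokens, style, speed)
```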

Tracing Output:

>>> torch_neuronx.trace(model, sample_input)
2025-01-30 23:50:00.000636:  6412  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/ubuntu/neuroncc_compile_workdir/dd838796-f5dc-4d99-8d17-0c225ca7b18b/model.MODULE_11533760926461001556+e30acd3a.hlo_module.pb --output /tmp/ubuntu/neuroncc_compile_workdir/dd838796-f5dc-4d99-8d17-0c225ca7b18b/model.MODULE_11533760926461001556+e30acd3a.neff --target=trn1 --verbose=35
.
Compiler status PASS
2025-01-30 23:50:02.000383:  6412  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/ubuntu/neuroncc_compile_workdir/c1872ef0-0ca1-4031-9bf6-3b130f07dc91/model.MODULE_457786941100183860+e30acd3a.hlo_module.pb --output /tmp/ubuntu/neuroncc_compile_workdir/c1872ef0-0ca1-4031-9bf6-3b130f07dc91/model.MODULE_457786941100183860+e30acd3a.neff --target=trn1 --verbose=35
.
Compiler status PASS
2025-01-30 23:50:04.000377:  6412  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/ubuntu/neuroncc_compile_workdir/a5bf3eda-d525-43b9-8276-42b71631ae60/model.MODULE_3944879240155622146+e30acd3a.hlo_module.pb --output /tmp/ubuntu/neuroncc_compile_workdir/a5bf3eda-d525-43b9-8276-42b71631ae60/model.MODULE_3944879240155622146+e30acd3a.neff --target=trn1 --verbose=35
.root = neuronxcc/starfish/penguin/targets/sunda/passes/SundaSizeTiling.py
root = neuronxcc/starfish/penguin/targets/sunda/passes
root = neuronxcc/starfish/penguin/targets/sunda
root = neuronxcc/starfish/penguin/targets
root = neuronxcc/starfish/penguin
root = neuronxcc/starfish

2025-01-30 23:50:18.000228:  6412  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/a5bf3eda-d525-43b9-8276-42b71631ae60/model.MODULE_3944879240155622146+e30acd3a.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/a5bf3eda-d525-43b9-8276-42b71631ae60/model.MODULE_3944879240155622146+e30acd3a.neff', '--target=trn1', '--verbose=35']: 2025-01-30T23:50:18Z [TEN404] Internal tensorizer error: SundaSizeTiling:tuple index out of range - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

2025-01-30 23:50:18.000228:  6412  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/ubuntu/neuroncc_compile_workdir/a5bf3eda-d525-43b9-8276-42b71631ae60/model.MODULE_3944879240155622146+e30acd3a.hlo_module.pb after 0 retries.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 589, in trace
    neff_filename, metaneff, flattener, packer, weights = _trace(
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 646, in _trace
    hlo_artifacts = generate_hlo(
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 450, in generate_hlo
    ) = xla_trace(
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_neuronx/xla_impl/hlo_conversion.py", line 138, in xla_trace
    outputs = func(*example_inputs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/kokoro/model.py", line 76, in forward
    d = self.predictor.text_encoder(d_en, s, input_lengths, text_mask)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/kokoro/modules.py", line 175, in forward
    x_pad[:, :, :x.shape[-1]] = x
RuntimeError: Bad StatusOr access: INTERNAL: RunNeuronCCImpl: error condition error != 0: <class 'subprocess.CalledProcessError'>: Command '['neuronx-cc', 'compile', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/a5bf3eda-d525-43b9-8276-42b71631ae60/model.MODULE_3944879240155622146+e30acd3a.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/a5bf3eda-d525-43b9-8276-42b71631ae60/model.MODULE_3944879240155622146+e30acd3a.neff', '--target=trn1', '--verbose=35']' returned non-zero exit status 70.
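As the compiler message itself suggests, the `XLA_IR_DEBUG` and `XLA_HLO_DEBUG` environment variables may surface more context. A minimal sketch of enabling them before tracing (the variable names come straight from the error text; what extra output they produce depends on the torch-neuronx and neuronx-cc versions):

```python
import os

# Debug flags named in the [TEN404] error message; set them before calling
# torch_neuronx.trace so the generated IR/HLO carries extra debug metadata.
os.environ["XLA_IR_DEBUG"] = "1"
os.environ["XLA_HLO_DEBUG"] = "1"

# ...then re-run the trace in the same process, e.g.:
# torch_neuronx.trace(model, sample_input)
```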

Model Eval:

>>> model.eval()
KModel(
  (bert): CustomAlbert(
    (embeddings): AlbertEmbeddings(
      (word_embeddings): Embedding(178, 128, padding_idx=0)
      (position_embeddings): Embedding(512, 128)
      (token_type_embeddings): Embedding(2, 128)
      (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0, inplace=False)
    )
    (encoder): AlbertTransformer(
      (embedding_hidden_mapping_in): Linear(in_features=128, out_features=768, bias=True)
      (albert_layer_groups): ModuleList(
        (0): AlbertLayerGroup(
          (albert_layers): ModuleList(
            (0): AlbertLayer(
              (full_layer_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (attention): AlbertAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (attention_dropout): Dropout(p=0, inplace=False)
                (output_dropout): Dropout(p=0, inplace=False)
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              )
              (ffn): Linear(in_features=768, out_features=2048, bias=True)
              (ffn_output): Linear(in_features=2048, out_features=768, bias=True)
              (activation): NewGELUActivation()
              (dropout): Dropout(p=0, inplace=False)
            )
          )
        )
      )
    )
    (pooler): Linear(in_features=768, out_features=768, bias=True)
    (pooler_activation): Tanh()
  )
  (bert_encoder): Linear(in_features=768, out_features=512, bias=True)
  (predictor): ProsodyPredictor(
    (text_encoder): DurationEncoder(
      (lstms): ModuleList(
        (0): LSTM(640, 256, batch_first=True, dropout=0.2, bidirectional=True)
        (1): AdaLayerNorm(
          (fc): Linear(in_features=128, out_features=1024, bias=True)
        )
        (2): LSTM(640, 256, batch_first=True, dropout=0.2, bidirectional=True)
        (3): AdaLayerNorm(
          (fc): Linear(in_features=128, out_features=1024, bias=True)
        )
        (4): LSTM(640, 256, batch_first=True, dropout=0.2, bidirectional=True)
        (5): AdaLayerNorm(
          (fc): Linear(in_features=128, out_features=1024, bias=True)
        )
      )
    )
    (lstm): LSTM(640, 256, batch_first=True, bidirectional=True)
    (duration_proj): LinearNorm(
      (linear_layer): Linear(in_features=512, out_features=50, bias=True)
    )
    (shared): LSTM(640, 256, batch_first=True, bidirectional=True)
    (F0): ModuleList(
      (0): AdainResBlk1d(
        (actv): LeakyReLU(negative_slope=0.2)
        (upsample): UpSample1d()
        (conv1): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (conv2): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (norm1): AdaIN1d(
          (norm): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=1024, bias=True)
        )
        (norm2): AdaIN1d(
          (norm): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=1024, bias=True)
        )
        (dropout): Dropout(p=0.2, inplace=False)
        (pool): Identity()
      )
      (1): AdainResBlk1d(
        (actv): LeakyReLU(negative_slope=0.2)
        (upsample): UpSample1d()
        (conv1): Conv1d(512, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (conv2): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (norm1): AdaIN1d(
          (norm): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=1024, bias=True)
        )
        (norm2): AdaIN1d(
          (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=512, bias=True)
        )
        (conv1x1): Conv1d(512, 256, kernel_size=(1,), stride=(1,), bias=False)
        (dropout): Dropout(p=0.2, inplace=False)
        (pool): ConvTranspose1d(512, 512, kernel_size=(3,), stride=(2,), padding=(1,), output_padding=(1,), groups=512)
      )
      (2): AdainResBlk1d(
        (actv): LeakyReLU(negative_slope=0.2)
        (upsample): UpSample1d()
        (conv1): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (conv2): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (norm1): AdaIN1d(
          (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=512, bias=True)
        )
        (norm2): AdaIN1d(
          (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=512, bias=True)
        )
        (dropout): Dropout(p=0.2, inplace=False)
        (pool): Identity()
      )
    )
    (N): ModuleList(
      (0): AdainResBlk1d(
        (actv): LeakyReLU(negative_slope=0.2)
        (upsample): UpSample1d()
        (conv1): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (conv2): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (norm1): AdaIN1d(
          (norm): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=1024, bias=True)
        )
        (norm2): AdaIN1d(
          (norm): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=1024, bias=True)
        )
        (dropout): Dropout(p=0.2, inplace=False)
        (pool): Identity()
      )
      (1): AdainResBlk1d(
        (actv): LeakyReLU(negative_slope=0.2)
        (upsample): UpSample1d()
        (conv1): Conv1d(512, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (conv2): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (norm1): AdaIN1d(
          (norm): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=1024, bias=True)
        )
        (norm2): AdaIN1d(
          (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=512, bias=True)
        )
        (conv1x1): Conv1d(512, 256, kernel_size=(1,), stride=(1,), bias=False)
        (dropout): Dropout(p=0.2, inplace=False)
        (pool): ConvTranspose1d(512, 512, kernel_size=(3,), stride=(2,), padding=(1,), output_padding=(1,), groups=512)
      )
      (2): AdainResBlk1d(
        (actv): LeakyReLU(negative_slope=0.2)
        (upsample): UpSample1d()
        (conv1): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (conv2): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (norm1): AdaIN1d(
          (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=512, bias=True)
        )
        (norm2): AdaIN1d(
          (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=512, bias=True)
        )
        (dropout): Dropout(p=0.2, inplace=False)
        (pool): Identity()
      )
    )
    (F0_proj): Conv1d(256, 1, kernel_size=(1,), stride=(1,))
    (N_proj): Conv1d(256, 1, kernel_size=(1,), stride=(1,))
  )
  (text_encoder): TextEncoder(
    (embedding): Embedding(178, 512)
    (cnn): ModuleList(
      (0-2): 3 x Sequential(
        (0): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        (1): LayerNorm()
        (2): LeakyReLU(negative_slope=0.2)
        (3): Dropout(p=0.2, inplace=False)
      )
    )
    (lstm): LSTM(512, 256, batch_first=True, bidirectional=True)
  )
  (decoder): Decoder(
    (encode): AdainResBlk1d(
      (actv): LeakyReLU(negative_slope=0.2)
      (upsample): UpSample1d()
      (conv1): Conv1d(514, 1024, kernel_size=(3,), stride=(1,), padding=(1,))
      (conv2): Conv1d(1024, 1024, kernel_size=(3,), stride=(1,), padding=(1,))
      (norm1): AdaIN1d(
        (norm): InstanceNorm1d(514, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (fc): Linear(in_features=128, out_features=1028, bias=True)
      )
      (norm2): AdaIN1d(
        (norm): InstanceNorm1d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (fc): Linear(in_features=128, out_features=2048, bias=True)
      )
      (conv1x1): Conv1d(514, 1024, kernel_size=(1,), stride=(1,), bias=False)
      (dropout): Dropout(p=0.0, inplace=False)
      (pool): Identity()
    )
    (decode): ModuleList(
      (0-2): 3 x AdainResBlk1d(
        (actv): LeakyReLU(negative_slope=0.2)
        (upsample): UpSample1d()
        (conv1): Conv1d(1090, 1024, kernel_size=(3,), stride=(1,), padding=(1,))
        (conv2): Conv1d(1024, 1024, kernel_size=(3,), stride=(1,), padding=(1,))
        (norm1): AdaIN1d(
          (norm): InstanceNorm1d(1090, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=2180, bias=True)
        )
        (norm2): AdaIN1d(
          (norm): InstanceNorm1d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=2048, bias=True)
        )
        (conv1x1): Conv1d(1090, 1024, kernel_size=(1,), stride=(1,), bias=False)
        (dropout): Dropout(p=0.0, inplace=False)
        (pool): Identity()
      )
      (3): AdainResBlk1d(
        (actv): LeakyReLU(negative_slope=0.2)
        (upsample): UpSample1d()
        (conv1): Conv1d(1090, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (conv2): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (norm1): AdaIN1d(
          (norm): InstanceNorm1d(1090, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=2180, bias=True)
        )
        (norm2): AdaIN1d(
          (norm): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
          (fc): Linear(in_features=128, out_features=1024, bias=True)
        )
        (conv1x1): Conv1d(1090, 512, kernel_size=(1,), stride=(1,), bias=False)
        (dropout): Dropout(p=0.0, inplace=False)
        (pool): ConvTranspose1d(1090, 1090, kernel_size=(3,), stride=(2,), padding=(1,), output_padding=(1,), groups=1090)
      )
    )
    (F0_conv): Conv1d(1, 1, kernel_size=(3,), stride=(2,), padding=(1,))
    (N_conv): Conv1d(1, 1, kernel_size=(3,), stride=(2,), padding=(1,))
    (asr_res): Sequential(
      (0): Conv1d(512, 64, kernel_size=(1,), stride=(1,))
    )
    (generator): Generator(
      (m_source): SourceModuleHnNSF(
        (l_sin_gen): SineGen()
        (l_linear): Linear(in_features=9, out_features=1, bias=True)
        (l_tanh): Tanh()
      )
      (f0_upsamp): Upsample(scale_factor=300.0, mode='nearest')
      (noise_convs): ModuleList(
        (0): Conv1d(22, 256, kernel_size=(12,), stride=(6,), padding=(3,))
        (1): Conv1d(22, 128, kernel_size=(1,), stride=(1,))
      )
      (noise_res): ModuleList(
        (0): AdaINResBlock1(
          (convs1): ModuleList(
            (0): Conv1d(256, 256, kernel_size=(7,), stride=(1,), padding=(3,))
            (1): Conv1d(256, 256, kernel_size=(7,), stride=(1,), padding=(9,), dilation=(3,))
            (2): Conv1d(256, 256, kernel_size=(7,), stride=(1,), padding=(15,), dilation=(5,))
          )
          (convs2): ModuleList(
            (0-2): 3 x Conv1d(256, 256, kernel_size=(7,), stride=(1,), padding=(3,))
          )
          (adain1): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=512, bias=True)
            )
          )
          (adain2): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=512, bias=True)
            )
          )
          (alpha1): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x256x1]
              (1): Parameter containing: [torch.float32 of size 1x256x1]
              (2): Parameter containing: [torch.float32 of size 1x256x1]
          )
          (alpha2): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x256x1]
              (1): Parameter containing: [torch.float32 of size 1x256x1]
              (2): Parameter containing: [torch.float32 of size 1x256x1]
          )
        )
        (1): AdaINResBlock1(
          (convs1): ModuleList(
            (0): Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(5,))
            (1): Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(15,), dilation=(3,))
            (2): Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(25,), dilation=(5,))
          )
          (convs2): ModuleList(
            (0-2): 3 x Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(5,))
          )
          (adain1): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=256, bias=True)
            )
          )
          (adain2): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=256, bias=True)
            )
          )
          (alpha1): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x128x1]
              (1): Parameter containing: [torch.float32 of size 1x128x1]
              (2): Parameter containing: [torch.float32 of size 1x128x1]
          )
          (alpha2): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x128x1]
              (1): Parameter containing: [torch.float32 of size 1x128x1]
              (2): Parameter containing: [torch.float32 of size 1x128x1]
          )
        )
      )
      (ups): ModuleList(
        (0): ConvTranspose1d(512, 256, kernel_size=(20,), stride=(10,), padding=(5,))
        (1): ConvTranspose1d(256, 128, kernel_size=(12,), stride=(6,), padding=(3,))
      )
      (resblocks): ModuleList(
        (0): AdaINResBlock1(
          (convs1): ModuleList(
            (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
            (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(3,), dilation=(3,))
            (2): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(5,), dilation=(5,))
          )
          (convs2): ModuleList(
            (0-2): 3 x Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
          )
          (adain1): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=512, bias=True)
            )
          )
          (adain2): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=512, bias=True)
            )
          )
          (alpha1): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x256x1]
              (1): Parameter containing: [torch.float32 of size 1x256x1]
              (2): Parameter containing: [torch.float32 of size 1x256x1]
          )
          (alpha2): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x256x1]
              (1): Parameter containing: [torch.float32 of size 1x256x1]
              (2): Parameter containing: [torch.float32 of size 1x256x1]
          )
        )
        (1): AdaINResBlock1(
          (convs1): ModuleList(
            (0): Conv1d(256, 256, kernel_size=(7,), stride=(1,), padding=(3,))
            (1): Conv1d(256, 256, kernel_size=(7,), stride=(1,), padding=(9,), dilation=(3,))
            (2): Conv1d(256, 256, kernel_size=(7,), stride=(1,), padding=(15,), dilation=(5,))
          )
          (convs2): ModuleList(
            (0-2): 3 x Conv1d(256, 256, kernel_size=(7,), stride=(1,), padding=(3,))
          )
          (adain1): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=512, bias=True)
            )
          )
          (adain2): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=512, bias=True)
            )
          )
          (alpha1): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x256x1]
              (1): Parameter containing: [torch.float32 of size 1x256x1]
              (2): Parameter containing: [torch.float32 of size 1x256x1]
          )
          (alpha2): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x256x1]
              (1): Parameter containing: [torch.float32 of size 1x256x1]
              (2): Parameter containing: [torch.float32 of size 1x256x1]
          )
        )
        (2): AdaINResBlock1(
          (convs1): ModuleList(
            (0): Conv1d(256, 256, kernel_size=(11,), stride=(1,), padding=(5,))
            (1): Conv1d(256, 256, kernel_size=(11,), stride=(1,), padding=(15,), dilation=(3,))
            (2): Conv1d(256, 256, kernel_size=(11,), stride=(1,), padding=(25,), dilation=(5,))
          )
          (convs2): ModuleList(
            (0-2): 3 x Conv1d(256, 256, kernel_size=(11,), stride=(1,), padding=(5,))
          )
          (adain1): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=512, bias=True)
            )
          )
          (adain2): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=512, bias=True)
            )
          )
          (alpha1): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x256x1]
              (1): Parameter containing: [torch.float32 of size 1x256x1]
              (2): Parameter containing: [torch.float32 of size 1x256x1]
          )
          (alpha2): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x256x1]
              (1): Parameter containing: [torch.float32 of size 1x256x1]
              (2): Parameter containing: [torch.float32 of size 1x256x1]
          )
        )
        (3): AdaINResBlock1(
          (convs1): ModuleList(
            (0): Conv1d(128, 128, kernel_size=(3,), stride=(1,), padding=(1,))
            (1): Conv1d(128, 128, kernel_size=(3,), stride=(1,), padding=(3,), dilation=(3,))
            (2): Conv1d(128, 128, kernel_size=(3,), stride=(1,), padding=(5,), dilation=(5,))
          )
          (convs2): ModuleList(
            (0-2): 3 x Conv1d(128, 128, kernel_size=(3,), stride=(1,), padding=(1,))
          )
          (adain1): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=256, bias=True)
            )
          )
          (adain2): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=256, bias=True)
            )
          )
          (alpha1): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x128x1]
              (1): Parameter containing: [torch.float32 of size 1x128x1]
              (2): Parameter containing: [torch.float32 of size 1x128x1]
          )
          (alpha2): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x128x1]
              (1): Parameter containing: [torch.float32 of size 1x128x1]
              (2): Parameter containing: [torch.float32 of size 1x128x1]
          )
        )
        (4): AdaINResBlock1(
          (convs1): ModuleList(
            (0): Conv1d(128, 128, kernel_size=(7,), stride=(1,), padding=(3,))
            (1): Conv1d(128, 128, kernel_size=(7,), stride=(1,), padding=(9,), dilation=(3,))
            (2): Conv1d(128, 128, kernel_size=(7,), stride=(1,), padding=(15,), dilation=(5,))
          )
          (convs2): ModuleList(
            (0-2): 3 x Conv1d(128, 128, kernel_size=(7,), stride=(1,), padding=(3,))
          )
          (adain1): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=256, bias=True)
            )
          )
          (adain2): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=256, bias=True)
            )
          )
          (alpha1): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x128x1]
              (1): Parameter containing: [torch.float32 of size 1x128x1]
              (2): Parameter containing: [torch.float32 of size 1x128x1]
          )
          (alpha2): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x128x1]
              (1): Parameter containing: [torch.float32 of size 1x128x1]
              (2): Parameter containing: [torch.float32 of size 1x128x1]
          )
        )
        (5): AdaINResBlock1(
          (convs1): ModuleList(
            (0): Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(5,))
            (1): Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(15,), dilation=(3,))
            (2): Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(25,), dilation=(5,))
          )
          (convs2): ModuleList(
            (0-2): 3 x Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(5,))
          )
          (adain1): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=256, bias=True)
            )
          )
          (adain2): ModuleList(
            (0-2): 3 x AdaIN1d(
              (norm): InstanceNorm1d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
              (fc): Linear(in_features=128, out_features=256, bias=True)
            )
          )
          (alpha1): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x128x1]
              (1): Parameter containing: [torch.float32 of size 1x128x1]
              (2): Parameter containing: [torch.float32 of size 1x128x1]
          )
          (alpha2): ParameterList(
              (0): Parameter containing: [torch.float32 of size 1x128x1]
              (1): Parameter containing: [torch.float32 of size 1x128x1]
              (2): Parameter containing: [torch.float32 of size 1x128x1]
          )
        )
      )
      (conv_post): Conv1d(128, 22, kernel_size=(7,), stride=(1,), padding=(3,))
      (reflection_pad): ReflectionPad1d((1, 0))
      (stft): TorchSTFT()
    )
  )
)
@aws-rishyraj
Contributor

Hi @mansourkheffache,

I took a brief look at the model code, and I see the model expects str and int types as input. torch_neuronx.trace only accepts models that take PyTorch Tensors, or Lists/Tuples of Tensors, as inputs. I suggest trying to trace only the large pieces of the model that do take tensors (e.g. kokoro_model.bert) to start with, then slowly tracing more components of the model.
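As an illustration, here is a hedged sketch (not the actual Kokoro API): keep any str/int preprocessing outside the module, and trace a submodule that is purely tensor-in/tensor-out. `Inner` below is a hypothetical stand-in for a piece like kokoro_model.bert; torch.jit.trace is used only so the sketch runs without Neuron hardware, and on an inf2 instance you would call torch_neuronx.trace instead.

```python
import torch

# Hypothetical stand-in for a tensor-only submodule (e.g. kokoro_model.bert).
# All str/int preprocessing stays outside the traced graph.
class Inner(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(16, 8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))

inner = Inner().eval()
example = torch.randn(1, 4, 16)

# torch.jit.trace keeps this runnable off-Neuron; on inf2 this would be
# torch_neuronx.trace(inner, example) instead.
traced = torch.jit.trace(inner, example)
print(torch.allclose(traced(example), inner(example)))  # True
```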

@mansourkheffache
Author

Hi @aws-rishyraj ,

Thank you for taking the time and the recommendation.

I just updated the module locally and changed the signature of the forward function to be:

    def forward(self, input_tokens: torch.LongTensor, ref_s: torch.FloatTensor) -> torch.FloatTensor:

I tested running the model with the new setup and it worked fine, giving the same output as the initial implementation. However, I still get the exact same error when calling the trace method:

>>> neuron_model = torch_neuronx.trace(model, sample_input)
2025-01-31 19:56:43.000756:  6124  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/ubuntu/neuroncc_compile_workdir/dde7e8a0-a941-4b04-af89-b441127e76af/model.MODULE_11533760926461001556+e30acd3a.hlo_module.pb --output /tmp/ubuntu/neuroncc_compile_workdir/dde7e8a0-a941-4b04-af89-b441127e76af/model.MODULE_11533760926461001556+e30acd3a.neff --target=trn1 --verbose=35
.
Compiler status PASS
2025-01-31 19:56:45.000522:  6124  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/ubuntu/neuroncc_compile_workdir/6a4dada9-8a98-4cbc-a841-a4cbd9d98611/model.MODULE_457786941100183860+e30acd3a.hlo_module.pb --output /tmp/ubuntu/neuroncc_compile_workdir/6a4dada9-8a98-4cbc-a841-a4cbd9d98611/model.MODULE_457786941100183860+e30acd3a.neff --target=trn1 --verbose=35
.
Compiler status PASS
2025-01-31 19:56:47.000508:  6124  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --framework=XLA /tmp/ubuntu/neuroncc_compile_workdir/75a5ece5-577b-469f-a79b-2611ece84898/model.MODULE_3944879240155622146+e30acd3a.hlo_module.pb --output /tmp/ubuntu/neuroncc_compile_workdir/75a5ece5-577b-469f-a79b-2611ece84898/model.MODULE_3944879240155622146+e30acd3a.neff --target=trn1 --verbose=35
.root = neuronxcc/starfish/penguin/targets/sunda/passes/SundaSizeTiling.py
root = neuronxcc/starfish/penguin/targets/sunda/passes
root = neuronxcc/starfish/penguin/targets/sunda
root = neuronxcc/starfish/penguin/targets
root = neuronxcc/starfish/penguin
root = neuronxcc/starfish

2025-01-31 19:57:01.000199:  6124  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/75a5ece5-577b-469f-a79b-2611ece84898/model.MODULE_3944879240155622146+e30acd3a.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/75a5ece5-577b-469f-a79b-2611ece84898/model.MODULE_3944879240155622146+e30acd3a.neff', '--target=trn1', '--verbose=35']: 2025-01-31T19:57:01Z [TEN404] Internal tensorizer error: SundaSizeTiling:tuple index out of range - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

2025-01-31 19:57:01.000199:  6124  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/ubuntu/neuroncc_compile_workdir/75a5ece5-577b-469f-a79b-2611ece84898/model.MODULE_3944879240155622146+e30acd3a.hlo_module.pb after 0 retries.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 589, in trace
    neff_filename, metaneff, flattener, packer, weights = _trace(
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 646, in _trace
    hlo_artifacts = generate_hlo(
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 450, in generate_hlo
    ) = xla_trace(
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_neuronx/xla_impl/hlo_conversion.py", line 138, in xla_trace
    outputs = func(*example_inputs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/kokoro/model.py", line 80, in forward
    d = self.predictor.text_encoder(d_en, s, input_lengths, text_mask)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/kokoro/modules.py", line 175, in forward
    x_pad[:, :, :x.shape[-1]] = x
RuntimeError: Bad StatusOr access: INTERNAL: RunNeuronCCImpl: error condition error != 0: <class 'subprocess.CalledProcessError'>: Command '['neuronx-cc', 'compile', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/75a5ece5-577b-469f-a79b-2611ece84898/model.MODULE_3944879240155622146+e30acd3a.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/75a5ece5-577b-469f-a79b-2611ece84898/model.MODULE_3944879240155622146+e30acd3a.neff', '--target=trn1', '--verbose=35']' returned non-zero exit status 70.

Any input or idea what I could try to fix this?
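For what it's worth, the last frame of the traceback points at `x_pad[:, :, :x.shape[-1]] = x`, an in-place slice assignment whose bound depends on the runtime shape of `x`. A hypothetical static-shape rework of that pattern (a sketch only, assuming a fixed upper bound on the sequence axis, and not the actual Kokoro fix) would pad with F.pad instead:

```python
import torch
import torch.nn.functional as F

# Hypothetical rework of the pattern in the traceback:
#   x_pad[:, :, :x.shape[-1]] = x
# In-place slice assignment with a data-dependent bound is hard for tracing
# compilers; right-padding the last dim to a fixed length keeps shapes static.
MAX_LEN = 16  # assumed fixed upper bound for the sequence axis

x = torch.randn(1, 3, 10)
x_pad = F.pad(x, (0, MAX_LEN - x.shape[-1]))  # pad last dim on the right

print(x_pad.shape)  # torch.Size([1, 3, 16])
```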

@mansourkheffache
Author

@aws-rishyraj please let me know if you need more data or debug logs from the command.

@aws-rishyraj
Contributor

Hi @mansourkheffache,

It looks like the underlying tracing process is unable to trace the full model, and is fragmenting the graph into pieces, which are more susceptible to compilation issues.

Generally speaking, models that use pure torch tensor operations and have no conditional logic are the most compatible with torch_neuronx.trace. The kokoro model does not look like it fits that criteria, so some extra work needs to be done to get it to run on Neuron.

This is why I suggest compiling a small piece of the model first and slowly adding more pieces.
Example:

kokoro_model.bert = torch_neuronx.trace(kokoro_model.bert, mock_inputs_for_kokoro_bert)
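A runnable sketch of that incremental approach, with a toy composite model standing in for Kokoro (the names `Composite`, `bert`, and `head` are illustrative only, and torch.jit.trace substitutes for torch_neuronx.trace so the sketch runs without Neuron hardware):

```python
import torch

# Toy stand-in for a composite model: trace one submodule at a time and swap
# the traced version back in, leaving the rest of the model in eager mode.
class Composite(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = torch.nn.Linear(16, 16)  # stand-in for kokoro_model.bert
        self.head = torch.nn.Linear(16, 4)

    def forward(self, x):
        return self.head(torch.relu(self.bert(x)))

model = Composite().eval()
mock_input = torch.randn(2, 16)

# On inf2 this line would be: torch_neuronx.trace(model.bert, mock_input)
model.bert = torch.jit.trace(model.bert, mock_input)

out = model(mock_input)
print(out.shape)  # torch.Size([2, 4])
```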

@mansourkheffache
Author

Hi @aws-rishyraj ,

That makes a lot of sense. I will give it a shot. Thanks a lot for the guidance!

@mansourkheffache
Author

Just a quick update on this: after some digging, I found that Hugging Face already provides some models compiled for AWS Inferentia through HF Optimum Neuron, including ALBERT, which is the model the bert module is based on: https://huggingface.co/docs/optimum-neuron/package_reference/configuration#supported-architectures
