Make Flux Transformer RoPE use a custom IREE kernel #871

Open. sogartar wants to merge 1 commit into main from flux-transformer-rope-with-kernel.
Conversation

sogartar (Contributor)
We assume that the custom kernel will yield better performance than the equivalent PyTorch ops.
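For context, here is a minimal sketch (not the PR's code) of the kind of rotary-embedding application that such a custom kernel would replace, written with plain PyTorch ops; the function name, shapes, and layout below are illustrative assumptions.

```python
import torch


def apply_rope_reference(x: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    """Rotate channel pairs of `x` by the angles in `table`.

    Assumed shapes (illustrative, not the repo's convention):
      x:     (batch, seq_len, heads, head_dim), head_dim even
      table: (seq_len, head_dim // 2) rotation angles
    """
    x_pairs = x.unflatten(-1, (-1, 2))        # (..., head_dim // 2, 2)
    x_re, x_im = x_pairs[..., 0], x_pairs[..., 1]
    cos = torch.cos(table)[None, :, None, :]  # broadcast over batch and heads
    sin = torch.sin(table)[None, :, None, :]
    out_re = x_re * cos - x_im * sin
    out_im = x_re * sin + x_im * cos
    return torch.stack((out_re, out_im), dim=-1).flatten(-2)


# The PR's intent is to replace this chain of elementwise ops with a single
# custom IREE kernel call, assuming the fused kernel compiles to faster code.
```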

sogartar (Contributor, Author)
To match the dimension order required by the kernel, an axis permutation is performed. I am not sure whether this can be optimized out; it needs some investigation of the compilation passes (a rough sketch of the concern is shown after this comment).

This change currently introduces a significant (roughly 10x) performance regression: the baseline is 552 ms, while with this change it is 6131 ms.
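To illustrate the permutation concern, here is a minimal sketch; the dimension orders and the kernel entry point are assumptions, not the PR's actual code.

```python
import torch


def _kernel_stub(x: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    # Stand-in for the custom IREE kernel; the real kernel lives in the repo
    # and is not reproduced here.
    return x


def apply_rope_via_kernel(x: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    # Suppose the kernel expects (batch, heads, seq_len, head_dim) while the
    # Flux transformer produces (batch, seq_len, heads, head_dim).
    x_perm = x.permute(0, 2, 1, 3).contiguous()
    y_perm = _kernel_stub(x_perm, table)
    # Permute back to the caller's layout. Whether the compiler can fold these
    # transposes into the kernel is the open question mentioned above.
    return y_perm.permute(0, 2, 1, 3)
```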

sogartar force-pushed the flux-transformer-rope-with-kernel branch 2 times, most recently from 4ff5c3e to 3b751f4, on January 28, 2025 23:40.
sogartar marked this pull request as ready for review on January 31, 2025 15:12.
sogartar (Contributor, Author)
The large performance problem has been addressed by iree-org/iree#19822.



def compute_rotary_embedding_table(
positions: torch.Tensor,
Contributor (reviewer)
Nit: would it make more sense to just rename _compute_rotary_embedding_table?

sogartar (Contributor, Author)
I want to use this function outside of the class.

Contributor (reviewer)
Yes, I mean, instead of:

  • copying compute_rotary_embedding_table so we can use it outside the class
  • making the old _compute_rotary_embedding_table redirect to compute_rotary_embedding_table

just:

  • renaming _compute_rotary_embedding_table to compute_rotary_embedding_table and using it outside the class
  • changing all references to _compute_rotary_embedding_table to use compute_rotary_embedding_table instead

The latter requires an IDE and is slightly more work, but it does not leave a stub/redirect function behind (see the sketch below).
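A minimal sketch of the two options being discussed; the class and function names are simplified placeholders, not the repo's actual code.

```python
import torch


def compute_rotary_embedding_table(positions: torch.Tensor) -> torch.Tensor:
    """Module-level implementation, usable outside the class."""
    ...


# Option 1: keep the old method as a thin redirect to the module-level function.
class RotaryEmbeddingLayer:
    def _compute_rotary_embedding_table(self, positions: torch.Tensor) -> torch.Tensor:
        # stub/redirect left behind so existing callers keep working
        return compute_rotary_embedding_table(positions)


# Option 2: rename _compute_rotary_embedding_table to compute_rotary_embedding_table,
# move it out of the class, and update every caller; no redirect stub remains.
```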

sogartar (Contributor, Author)
The class variant got updated to handle the Hugging Face case, so it has more hair now.

sogartar force-pushed the flux-transformer-rope-with-kernel branch from 3b751f4 to 48e1b04 on January 31, 2025 20:00.
sogartar (Contributor, Author) commented Feb 1, 2025

This PR is waiting on iree-org/iree#19829.
