
[WIP] Add cross_entropy to torchcompile_cat executor #1655

Draft
wants to merge 1 commit into base: main
Conversation

riccardofelluga
Collaborator

What does this PR do?

In an attempt to fix #1654 and partially address #1552, this PR adds the necessary ops to the torchcompile_cat list so that it can capture HF's CausalLMLoss.

Before merging, this PR needs testing with other models; I will post a message with the benchmark results.

@IvanYashchuk
Collaborator

This executor is meant for fusions with a cat operation. Cross entropy is usually quite far from RoPE (which uses cat). How does adding cross_entropy to torchcompile_cat help?

@riccardofelluga
Collaborator Author

@IvanYashchuk Adding cross entropy indeed does not help with RoPE, but it allows Thunder to use a very efficient fused cross-entropy Triton kernel whose performance currently cannot be matched by the other executors (not even APEX). From lines 211 and 212, it seems that while the set of ops was originally chosen for RoPE, the intent was that other ops could be added in the future, making this a fusion executor that comes to the rescue of nvFuser when it can improve performance. Do you think it would be better to create a separate executor entry for Inductor cross entropy only?
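As a minimal sketch of what the fusion buys here: eager PyTorch evaluates cross entropy as separate steps (log-softmax, then a gather/reduce for the NLL), each reading and writing the full logits tensor, while torch.compile/Inductor can emit a single fused Triton kernel for the whole loss on GPU. The shapes below are illustrative stand-ins for an HF causal-LM loss, not taken from the PR.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: logits over a vocabulary, integer targets.
logits = torch.randn(8, 32)           # (batch, vocab_size)
targets = torch.randint(0, 32, (8,))  # class index per sample

# The two eager-mode steps that a fused kernel combines into one pass:
log_probs = F.log_softmax(logits, dim=-1)   # max + exp + sum + log
manual_loss = F.nll_loss(log_probs, targets)  # gather target log-probs, mean

# Reference: the single cross_entropy call computing the same value.
reference = F.cross_entropy(logits, targets)
assert torch.allclose(manual_loss, reference)

# On CUDA, compiling the composite expression lets Inductor generate
# one fused Triton kernel instead of materializing log_probs:
compiled_loss = torch.compile(
    lambda x, t: F.cross_entropy(x, t)
)
```

The memory saving referenced in #1654 comes from not materializing the intermediate `(batch, vocab_size)` log-probability tensor, which dominates memory when the vocabulary is large.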

Development

Successfully merging this pull request may close these issues.

nvFuser using more memory than inductor for HF CausalLMLoss