Warn if wrong type is given for Llama export for XNNPACK #8195
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8195
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: There is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit 6d60669 with merge base a3455d9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 8db749c to 98692bf
Force-pushed from 98692bf to 6d60669
Wait a sec, how does Llama 3.2 1B/3B bf16, unquantized, work for XNNPACK?
@mergennachin I don't think that should work...
@mergennachin Unquantized bf16 Llama 3.2 1B/3B does not work with XNNPACK, but it does work without XNNPACK. It looks like somebody added -X to https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#option-a-download-and-export-llama32-1b3b-model in error.
Actually, I just tried reproducing it, and it's the other way around. For the unquantized bf16 1B/3B model:

With the -X and -d bf16 flags during export, it successfully generates the .pte file and the llama runner is able to execute it. There's even an internal CI test: https://fburl.com/code/4nvjis6p

Without -X and with the -d bf16 flag during export, it also successfully generates the .pte file, but the llama runner errors out.
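(For reference, a minimal reproduction sketch of the two export configurations described above. This is not a verbatim copy of the commands used; it assumes the export_llama flags documented in the Llama README in addition to the -d and -X flags from this thread, and the checkpoint/params paths are placeholders. Other flags from the README are omitted for brevity.)

```python
# Hypothetical reproduction sketch (not the exact commands used in this thread).
# Assumes the export_llama flags documented in examples/models/llama/README.md:
# --checkpoint / --params point at placeholder Llama 3.2 1B/3B files, -kv enables
# the KV cache, -d sets the dtype, and -X requests XNNPACK delegation.
import subprocess

base_cmd = [
    "python", "-m", "examples.models.llama.export_llama",
    "--checkpoint", "consolidated.00.pth",  # placeholder checkpoint path
    "--params", "params.json",              # placeholder params path
    "-kv",
    "-d", "bf16",
]

# Case 1: with -X. Per the comment above, export produces a .pte and the llama runner executes it.
subprocess.run(base_cmd + ["-X"], check=True)

# Case 2: without -X. Export also produces a .pte, but the reporter saw the llama runner error out.
subprocess.run(base_cmd, check=True)
```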
Hmm, this is strange. XNNPACK did not support bfloat16 last I checked, and I elected to support bfloat16 with portable/optimized ops instead of adding that support to XNNPACK. Something is wrong.
I would guess that the -X flag just doesn't do anything, but then why does the exported model work with it but not without? Needs investigation.
I tried as well; it works both with and without the -X flag for me on the executorch main branch, commit hash 9020fd2.
@swolchok Yeah, this makes sense. For unquantized bf16, applying -X does nothing because our partitioner recognizes none of the ops as lowerable. There would only be a problem with quantized bf16: again XNNPACK would be unable to lower anything, and the resulting quantization ops would throw an error at the to_executorch stage.
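(The check this PR's title describes could look roughly like the sketch below. This is a hypothetical illustration based on the explanation above, not the actual diff; the argument names xnnpack_enabled, dtype, and quantization_mode are invented for the example.)

```python
# Hypothetical sketch of a dtype/delegation sanity check for the Llama export flow.
# Not the actual PR change; argument names are invented for illustration.
import logging
from typing import Optional


def warn_on_unsupported_xnnpack_dtype(
    xnnpack_enabled: bool, dtype: str, quantization_mode: Optional[str]
) -> None:
    """Warn when -X is combined with a dtype that XNNPACK cannot lower."""
    if not xnnpack_enabled:
        return
    if dtype == "bf16" and quantization_mode is None:
        # Per the discussion above: the XNNPACK partitioner lowers none of the ops
        # for an unquantized bf16 model, so -X silently has no effect.
        logging.warning(
            "XNNPACK delegation (-X) was requested for an unquantized bf16 model; "
            "the XNNPACK partitioner will not lower any ops, so -X has no effect."
        )
    elif dtype == "bf16" and quantization_mode is not None:
        # Per the discussion above: quantized bf16 cannot be lowered either, and the
        # leftover quantization ops fail later, at the to_executorch stage.
        logging.warning(
            "XNNPACK delegation (-X) with a quantized bf16 model is expected to fail "
            "at the to_executorch stage because the quantization ops cannot be lowered."
        )
```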
Followup to #7775
Test Plan: