Warn if wrong type is given for Llama export for XNNPACK #8195

Open · mergennachin wants to merge 1 commit into main

Conversation

mergennachin (Contributor) commented Feb 4, 2025

Follow-up to #7775.

Test Plan:

(executorch) mnachin@mnachin-mbp executorch % python -m examples.models.llama.export_llama \
    --checkpoint /Users/mnachin/models/Llama-3.2-1B/original/consolidated.00.pth \
        -p /Users/mnachin/models/Llama-3.2-1B/original/params.json \
        -kv \
        --use_sdpa_with_kv_cache \
        -X \
        -qmode 8da4w \
        --group_size 128 \
        -d bf16 \
        --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
        --embedding-quantize 4,32 \
        --output_name="Llama-8B.pte"
import error: No module named 'triton'
Traceback (most recent call last):
  File "/Users/mnachin/miniconda/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/mnachin/miniconda/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/mnachin/executorch/examples/models/llama/export_llama.py", line 32, in <module>
    main()  # pragma: no cover
  File "/Users/mnachin/executorch/examples/models/llama/export_llama.py", line 28, in main
    export_llama(args)
  File "/Users/mnachin/executorch/examples/models/llama/export_llama_lib.py", line 542, in export_llama
    builder = _export_llama(args)
  File "/Users/mnachin/executorch/examples/models/llama/export_llama_lib.py", line 685, in _export_llama
    _validate_args(args)
  File "/Users/mnachin/executorch/examples/models/llama/export_llama_lib.py", line 679, in _validate_args
    raise ValueError(
ValueError: XNNPACK supports either fp32 or fp16 dtypes only for now. Given bf16.
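
The check that produces this error lives in _validate_args in examples/models/llama/export_llama_lib.py, per the traceback above. Below is a minimal sketch of what such a dtype guard might look like; the attribute names args.xnnpack and args.dtype_override are assumptions and may not match the actual argument parser:

    def _validate_args(args):
        # Hypothetical guard: the XNNPACK delegate currently only handles fp32/fp16,
        # so reject any other dtype override (such as bf16) up front.
        if args.xnnpack and args.dtype_override not in ("fp32", "fp16"):
            raise ValueError(
                "XNNPACK supports either fp32 or fp16 dtypes only for now. "
                f"Given {args.dtype_override}."
            )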

pytorch-bot commented Feb 4, 2025

🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8195
Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV
There is 1 currently active SEV; if your PR is affected, please review it.

✅ No Failures
As of commit 6d60669 with merge base a3455d9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label on Feb 4, 2025. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)

mcr229 (Contributor) commented Feb 5, 2025

@mergennachin I don't think that should work...

mergennachin (Contributor, Author) replied:

> I don't think that should work...

@mcr229 - no, that works; we tested it during the Meta Connect launch. Keep in mind, I am specifically talking about the original unquantized bf16 Llama 3.2 1B/3B.

cc @swolchok

swolchok (Contributor) commented Feb 5, 2025

@mergennachin Unquantized bf16 Llama 3.2 1B/3B does not work with XNNPACK, but it does work without XNNPACK. Looks like somebody put -X in https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#option-a-download-and-export-llama32-1b3b-model in error.

mergennachin (Contributor, Author) replied:

@swolchok Actually, I just tried reproducing it. It's the other way around. For the unquantized bf16 1B/3B model:

With -X and with -d bf16 flags during export, it successfully generates a '.pte' file and the llama runner is able to execute it. Here's even an internal CI test: https://fburl.com/code/4nvjis6p

Without -X and with the -d bf16 flag during export, it also successfully generates a '.pte' file, but the llama runner errors out with

E 00:00:01.963468 executorch:tensor_util.h:482] Check failed (t.scalar_type() == dtype): Expected to find Half type, but tensor has type BFloat16

swolchok (Contributor) commented Feb 5, 2025

> With -X and with -d bf16 flags during export, it successfully generates a '.pte' file and the llama runner is able to execute it. Without -X and with the -d bf16 flag during export, it also successfully generates a '.pte' file, but the llama runner errors out with the BFloat16 check failure above.

Hmm, this is strange. XNNPACK did not support bfloat16 last I checked, and I elected to support bfloat16 with portable/optimized ops instead of adding that support to XNNPACK. Something is wrong.

swolchok (Contributor) commented Feb 5, 2025

I would guess that the -X flag just doesn't do anything, but then why does the exported model work with it but not without it? Needs investigation.

swolchok (Contributor) commented Feb 5, 2025

> Actually, I just tried reproducing it. It's the other way around. For the unquantized bf16 1B/3B model:

I tried as well; it works both with and without the -X flag for me on the executorch main branch, commit hash 9020fd2.

mcr229 (Contributor) commented Feb 6, 2025

@swolchok Yeah, this makes sense: for unquantized bf16, applying -X does nothing, because our partitioner recognizes none of the ops as lowerable. There would only be a problem with quantized bf16: again, XNNPACK would be unable to lower anything, and the resulting quantization ops would throw an error at the to_executorch stage.
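
A quick way to sanity-check this is to count how many subgraphs actually get delegated after partitioning. This is a minimal sketch assuming the usual ExecuTorch lowering flow, not code from this PR: edge_program stands in for a hypothetical EdgeProgramManager produced by to_edge, and the XnnpackPartitioner import path may differ across versions.

    from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

    # Delegate whatever the XNNPACK partitioner claims it can lower.
    lowered = edge_program.to_backend(XnnpackPartitioner())

    # Delegated subgraphs show up as call_delegate nodes in the lowered graph;
    # for an unquantized bf16 model we would expect to find none.
    delegated = [
        node
        for node in lowered.exported_program().graph.nodes
        if node.op == "call_function" and "call_delegate" in str(node.target)
    ]
    print(f"delegated subgraphs: {len(delegated)}")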

Labels: CLA Signed, topic: not user facing