Warn if wrong type is given for Llama export for XNNPACK #8195

Open · mergennachin wants to merge 1 commit into main

Conversation

mergennachin (Contributor) commented Feb 4, 2025

Follow-up to #7775.

Test Plan:

(executorch) mnachin@mnachin-mbp executorch % python -m examples.models.llama.export_llama \
    --checkpoint /Users/mnachin/models/Llama-3.2-1B/original/consolidated.00.pth \
        -p /Users/mnachin/models/Llama-3.2-1B/original/params.json \
        -kv \
        --use_sdpa_with_kv_cache \
        -X \
        -qmode 8da4w \
        --group_size 128 \
        -d bf16 \
        --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
        --embedding-quantize 4,32 \
        --output_name="Llama-8B.pte"
import error: No module named 'triton'
Traceback (most recent call last):
  File "/Users/mnachin/miniconda/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/mnachin/miniconda/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/mnachin/executorch/examples/models/llama/export_llama.py", line 32, in <module>
    main()  # pragma: no cover
  File "/Users/mnachin/executorch/examples/models/llama/export_llama.py", line 28, in main
    export_llama(args)
  File "/Users/mnachin/executorch/examples/models/llama/export_llama_lib.py", line 542, in export_llama
    builder = _export_llama(args)
  File "/Users/mnachin/executorch/examples/models/llama/export_llama_lib.py", line 685, in _export_llama
    _validate_args(args)
  File "/Users/mnachin/executorch/examples/models/llama/export_llama_lib.py", line 679, in _validate_args
    raise ValueError(
ValueError: XNNPACK supports either fp32 or fp16 dtypes only for now. Given bf16.
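
The check that produces this error lives in _validate_args in examples/models/llama/export_llama_lib.py, per the traceback above. Below is a minimal sketch of what such a dtype guard might look like; the attribute names args.xnnpack and args.dtype_override are assumptions and may not match the actual argument parser:

    def _validate_args(args):
        # Hypothetical guard: the XNNPACK delegate currently only handles fp32/fp16,
        # so reject any other dtype override (such as bf16) up front.
        if args.xnnpack and args.dtype_override not in ("fp32", "fp16"):
            raise ValueError(
                "XNNPACK supports either fp32 or fp16 dtypes only for now. "
                f"Given {args.dtype_override}."
            )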

pytorch-bot commented Feb 4, 2025

🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8195
Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV
There is 1 currently active SEV; if your PR is affected, please review it.

✅ No Failures
As of commit 6d60669 with merge base a3455d9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label on Feb 4, 2025. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)

mcr229 (Contributor) commented Feb 5, 2025

@mergennachin I don't think that should work...

mergennachin (Contributor, Author) replied:

> I don't think that should work...

@mcr229 - no, that works; we tested it during the Meta Connect launch. Keep in mind, I am specifically talking about the original unquantized bf16 Llama 3.2 1B/3B.

cc @swolchok

swolchok (Contributor) commented Feb 5, 2025

@mergennachin Unquantized bf16 Llama 3.2 1B/3B does not work with XNNPACK, but it does work without XNNPACK. Looks like somebody put -X in https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#option-a-download-and-export-llama32-1b3b-model in error.

mergennachin (Contributor, Author) replied:

@swolchok Actually, I just tried reproducing it. It's the other way around. For the unquantized bf16 1B/3B model:

With -X and with -d bf16 flags during export, it successfully generates a '.pte' file and the llama runner is able to execute it. Here's even an internal CI test: https://fburl.com/code/4nvjis6p

Without -X and with the -d bf16 flag during export, it also successfully generates a '.pte' file, but the llama runner errors out with

E 00:00:01.963468 executorch:tensor_util.h:482] Check failed (t.scalar_type() == dtype): Expected to find Half type, but tensor has type BFloat16

swolchok (Contributor) commented Feb 5, 2025

> With -X and with -d bf16 flags during export, it successfully generates a '.pte' file and the llama runner is able to execute it. Without -X and with the -d bf16 flag during export, it also successfully generates a '.pte' file, but the llama runner errors out with the BFloat16 check failure above.

Hmm, this is strange. XNNPACK did not support bfloat16 last I checked, and I elected to support bfloat16 with portable/optimized ops instead of adding that support to XNNPACK. Something is wrong.

swolchok (Contributor) commented Feb 5, 2025

I would guess that the -X flag just doesn't do anything, but then why does the exported model work with it but not without it? Needs investigation.

swolchok (Contributor) commented Feb 5, 2025

> Actually, I just tried reproducing it. It's the other way around. For the unquantized bf16 1B/3B model:

I tried as well; it works both with and without the -X flag for me on the executorch main branch, commit hash 9020fd2.

mcr229 (Contributor) commented Feb 6, 2025

@swolchok Yeah, this makes sense: for unquantized bf16, applying -X does nothing, because our partitioner recognizes none of the ops as lowerable. There would only be a problem with quantized bf16: again, XNNPACK would be unable to lower anything, and the resulting quantization ops would throw an error at the to_executorch stage.
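
A quick way to sanity-check this is to count how many subgraphs actually get delegated after partitioning. This is a minimal sketch assuming the usual ExecuTorch lowering flow, not code from this PR: edge_program stands in for a hypothetical EdgeProgramManager produced by to_edge, and the XnnpackPartitioner import path may differ across versions.

    from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

    # Delegate whatever the XNNPACK partitioner claims it can lower.
    lowered = edge_program.to_backend(XnnpackPartitioner())

    # Delegated subgraphs show up as call_delegate nodes in the lowered graph;
    # for an unquantized bf16 model we would expect to find none.
    delegated = [
        node
        for node in lowered.exported_program().graph.nodes
        if node.op == "call_function" and "call_delegate" in str(node.target)
    ]
    print(f"delegated subgraphs: {len(delegated)}")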

Labels: CLA Signed, topic: not user facing