[GPTQ] Vision Model Support #850

Closed. kylesayrs wants to merge 51 commits into main from kylesayrs/gptq-hooks.

Commits (51, all authored by kylesayrs)
98b284b  Oct 16, 2024  WIP
e3a98cc  Oct 16, 2024  WIP: begin quantize_weight
bc9b3bc  Oct 16, 2024  WIP
b77c7bf  Oct 16, 2024  WIP
7be5aed  Oct 16, 2024  wip
e01094f  Oct 16, 2024  compilable
ad9f5a8  Oct 16, 2024  compilable
e4ee0af  Oct 16, 2024  wip
d9ba539  Oct 16, 2024  add example
83a5762  Oct 16, 2024  wip
7f49ab4  Oct 16, 2024  runnable
ac0d926  Oct 21, 2024  batching
6304973  Oct 21, 2024  calibration forward context
868a480  Oct 21, 2024  fix stuff
86c8a06  Oct 21, 2024  wip
1305173  Oct 21, 2024  use hooks list
e6adc5a  Oct 22, 2024  layer compressor
f65f832  Oct 22, 2024  style
1e22569  Oct 22, 2024  use layer compressor
9324695  Oct 22, 2024  replicate dtypes
eef4fb6  Oct 22, 2024  write weight changes
485813a  Oct 22, 2024  revert example
6006155  Oct 22, 2024  organization
c10d2ee  Oct 22, 2024  add create_single_batch_dataloader
6371193  Oct 22, 2024  add back empty_cache until I can justify removing it
92315a5  Oct 22, 2024  better type hinting, faster mask applying
8903fbf  Oct 22, 2024  Merge remote-tracking branch 'origin' into kylesayrs/gptq-hooks
8a25c68  Oct 22, 2024  remove breakpoint
6cd0d6c  Oct 22, 2024  apply style, add true_sequential docstring
0e0c586  Oct 22, 2024  update docstring
d23aabb  Oct 22, 2024  use private attrs
355074b  Oct 23, 2024  more docstring
bf2184d  Oct 23, 2024  docstrings
0b418c7  Oct 23, 2024  docstrings
56cceea  Oct 23, 2024  docstrings
7c7e3bc  Oct 23, 2024  move hooksmixin to separate file
2d52183  Oct 23, 2024  docstrings
d6ff46a  Oct 23, 2024  Merge branch 'main' into kylesayrs/gptq-hooks
9081f12  Oct 23, 2024  fix docstring, better arguments grouping
96e9496  Oct 24, 2024  use LayerCompressorMixin
7fbf8b1  Oct 24, 2024  docstrings
3d3af2a  Oct 24, 2024  add back hessian hook to support bs1
b3021ab  Oct 25, 2024  wip
8508b63  Oct 25, 2024  accumulate
3ff271d  Oct 25, 2024  virtualize batches for layers
d6c6dc3  Oct 25, 2024  maybe works, but padding is wrong
c4d2dde  Oct 29, 2024  revert weird batching, support image text datasets
670b35e  Oct 29, 2024  remove breakpoint
3892b90  Oct 29, 2024  add example script
2beb59a  Nov 5, 2024   remove tokenizer args
4a336fe  Nov 5, 2024   fix shapes
Changes from 16 commits
examples/quantization_w4a16/llama3_example.py (14 changes: 9 additions & 5 deletions)

@@ -1,11 +1,14 @@
+import torch
 from datasets import load_dataset
 from transformers import AutoTokenizer
 
 from llmcompressor.modifiers.quantization import GPTQModifier
 from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
 
 # Select model and load it.
-MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
+#MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
+#MODEL_ID = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
+MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"
 
 model = SparseAutoModelForCausalLM.from_pretrained(
     MODEL_ID,
@@ -20,8 +23,8 @@

 # Select number of samples. 512 samples is a good place to start.
 # Increasing the number of samples can improve accuracy.
-NUM_CALIBRATION_SAMPLES = 512
-MAX_SEQUENCE_LENGTH = 2048
+NUM_CALIBRATION_SAMPLES = 512 // 6
+MAX_SEQUENCE_LENGTH = 2048 // 2
 
 # Load dataset and preprocess.
 ds = load_dataset(DATASET_ID, split=DATASET_SPLIT)
@@ -41,10 +44,11 @@ def preprocess(example):


 # Tokenize inputs.
+tokenizer.add_special_tokens({'pad_token': '[PAD]'})
 def tokenize(sample):
     return tokenizer(
         sample["text"],
-        padding=False,
+        padding=True,
         max_length=MAX_SEQUENCE_LENGTH,
         truncation=True,
         add_special_tokens=False,
@@ -55,7 +59,7 @@ def tokenize(sample):

 # Configure the quantization algorithm to run.
 # * quantize the weights to 4 bit with GPTQ with a group size 128
-recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])
+recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"], percdamp=0.01)
 
 # Apply algorithms.
 oneshot(
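
A note on the tokenizer change above: Llama tokenizers ship without a pad token, so padding=True would fail without first registering one. Adding a new [PAD] special token also grows the vocabulary, which normally requires resizing the model's embeddings to match. A minimal sketch of that pattern, assuming a standard transformers model (the resize_token_embeddings call is illustrative and not part of this diff):

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Llama tokenizers define no pad token by default; register one so
# padding=True can batch calibration samples to a common length.
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    # The new token enlarges the vocabulary, so the embedding matrix
    # must grow with it, or indexing the pad id goes out of range.
    model.resize_token_embeddings(len(tokenizer))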
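
Likewise, for the percdamp=0.01 argument added to the GPTQModifier recipe: in GPTQ, percdamp conventionally sets the relative dampening added to the Hessian diagonal before inversion, which keeps the Cholesky factorization numerically stable. A sketch of that standard step as commonly formulated (an illustration of the usual technique, not code from this branch):

import torch

def dampen_hessian(H: torch.Tensor, percdamp: float = 0.01) -> torch.Tensor:
    # Add percdamp * mean(diag(H)) to every diagonal entry so the
    # Hessian stays positive definite and safe to invert.
    damp = percdamp * torch.mean(torch.diag(H))
    idx = torch.arange(H.shape[0], device=H.device)
    H[idx, idx] += damp
    return H

Raising percdamp generally makes the solve more robust on layers with ill-conditioned Hessians, at some cost in reconstruction accuracy.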