VLM Support via GPTQ Hooks and Data Pipelines #914

Merged: 345 commits, Jan 8, 2025

Commits (345)
fc2488f
update hooks in tests
kylesayrs Nov 15, 2024
d0dc807
integrate with wanda
kylesayrs Nov 15, 2024
55f69d6
integrate with magnitude and constant
kylesayrs Nov 15, 2024
59ffe44
integrate with SparseGPTModifier
kylesayrs Nov 15, 2024
21fe61b
add hooksmixin to modifier
kylesayrs Nov 15, 2024
ba01137
Merge remote-tracking branch 'origin' into kylesayrs/HooksMixin
kylesayrs Nov 15, 2024
3771a89
Merge remote-tracking branch 'origin' into kylesayrs/HooksMixin
kylesayrs Nov 18, 2024
ccc5458
Merge branch 'kylesayrs/HooksMixin' into kylesayrs/gptq-partition
kylesayrs Nov 19, 2024
a5635a1
merge
kylesayrs Nov 19, 2024
83ed409
small updates
kylesayrs Nov 19, 2024
7fd142b
Merge branch 'main' into kylesayrs/HooksMixin
kylesayrs Nov 19, 2024
d104282
WIP
kylesayrs Nov 20, 2024
236a47a
WIP
kylesayrs Nov 21, 2024
188896e
able to run without hooks
kylesayrs Nov 21, 2024
8ef9c23
issue with different sizes
kylesayrs Nov 21, 2024
1362ca2
able to run through pixtral without issue and using real proxy tensor…
kylesayrs Nov 21, 2024
0539df7
nits
kylesayrs Nov 25, 2024
a734393
Merge remote-tracking branch 'origin' into kylesayrs/HooksMixin
kylesayrs Nov 25, 2024
ea10aed
Merge branch 'kylesayrs/HooksMixin' into kylesayrs/gptq-partition
kylesayrs Nov 25, 2024
ed96ee4
fix all variable
kylesayrs Nov 25, 2024
5f26711
tmp
kylesayrs Nov 25, 2024
ebc2c41
wip
kylesayrs Nov 26, 2024
922b407
wip
kylesayrs Nov 26, 2024
0577f36
testing with lots of models
kylesayrs Nov 26, 2024
3830696
preliminary data pipeline
kylesayrs Nov 26, 2024
1ecaa39
WIP
kylesayrs Nov 26, 2024
9aa9679
delete unnecessary files
kylesayrs Nov 26, 2024
7e6fe17
Merge remote-tracking branch 'origin' into kylesayrs/gptq-partition
kylesayrs Nov 26, 2024
034c0b1
Merge branch 'kylesayrs/gptq-hooks' into kylesayrs/gptq-partition
kylesayrs Nov 26, 2024
a62617c
clean up CustomDataset
kylesayrs Nov 28, 2024
57b5e02
chchchchanges
kylesayrs Nov 29, 2024
fa317fd
wip: use rename to processor, going through tests
kylesayrs Dec 2, 2024
f3f5875
remove labels from calibration dataset rather than assuming that all …
kylesayrs Dec 2, 2024
58c3afe
cleanup
kylesayrs Dec 2, 2024
72aecfc
cleanup, etc
kylesayrs Dec 2, 2024
77217fb
Merge remote-tracking branch 'origin' into kylesayrs/cleanup-custom-d…
kylesayrs Dec 2, 2024
4461a3e
fix typehinting
kylesayrs Dec 2, 2024
fb33001
add typechecking imports
kylesayrs Dec 2, 2024
bf4744a
remove sparseml utilities
kylesayrs Dec 3, 2024
62ae31d
Merge branch 'kylesayrs/remove-sparseml-utilities' into kylesayrs/cle…
kylesayrs Dec 3, 2024
7e516c1
use in model_load
kylesayrs Dec 3, 2024
d69106e
Merge branch 'main' into kylesayrs/calculate_offload_default_gpus
kylesayrs Dec 3, 2024
9e33641
remove use of RECIPE FILE NAME
kylesayrs Dec 3, 2024
58c0fba
rename to RECIPE_FILE_NAME, avoid circular import
kylesayrs Dec 3, 2024
b28aaae
Merge branch 'kylesayrs/remove-sparseml-utilities' into kylesayrs/cle…
kylesayrs Dec 3, 2024
8d13013
image dataset collation
kylesayrs Dec 3, 2024
17cf9f3
Merge branch 'kylesayrs/cleanup-custom-dataset' into kylesayrs/gptq-p…
kylesayrs Dec 3, 2024
163ee8f
cleanup, do not handle case where processor is None
kylesayrs Dec 3, 2024
1180b34
remove qa ignore
kylesayrs Dec 3, 2024
ad20ae7
Merge branch 'kylesayrs/remove-sparseml-utilities' into kylesayrs/cle…
kylesayrs Dec 3, 2024
c431958
add documentation
kylesayrs Dec 3, 2024
b48d55d
add data collator arg
kylesayrs Dec 3, 2024
2d201e0
Merge branch 'kylesayrs/cleanup-custom-dataset' into kylesayrs/gptq-p…
kylesayrs Dec 3, 2024
0ed5c2c
use default factor
kylesayrs Dec 3, 2024
ca61e90
Merge branch 'kylesayrs/cleanup-custom-dataset' into kylesayrs/gptq-p…
kylesayrs Dec 3, 2024
41dd463
wip mllama
kylesayrs Dec 4, 2024
8527e0e
cleanup
kylesayrs Dec 4, 2024
0a8a03f
merge-implement hessian offloading
kylesayrs Dec 4, 2024
fc044e2
better concrete arg handling
kylesayrs Dec 4, 2024
4576712
validate flickr
kylesayrs Dec 4, 2024
5276c58
discover bug, tests and multimodal working
kylesayrs Dec 4, 2024
dffcbc3
dataset split fallbacks
kylesayrs Dec 4, 2024
b3cb229
Merge branch 'kylesayrs/cleanup-custom-dataset' into kylesayrs/gptq-p…
kylesayrs Dec 4, 2024
779c9a2
Merge branch 'kylesayrs/dataset-split-fallbacks' into kylesayrs/clean…
kylesayrs Dec 4, 2024
85e3f59
Merge branch 'kylesayrs/cleanup-custom-dataset' into kylesayrs/gptq-p…
kylesayrs Dec 4, 2024
e9f150d
move typing
kylesayrs Dec 4, 2024
d061567
cleanup, depreciate remove_columns argument
kylesayrs Dec 4, 2024
55a31ca
silently assign tokenizer to processor
kylesayrs Dec 5, 2024
c14e40e
Merge branch 'kylesayrs/cleanup-custom-dataset' into kylesayrs/gptq-p…
kylesayrs Dec 5, 2024
1aba16d
replace tokenizer with processor
kylesayrs Dec 5, 2024
135e459
Merge branch 'kylesayrs/processor-replaces-tokenizer' into kylesayrs/…
kylesayrs Dec 5, 2024
dde2fa7
Merge branch 'kylesayrs/cleanup-custom-dataset' into kylesayrs/gptq-p…
kylesayrs Dec 5, 2024
89bda30
defer data collator changes
kylesayrs Dec 5, 2024
0fa4102
reduce warnings
kylesayrs Dec 5, 2024
bc505bf
typehinting, add not-implemented error
kylesayrs Dec 5, 2024
c91ba77
remove todos
kylesayrs Dec 5, 2024
e916936
Delete mllama.py
kylesayrs Dec 5, 2024
0a573a1
update dataset manager api in tests
kylesayrs Dec 5, 2024
853c0a8
typehinting, add not-implemented error
kylesayrs Dec 5, 2024
234ef79
remove todos
kylesayrs Dec 5, 2024
8972dd5
update dataset manager api in tests
kylesayrs Dec 5, 2024
acb1a18
Delete examples/multimodal_vision/qwen_vl2.py
kylesayrs Dec 5, 2024
56b5d12
Delete examples/multimodal_vision/mllama.py
kylesayrs Dec 5, 2024
57c293e
WIP: add pixtral
kylesayrs Dec 5, 2024
537c5ab
pixtral working
kylesayrs Dec 5, 2024
15b3508
move to data pipeline
kylesayrs Dec 6, 2024
42b5fc0
disable_hf_hook context
kylesayrs Dec 6, 2024
bc33e8e
woof
kylesayrs Dec 6, 2024
ca72bbb
change desc
kylesayrs Dec 6, 2024
293640a
fix docstring
kylesayrs Dec 6, 2024
17b3a70
rely on compressed tensors, support offloading
kylesayrs Dec 6, 2024
5e185f2
sequential targets
kylesayrs Dec 6, 2024
4d82180
support match_layers_params
kylesayrs Dec 6, 2024
6a1b2c2
make _update_size private and inferred
kylesayrs Dec 6, 2024
f9ab6fc
make a module
kylesayrs Dec 6, 2024
0dc74dd
fallback
kylesayrs Dec 6, 2024
9e07188
implement basic pipeline
kylesayrs Dec 6, 2024
ed099ef
balance between gpus
kylesayrs Dec 6, 2024
4bbbc49
add proper ignore list
kylesayrs Dec 6, 2024
ae74f45
treat offloaded modules as leaves, treat ignore as sequential target
kylesayrs Dec 7, 2024
31eeb8c
redisable piecewise for vision datasets
kylesayrs Dec 7, 2024
1b24090
implement pipeline fallback
kylesayrs Dec 9, 2024
d97ef2b
Merge remote-tracking branch 'origin' into kylesayrs/processor-replac…
kylesayrs Dec 9, 2024
e87e019
remove subbatch event
kylesayrs Dec 9, 2024
d5c08fb
input device inference
kylesayrs Dec 9, 2024
39ed8ca
do not disable hf hook during tracing
kylesayrs Dec 9, 2024
47ca742
Merge remote-tracking branch 'origin' into kylesayrs/gptq-partition
kylesayrs Dec 9, 2024
c1f5cb2
Merge remote-tracking branch 'origin' into kylesayrs/cleanup-custom-d…
kylesayrs Dec 9, 2024
4711e9f
remove import
kylesayrs Dec 9, 2024
e468197
use find_nodes
kylesayrs Dec 9, 2024
f8591ca
rename piecewise to sequential
kylesayrs Dec 9, 2024
cea02d2
add docstring
kylesayrs Dec 9, 2024
f1f6c0f
begin sequential pipeline testing
kylesayrs Dec 9, 2024
3b0b49f
remove todos, add tests for sequential pipeline
kylesayrs Dec 10, 2024
2c035b3
move function placement
kylesayrs Dec 10, 2024
b93868d
slight partition algorithm change
kylesayrs Dec 10, 2024
146e4be
revert llama3 example
kylesayrs Dec 10, 2024
0e4d8f3
Merge branch 'main' into kylesayrs/dataset-split-fallbacks
kylesayrs Dec 10, 2024
b8e867d
Merge branch 'main' into kylesayrs/processor-replaces-tokenizer
kylesayrs Dec 10, 2024
ccb007f
remove test, fix default in order to fix tests
kylesayrs Dec 10, 2024
e1055b0
bump memory requirements
kylesayrs Dec 11, 2024
70421ed
fix memory and offloading issues
kylesayrs Dec 12, 2024
b102bf5
add missing cache file
kylesayrs Dec 12, 2024
229d3ae
make mllama tracable
kylesayrs Dec 12, 2024
4e0b118
write using comprehesion
kylesayrs Dec 12, 2024
7dc4d2a
fix hessian requirements
kylesayrs Dec 12, 2024
377b2a4
implement offloading for tuple
kylesayrs Dec 12, 2024
adb1627
add save
kylesayrs Dec 12, 2024
ab3fc81
change num samples
kylesayrs Dec 12, 2024
1bf683e
implement intermediates offloading for dataclasses
kylesayrs Dec 12, 2024
8918917
Merge branch 'main' into kylesayrs/processor-replaces-tokenizer
kylesayrs Dec 12, 2024
b75fe15
wrap ignore but do not treat as sequential target
kylesayrs Dec 13, 2024
aa4a23d
tracable pixtral/mistral
kylesayrs Dec 13, 2024
aa532b5
remove double saving
kylesayrs Dec 13, 2024
19e4f97
revert dampening frac
kylesayrs Dec 13, 2024
f95b77f
do not cache model outputs to save memory
kylesayrs Dec 13, 2024
2d890db
fix dataclass case, add tests
kylesayrs Dec 13, 2024
7e69b9d
Merge remote-tracking branch 'origin' into kylesayrs/gptq-partition
kylesayrs Dec 13, 2024
4a22032
Remove docstring
kylesayrs Dec 13, 2024
8d72269
Merge branch 'main' into kylesayrs/processor-replaces-tokenizer
kylesayrs Dec 14, 2024
a71352a
move IntermediatesCache location
kylesayrs Dec 14, 2024
2d249a2
add fake_sequential
kylesayrs Dec 14, 2024
995cb2d
rename fake_sequential to layer_sequential
kylesayrs Dec 14, 2024
e4bca34
pipeline inference
kylesayrs Dec 14, 2024
4a046a5
update docstrings
kylesayrs Dec 14, 2024
f24a2af
fix last layer bug
kylesayrs Dec 14, 2024
691bac4
better inference
kylesayrs Dec 14, 2024
1e15d3e
even better inference
kylesayrs Dec 14, 2024
a4744d9
do now throw warning for calibration with training
kylesayrs Dec 16, 2024
9617e53
add information about how to silence warning
kylesayrs Dec 16, 2024
3b4cac1
nice
kylesayrs Dec 16, 2024
f53a3dd
remove unnecessary warning silencing
kylesayrs Dec 16, 2024
f45d0fa
Merge branch 'kylesayrs/processor-replaces-tokenizer', remote-trackin…
kylesayrs Dec 16, 2024
70a2811
Merge branch 'kylesayrs/dataset-split-fallbacks' into kylesayrs/gptq-…
kylesayrs Dec 16, 2024
fd151e4
add unmerged thing
kylesayrs Dec 16, 2024
d1d42de
fix deleted columns
kylesayrs Dec 16, 2024
92151a1
handle dataset dict case
kylesayrs Dec 17, 2024
4c049db
support torch.nn.Conv2d, silently ignore embeddings
kylesayrs Dec 17, 2024
7667998
handle columns better
kylesayrs Dec 17, 2024
f0eb640
fix tokenizer args
kylesayrs Dec 18, 2024
af86f45
filter_tokenizer_args
kylesayrs Dec 18, 2024
5567a90
Merge remote-tracking branch 'origin' into kylesayrs/gptq-partition
kylesayrs Dec 18, 2024
0438e17
Merge remote-tracking branch 'origin' into kylesayrs/cleanup-custom-d…
kylesayrs Dec 18, 2024
9b61145
update docstring
kylesayrs Dec 18, 2024
2f65d01
remove unused util
kylesayrs Dec 18, 2024
338d1cb
remove debug
kylesayrs Dec 18, 2024
f4fa9c3
more tests
kylesayrs Dec 18, 2024
6bd1721
Merge remote-tracking branch 'origin' into kylesayrs/cleanup-custom-d…
kylesayrs Dec 18, 2024
e757e61
remove duplicate file
kylesayrs Dec 18, 2024
bdfa3d4
better help texts
kylesayrs Dec 18, 2024
cd9dd21
Merge branch 'kylesayrs/cleanup-custom-dataset' into kylesayrs/gptq-p…
kylesayrs Dec 18, 2024
f674579
Merge branch 'kylesayrs/calculate_offload_default_gpus' into kylesayr…
kylesayrs Dec 18, 2024
f1e1335
remove future notes, todos
kylesayrs Dec 19, 2024
e59c2e7
remove skipping patching
kylesayrs Dec 19, 2024
4932ec5
remove skipping for none args
kylesayrs Dec 19, 2024
6b7c11f
revert data split fallbacks
kylesayrs Dec 19, 2024
601cb0e
rvert data split fallbacks
kylesayrs Dec 19, 2024
4123636
propagate oom errors, separate data collators
kylesayrs Dec 19, 2024
c1e66e8
apply style, ignore visual on qwen
kylesayrs Dec 19, 2024
dc14e95
remove qwen while unsupported
kylesayrs Dec 19, 2024
47249c5
remove smoothquant while unsupported
kylesayrs Dec 19, 2024
de40a84
clean up examples
kylesayrs Dec 19, 2024
56ca97c
Merge remote-tracking branch 'origin' into kylesayrs/gptq-partition
kylesayrs Dec 19, 2024
7f6e8cd
handle non-fast tokenizers
kylesayrs Dec 20, 2024
1c8afe4
handle non-fast tokenizers
kylesayrs Dec 20, 2024
3a9816c
address nits, add logging
kylesayrs Dec 20, 2024
7be0c88
add back copyrights
kylesayrs Dec 20, 2024
bedbf8c
correctly update helptext
kylesayrs Dec 20, 2024
7c54bed
Merge remote-tracking branch 'origin' into kylesayrs/cleanup-custom-d…
kylesayrs Dec 20, 2024
d27dad3
Merge branch 'main' into kylesayrs/cleanup-custom-dataset
dsikka Dec 20, 2024
42f7892
do not remove prompt key
kylesayrs Dec 20, 2024
4139628
add no copyright to hf files
kylesayrs Dec 20, 2024
15fa27d
remove prompt key
kylesayrs Dec 23, 2024
ae16da3
do not process tokenized datasets, including adding labels
kylesayrs Dec 23, 2024
9a08725
Merge branch 'kylesayrs/cleanup-custom-dataset' into kylesayrs/gptq-p…
kylesayrs Dec 23, 2024
1eb7f83
Merge branch 'main' into kylesayrs/cleanup-custom-dataset
dsikka Dec 23, 2024
c3a663a
rename classes so the saved config is the original class
kylesayrs Dec 23, 2024
0d484bf
Merge branch 'main' into kylesayrs/cleanup-custom-dataset
dsikka Dec 23, 2024
ddb6fc3
Merge remote-tracking branch 'origin/kylesayrs/cleanup-custom-dataset…
kylesayrs Dec 23, 2024
e71f4e5
remove default chat template
kylesayrs Dec 23, 2024
966b96b
Merge branch 'kylesayrs/cleanup-custom-dataset' into kylesayrs/gptq-p…
kylesayrs Dec 23, 2024
0195fab
support llava-1.5 via installing metadata
kylesayrs Dec 23, 2024
148e617
account for models which improperly do not override the abstract methods
kylesayrs Dec 27, 2024
5ae2300
Merge branch 'kylesayrs/patch-mal-models' into kylesayrs/gptq-partition
kylesayrs Dec 27, 2024
e5dd582
add ChatGLMForConditionalGeneration
kylesayrs Dec 27, 2024
5303df2
list of unfixable errors
kylesayrs Dec 27, 2024
aa16223
Merge branch 'main' into kylesayrs/gptq-partition
dsikka Dec 28, 2024
5124e24
Merge remote-tracking branch 'origin' into kylesayrs/gptq-partition
kylesayrs Dec 29, 2024
14cbc97
add glm license, style
kylesayrs Dec 29, 2024
4ac9018
Merge branch 'main' into kylesayrs/gptq-partition
dsikka Jan 1, 2025
ff470b3
add suggestion to use offload_hessians
kylesayrs Jan 2, 2025
c1c3eaa
update names and comments
kylesayrs Jan 2, 2025
e5af728
change tqdm description, add comment
kylesayrs Jan 2, 2025
8fd93a7
add no vllm copyright to glm
kylesayrs Jan 2, 2025
8e5f693
update comments, remove unnecessary default values
kylesayrs Jan 2, 2025
0499bb1
Merge branch 'main' into kylesayrs/gptq-partition
dsikka Jan 2, 2025
7ba6f60
rename examples to have _example suffix
kylesayrs Jan 3, 2025
435cf0d
update all list
kylesayrs Jan 3, 2025
0d25307
update examples to use w4a16
kylesayrs Jan 3, 2025
9abdea8
llava: clarify changes, undo style changes
kylesayrs Jan 3, 2025
3dca7b3
glm comments, fix isort
kylesayrs Jan 3, 2025
f416674
correct typo 'tracable'
kylesayrs Jan 3, 2025
71faee7
mllama: remove unnecessary definitions
kylesayrs Jan 3, 2025
557467b
add keyboard interrupts to list of unfixable errors
kylesayrs Jan 3, 2025
e158b9b
mistral: remove unnecessary definitions
kylesayrs Jan 3, 2025
dfadc11
remove propagate_error argument
kylesayrs Jan 3, 2025
d146771
pipeline docstrings
kylesayrs Jan 4, 2025
bb77a44
add gptq lifecycle docstring
kylesayrs Jan 4, 2025
14f5d88
layer sequential helpers docstrings
kylesayrs Jan 4, 2025
fde309a
update comments
kylesayrs Jan 4, 2025
e6a8fa8
sequential helpers docstrings
kylesayrs Jan 4, 2025
954cd4e
more docstrings
kylesayrs Jan 4, 2025
00309e9
IntermediatesCache docstrings
kylesayrs Jan 4, 2025
57e8f21
free hessians on finalize
kylesayrs Jan 4, 2025
378afb3
remove unnecessary examples
kylesayrs Jan 4, 2025
83b81be
make diff closer to original implementation
kylesayrs Jan 4, 2025
b6c0a50
Merge branch 'main' into kylesayrs/gptq-partition
kylesayrs Jan 4, 2025
5363d40
use original mask padding function
kylesayrs Jan 4, 2025
ae89688
reduce diff
kylesayrs Jan 4, 2025
d3eebfe
replace list comprehesion
kylesayrs Jan 6, 2025
412086c
nit: only pass first layer
kylesayrs Jan 6, 2025
8433304
revert changes to tensors_to_device
kylesayrs Jan 6, 2025
07b3cc3
type hint intermediates cache for clarity
kylesayrs Jan 6, 2025
895b409
make hessian instability a _LinAlgError so it can be caught by gptq f…
kylesayrs Jan 6, 2025
18fe751
Merge remote-tracking branch 'origin' into kylesayrs/gptq-partition
kylesayrs Jan 6, 2025
336e064
defer chatglm for later
kylesayrs Jan 6, 2025
f6312d0
docstrings, reorder pipeline args
kylesayrs Jan 7, 2025
153a4fa
correct typos
kylesayrs Jan 7, 2025
3f9dd7d
code clarity
kylesayrs Jan 7, 2025
84db1e0
Merge branch 'main' into kylesayrs/gptq-partition
dsikka Jan 7, 2025
54 changes: 54 additions & 0 deletions examples/multimodal_vision/llava_example.py
@@ -0,0 +1,54 @@
from transformers import AutoProcessor

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot
from llmcompressor.transformers.tracing import TraceableLlavaForConditionalGeneration
from llmcompressor.transformers.utils.data_collator import llava_data_collator

# Load model.
model_id = "llava-hf/llava-1.5-7b-hf"
model = TraceableLlavaForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Oneshot arguments
DATASET_ID = "flickr30k"
DATASET_SPLIT = {"calibration": "test[:512]"}
NUM_CALIBRATION_SAMPLES = 512
MAX_SEQUENCE_LENGTH = 2048

# Recipe
recipe = [
    GPTQModifier(
        targets="Linear",
        scheme="W4A16",
        ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
        sequential_targets=["LlamaDecoderLayer"],
    ),
]

# Perform oneshot
oneshot(
    model=model,
    tokenizer=model_id,
    dataset=DATASET_ID,
    splits=DATASET_SPLIT,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    trust_remote_code_model=True,
    data_collator=llava_data_collator,
)

# Confirm generations of the quantized model look sane.
print("========== SAMPLE GENERATION ==============")
input_ids = processor(text="Hello my name is", return_tensors="pt").input_ids.to("cuda")
output = model.generate(input_ids, max_new_tokens=20)
print(processor.decode(output[0]))
print("==========================================")

# Save to disk compressed.
SAVE_DIR = model_id.split("/")[1] + "-W4A16-G128"
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
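The recipe above lists `ignore` entries with a `re:` prefix, marking them as regular expressions to be matched against module names so that the vision tower, projector, and LM head are skipped during quantization. As a rough illustration of that matching convention — a hypothetical helper, not the library's actual code — the logic amounts to:

```python
import re


def is_ignored(module_name: str, ignore_patterns: list) -> bool:
    """Sketch: decide whether a module name is excluded by an ignore list.

    Entries prefixed with "re:" are treated as regex patterns anchored at
    the start of the name; anything else is compared as an exact name.
    """
    for pattern in ignore_patterns:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], module_name):
                return True
        elif pattern == module_name:
            return True
    return False


ignore = ["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"]
print(is_ignored("vision_tower.layers.0.mlp", ignore))  # True
print(is_ignored("language_model.layers.0.q_proj", ignore))  # False
```

Whether the real implementation anchors patterns with `re.match` or requires a full match is an assumption here; the point is that `ignore` operates on module names, not module types.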
53 changes: 53 additions & 0 deletions examples/multimodal_vision/mllama_example.py
@@ -0,0 +1,53 @@
from transformers import AutoProcessor

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot
from llmcompressor.transformers.tracing import TraceableMllamaForConditionalGeneration
from llmcompressor.transformers.utils.data_collator import mllama_data_collator

# Load model.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = TraceableMllamaForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Oneshot arguments
DATASET_ID = "flickr30k"
DATASET_SPLIT = {"calibration": "test[:512]"}
NUM_CALIBRATION_SAMPLES = 512
MAX_SEQUENCE_LENGTH = 2048

# Recipe
recipe = [
    GPTQModifier(
        targets="Linear",
        scheme="W4A16",
        ignore=["re:.*lm_head", "re:multi_modal_projector.*", "re:vision_model.*"],
    ),
]

# Perform oneshot
oneshot(
    model=model,
    tokenizer=model_id,
    dataset=DATASET_ID,
    splits=DATASET_SPLIT,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    trust_remote_code_model=True,
    data_collator=mllama_data_collator,
)

# Confirm generations of the quantized model look sane.
print("========== SAMPLE GENERATION ==============")
input_ids = processor(text="Hello my name is", return_tensors="pt").input_ids.to("cuda")
output = model.generate(input_ids, max_new_tokens=20)
print(processor.decode(output[0]))
print("==========================================")

# Save to disk compressed.
SAVE_DIR = model_id.split("/")[1] + "-W4A16-G128"
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
54 changes: 54 additions & 0 deletions examples/multimodal_vision/pixtral_example.py
@@ -0,0 +1,54 @@
from transformers import AutoProcessor

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot
from llmcompressor.transformers.tracing import TraceableLlavaForConditionalGeneration
from llmcompressor.transformers.utils.data_collator import pixtral_data_collator

# Load model.
model_id = "mgoin/pixtral-12b"
model = TraceableLlavaForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Oneshot arguments
DATASET_ID = "flickr30k"
DATASET_SPLIT = {"calibration": "test[:512]"}
NUM_CALIBRATION_SAMPLES = 512
MAX_SEQUENCE_LENGTH = 2048

# Recipe
recipe = [
    GPTQModifier(
        targets="Linear",
        scheme="W4A16",
        ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
        sequential_targets=["MistralDecoderLayer"],
    ),
]

# Perform oneshot
oneshot(
    model=model,
    tokenizer=model_id,
    dataset=DATASET_ID,
    splits=DATASET_SPLIT,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    trust_remote_code_model=True,
    data_collator=pixtral_data_collator,
)

# Confirm generations of the quantized model look sane.
print("========== SAMPLE GENERATION ==============")
input_ids = processor(text="Hello my name is", return_tensors="pt").input_ids.to("cuda")
output = model.generate(input_ids, max_new_tokens=20)
print(processor.decode(output[0]))
print("==========================================")

# Save to disk compressed.
SAVE_DIR = model_id.split("/")[1] + "-W4A16-G128"
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
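Each of the three examples passes a model-specific data collator (`llava_data_collator`, `mllama_data_collator`, `pixtral_data_collator`) into `oneshot`; per the PR's commit history, the collators were separated per model because each processor emits differently shaped image tensors. As a hypothetical minimal sketch of the role such a collator plays — not the library's actual implementation — it mainly lifts a single sample's processor outputs into batched tensors:

```python
import torch


def toy_vision_data_collator(batch: list) -> dict:
    # Sketch under the assumption that calibration feeds one sample
    # per batch, so "collation" is just converting the processor's
    # nested lists into tensors with a leading batch dimension.
    assert len(batch) == 1, "calibration pipeline feeds one sample per batch"
    sample = batch[0]
    return {
        "input_ids": torch.tensor(sample["input_ids"], dtype=torch.long),
        "attention_mask": torch.tensor(sample["attention_mask"]),
        "pixel_values": torch.tensor(sample["pixel_values"]),
    }


batch = [{
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "pixel_values": [[[[0.0, 0.5], [0.5, 1.0]]]],
}]
inputs = toy_vision_data_collator(batch)
```

The real collators may reshape `pixel_values` differently per architecture (e.g. dropping or adding an image-count dimension); the key point is that a custom collator is needed because vision inputs cannot be padded and stacked the way plain token sequences can.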
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -8,12 +8,13 @@ target-version = ['py38']

[tool.isort]
profile = "black"
+skip = ["src/llmcompressor/transformers/tracing/"]

[tool.mypy]
files = "src/guidellm"

[tool.ruff]
-exclude = ["build", "dist", "env", ".venv"]
+exclude = ["build", "dist", "env", ".venv", "src/llmcompressor/transformers/tracing/"]
lint.select = ["E", "F", "W"]

[tool.flake8]