Commit

patemotter committed Jan 15, 2025
1 parent a604b93 commit f63fb8a
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions MaxText/benchmarks/decode_paged_attention-7b.sh
@@ -3,17 +3,17 @@ python MaxText/decode.py \
 MaxText/configs/base.yml \
 model_name=llama2-7b \
 tokenizer_path=assets/tokenizer.llama2 \
-per_device_batch_size=8 \
-max_prefill_predict_length=512 \
-max_target_length=1024 \
+per_device_batch_size=1 \
+max_prefill_predict_length=1024 \
+max_target_length=2048 \
 weight_dtype=bfloat16 \
 ici_fsdp_parallelism=1 \
 ici_tensor_parallelism=-1 \
 scan_layers=false \
-load_parameters_path=gs://patemotter/checkpoints/quant_llama2-7b-chat/20240906200810/intmp_mp_scale \
+load_parameters_path=gs://msingh-bkt/checkpoints/quant_llama2-7b-chat/20241120034012/int8_ \
 quantization=int8 \
 checkpoint_is_quantized=true \
-attention=dot_product \
+attention=paged \
 num_pages=64 \
 page_size=32 \
 block_size=256
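A quick sanity check on the new paged-attention settings: if the paged KV cache holds num_pages * page_size tokens per sequence (the usual paged-attention layout; this interpretation of MaxText's parameters is an assumption, not confirmed by the diff), then the values above give exactly enough room for the new max_target_length of 2048 at per_device_batch_size=1. A minimal sketch of that arithmetic:

```shell
#!/bin/sh
# Sketch only: assumes KV-cache capacity per sequence is num_pages * page_size
# tokens, which is the common paged-attention convention.
num_pages=64
page_size=32
max_target_length=2048

kv_capacity=$((num_pages * page_size))   # 64 * 32 = 2048 tokens
echo "KV cache capacity per sequence: $kv_capacity tokens"

# Under this assumption, capacity must cover the full decode length.
if [ "$kv_capacity" -lt "$max_target_length" ]; then
    echo "warning: paged KV cache smaller than max_target_length" >&2
fi
```

This also suggests why per_device_batch_size drops from 8 to 1 in the same commit: the page pool sized here covers a single sequence of the (doubled) target length, though the diff itself does not state the motivation.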
