
Add flag to turn on activation checkpointing on single GPU #835

Open
yaoshiang opened this issue Jan 10, 2025 · 1 comment
🚀 The feature, motivation and pitch

This feature would allow developers to fine-tune on smaller GPUs and/or with larger batch sizes, likely leading to higher MFU.

Currently, activation checkpointing only works with FSDP, not on a single GPU.

Perhaps this is not a worthwhile feature, since realistically no one is going to fine-tune on a single GPU anyway, and as a workaround you can simply enable FSDP on a single GPU to get activation checkpointing. I verified that this workaround works: nvidia-smi now reports python using about 45 GB of RAM instead of 77 GB without FSDP, and my time per batch increased from 1.09 s to 1.65 s.

After bumping the batch size from 11 to 19 to take advantage of the freed memory, my TPS rose from 20,859 to 23,583 and MFU from 50% to 56%.

TOKENIZERS_PARALLELISM=true \
torchrun --nnodes 1 --nproc_per_node 1 \
    finetuning_wrapper.py \
    --model_name meta-llama/Llama-3.2-1B-Instruct \
    --use_peft \
    --peft_method lora \
    --dataset "custom_dataset" \
    --custom_dataset.file "./memorization_dataset.py" \
    --output_dir ./output \
    --num_epochs 2 \
    --batch_size_training 11 \
    --context_length 2048 \
    --lr 1e-3 \
    --enable_fsdp
"""This is a minimal wrapper so that torchrun has a physical py file to access."""
import fire
import llama_recipes.finetuning


if __name__ == "__main__":
    fire.Fire(llama_recipes.finetuning.main)
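
For reference, here is a minimal sketch of what single-GPU activation checkpointing could look like without FSDP, using PyTorch's stock `torch.utils.checkpoint` utilities. The model below is a hypothetical stand-in for a stack of transformer blocks, not the actual llama-recipes model; the point is only that activations are recomputed during backward instead of being stored, trading compute for memory.

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Hypothetical stand-in for a stack of transformer blocks.
blocks = torch.nn.Sequential(
    *[torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
      for _ in range(4)]
)

x = torch.randn(8, 64, requires_grad=True)

# Split the stack into 2 segments; only segment-boundary activations are
# kept, and intermediate ones are recomputed on the backward pass.
out = checkpoint_sequential(blocks, 2, x, use_reentrant=False)
out.sum().backward()
```

A flag like the one requested here would presumably just apply wrapping of this kind to the model's decoder layers when FSDP is disabled, rather than relying on FSDP's own `apply_activation_checkpointing` path.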



Mattral commented Jan 12, 2025

Interesting feature, looking forward to it.
