Loading checkpoint of saved intervened model takes long time #45

Open
jeffreyzhanghc opened this issue Nov 22, 2024 · 5 comments

@jeffreyzhanghc

Hi, thanks for the great work. I am trying to replicate the results on a different dataset, but when I intervene on Llama 3.1 8B it gives the following warning (screenshot below), and the checkpoint loading time is about 20 minutes. Is that long a loading time normal?

[screenshot: warning shown during checkpoint loading]

@jujipotle
Contributor

@jeffreyzhanghc Hi, I worked on updating this codebase and can help you. May I ask what GPU setup you are using to intervene on llama3.1 8b? I used 1 H100 and loaded the model in 1 minute, so perhaps it's a GPU size issue.

@jeffreyzhanghc
Author

Hi, thanks for helping. Yes, I am using 2 A100 80GB GPUs, and loading can take up to 30 minutes or even more. Does having a long prompt also affect the intervention time?

@jujipotle
Contributor

Hmm, that's odd. Are you using both devices? E.g., when running `CUDA_VISIBLE_DEVICES=0 python validate_2fold.py --model_name llama_3p1_8B --num_heads 48 --alpha 15 --device 0 --num_fold 2 --use_center_of_mass --instruction_prompt default --judge_name <your GPT-judge name> --info_name <your GPT-info name>`, do you omit the `CUDA_VISIBLE_DEVICES=0`?
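
A quick way to confirm how many devices the process actually sees (a minimal sketch using standard PyTorch calls, not part of this repo):

```python
import torch

# With CUDA_VISIBLE_DEVICES=0 this should print 1; if it prints 2,
# the model may be getting sharded across both A100s, which changes
# how the checkpoint is loaded.
print(torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```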

@jeffreyzhanghc
Author

Yes, I omit the `CUDA_VISIBLE_DEVICES=0`.

@jujipotle
Contributor

Hi Jeffrey,
Sorry about the late reply; I am in the midst of finals at my school right now. Thanks for your patience.

I just wanted to confirm the workflow you are doing:

  1. You run `python edit_weight.py --model_name llama3p1_8B_instruct --num_heads 48 --alpha 15` without the issue of a long loading time. The edited model is saved to `.../honest_llama/validation/results_dump/edited_models_dump/llama3p1_8B_instruct_seed_42_top_48_heads_alpha_15`.
  2. You run `python validate_2fold.py --model_name path_to_edited_model --num_heads 1 --alpha 0 ...`, and now you see the long loading time issue (see the timing sketch after this list).
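
To isolate where the time goes, here is a minimal timing sketch. The path is the one from step 1; the `torch_dtype` and `device_map` choices are my assumptions (they need `accelerate` installed), not necessarily what validate_2fold.py does:

```python
import time
import torch
from transformers import AutoModelForCausalLM

path = "results_dump/edited_models_dump/llama3p1_8B_instruct_seed_42_top_48_heads_alpha_15"

start = time.time()
# If this call alone takes 20-30 min, the bottleneck is reading the
# checkpoint itself (disk speed, sharding across GPUs), not anything
# ITI-specific in validate_2fold.py.
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.float16,
    device_map="auto",
)
print(f"Loaded in {time.time() - start:.1f}s")
```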

To clarify: the warning you see, `Some weights of LlamaForCausalLM were not initialized from the model checkpoint at results_dump/edited_models_dump/llama3p1_8B_instruct_seed_42_top_48_heads_alpha_15 and are newly initialized:`, is expected and not an issue. It's because the Llama models do not have attention biases, so the biases that ITI introduces aren't expected by the stock architecture (but it should still work).
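
If you want to verify this yourself, here is a small inspection sketch. It assumes the edited model was saved as safetensors shards (adjust the glob for `.bin` files) and that the bias tensors follow the usual Llama naming; both are assumptions on my end:

```python
import glob
from safetensors.torch import load_file

path = "results_dump/edited_models_dump/llama3p1_8B_instruct_seed_42_top_48_heads_alpha_15"

# Print any attention bias tensors found in the saved shards. Stock
# Llama attention projections have no biases, so anything listed here
# was added by the ITI editing step.
for shard in sorted(glob.glob(f"{path}/*.safetensors")):
    state = load_file(shard)
    for name, tensor in state.items():
        if name.endswith(".bias") and "attn" in name:
            print(name, tuple(tensor.shape))
```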
The long loading time is unexpected, however. I was able to load the huggingface llama3.1_8b_instruct model with 2 A100s in ~20 seconds, and also load my edited model in about the same time.

Please let me know if my understanding of your issue is correct, and I'll see how else I can help.
