Loading checkpoint of saved intervened model takes long time #45

Open
jeffreyzhanghc opened this issue Nov 22, 2024 · 5 comments

@jeffreyzhanghc

Hi, thanks for the great work. I am trying to replicate the results on a different dataset, but when I intervene on Llama 3.1 8B it gives the following warning (screenshot below), and the checkpoint loading time is about 20 minutes. Is that long a loading time normal?

[screenshot: warning shown during checkpoint loading]

@jujipotle
Contributor

@jeffreyzhanghc Hi, I worked on updating this codebase and can help you. May I ask what GPU setup you are using to intervene on llama3.1 8b? I used 1 H100 and loaded the model in 1 minute, so perhaps it's a GPU size issue.

@jeffreyzhanghc
Author

Hi, thanks for helping. Yes, I am using 2 A100 80GB GPUs, and loading can take up to 30 minutes or even more. Does having a long prompt also affect the intervention time?

@jujipotle
Contributor

Hmm, that's odd. Are you using both devices? E.g., when running `CUDA_VISIBLE_DEVICES=0 python validate_2fold.py --model_name llama_3p1_8B --num_heads 48 --alpha 15 --device 0 --num_fold 2 --use_center_of_mass --instruction_prompt default --judge_name <your GPT-judge name> --info_name <your GPT-info name>`, do you omit the `CUDA_VISIBLE_DEVICES=0`?
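
A quick way to confirm how many devices the process actually sees (a minimal sketch using standard PyTorch calls, not part of this repo):

```python
import torch

# With CUDA_VISIBLE_DEVICES=0 this should print 1; if it prints 2,
# the model may be getting sharded across both A100s, which changes
# how the checkpoint is loaded.
print(torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```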

@jeffreyzhanghc
Author

Yes, I omit the `CUDA_VISIBLE_DEVICES=0`.

@jujipotle
Contributor

Hi Jeffrey,
Sorry about the late reply; I am in the midst of finals at my school right now. Thanks for your patience.

I just wanted to confirm the workflow you are doing:

  1. You run `python edit_weight.py --model_name llama3p1_8B_instruct --num_heads 48 --alpha 15` without the issue of a long loading time. The edited model is saved to `.../honest_llama/validation/results_dump/edited_models_dump/llama3p1_8B_instruct_seed_42_top_48_heads_alpha_15`.
  2. You run `python validate_2fold.py --model_name path_to_edited_model --num_heads 1 --alpha 0 ...`, and now you see the long loading time issue (see the timing sketch after this list).
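
To isolate where the time goes, here is a minimal timing sketch. The path is the one from step 1; the `torch_dtype` and `device_map` choices are my assumptions (they need `accelerate` installed), not necessarily what validate_2fold.py does:

```python
import time
import torch
from transformers import AutoModelForCausalLM

path = "results_dump/edited_models_dump/llama3p1_8B_instruct_seed_42_top_48_heads_alpha_15"

start = time.time()
# If this call alone takes 20-30 min, the bottleneck is reading the
# checkpoint itself (disk speed, sharding across GPUs), not anything
# ITI-specific in validate_2fold.py.
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.float16,
    device_map="auto",
)
print(f"Loaded in {time.time() - start:.1f}s")
```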

To clarify: the warning you see, `Some weights of LlamaForCausalLM were not initialized from the model checkpoint at results_dump/edited_models_dump/llama3p1_8B_instruct_seed_42_top_48_heads_alpha_15 and are newly initialized:`, is expected and not an issue. It's because the Llama models do not have attention biases, so the biases that ITI introduces aren't expected by the stock architecture (but it should still work).
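
If you want to verify this yourself, here is a small inspection sketch. It assumes the edited model was saved as safetensors shards (adjust the glob for `.bin` files) and that the bias tensors follow the usual Llama naming; both are assumptions on my end:

```python
import glob
from safetensors.torch import load_file

path = "results_dump/edited_models_dump/llama3p1_8B_instruct_seed_42_top_48_heads_alpha_15"

# Print any attention bias tensors found in the saved shards. Stock
# Llama attention projections have no biases, so anything listed here
# was added by the ITI editing step.
for shard in sorted(glob.glob(f"{path}/*.safetensors")):
    state = load_file(shard)
    for name, tensor in state.items():
        if name.endswith(".bias") and "attn" in name:
            print(name, tuple(tensor.shape))
```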
The long loading time is unexpected, however. I was able to load the huggingface llama3.1_8b_instruct model with 2 A100s in ~20 seconds, and also load my edited model in about the same time.

Please let me know if my understanding of your issue is correct, and I'll see how else I can help.
