" RuntimeError: User specified autocast device_type must be 'cuda' or 'cpu' " when training. #1567

AndItDontFade · 2023-10-02T22:11:16Z


When I try to train my LoRA model (Start training) after setting all the parameters, it runs for about a minute and then gives me this error:

RuntimeError: User specified autocast device_type must be 'cuda' or 'cpu'

I'm using a 2021 M1 MacBook Pro, with Stable Diffusion v1-5-pruned.safetensors as my model.

I'm using 17 images for the training, and all of the images were found.

This is the traceback:

Traceback (most recent call last):
  File "/Users/I585070/Documents/Kohya_ss/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/I585070/Documents/Kohya_ss/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/Users/I585070/Documents/Kohya_ss/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/Users/I585070/Documents/Kohya_ss/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/I585070/Documents/Kohya_ss/kohya_ss/venv/bin/python', './train_db.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=/Users/I585070/Documents/STABLE_DIFFUSION_WEBUI/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned.safetensors', '--train_data_dir=/Users/I585070/Downloads/Lora_Training/images', '--resolution=768,768', '--output_dir=/Users/I585070/Downloads/Lora_Training/model', '--logging_dir=/Users/I585070/Downloads/Lora_Training/log', '--save_model_as=safetensors', '--output_name=TShirts', '--lr_scheduler_num_cycles=10', '--max_data_loader_n_workers=0', '--learning_rate=1e-05', '--lr_scheduler=cosine', '--lr_warmup_steps=289', '--train_batch_size=1', '--max_train_steps=2890', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.
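
For anyone reading along: the error message above matches the check that `torch.autocast` performs on its `device_type` argument, so one plausible cause is that the training code passes the Mac's `mps` device type to autocast on a PyTorch build that does not accept it. Below is a minimal sketch of a defensive wrapper, assuming that is what happens here. `autocast_or_noop` is a hypothetical helper name, not something from kohya_ss or accelerate, and it falls back to a no-op context when autocast is refused or PyTorch is not installed.

```python
import contextlib

try:
    import torch
except ImportError:  # keep the sketch importable without PyTorch
    torch = None


def autocast_or_noop(device_type: str):
    """Return an autocast context, or a no-op context if autocast is unavailable.

    Hypothetical helper for illustration only.
    """
    if torch is None:
        return contextlib.nullcontext()
    try:
        # Some PyTorch builds raise RuntimeError here for device types
        # they do not support, such as "mps".
        return torch.autocast(device_type=device_type)
    except RuntimeError:
        # Fall back to running without autocast (full precision).
        return contextlib.nullcontext()


# Usage: wrap the forward pass; on unsupported devices this does nothing.
with autocast_or_noop("mps"):
    pass  # the model's forward pass would go here
```

The trade-off is that the fallback silently trains in full precision, which is slower and uses more memory, but it avoids the hard failure.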

AndItDontFade · 2023-10-04T19:13:14Z

Author

Anyone able to help me on this?

1 reply

mariokreutzfeldt Nov 4, 2023

Same issue here. M1 Pro Max, 64 GB.
It might be related to "Allow float dtype when Autocast CPU Disabled"
or to "FP16 on MPS devices".

mariokreutzfeldt · 2023-11-04T15:47:59Z


Hi,

I have:

- removed torch 2.0.0 reference from the requirements_macos_arm64.txt
- installed torch 2.1.0 manually in the venv
- changed onnxruntime-gpu to onnxruntime in the requirements.txt

Launch command:

python ./train_network.py --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="/image" --resolution="512,512" --output_dir="/model" --logging_dir="log" --save_model_as="ckpt" --network_module="networks.lora" --text_encoder_lr="5e-05" --unet_lr="0.0001" --network_dim="8" --output_name="cmdj_LoRA" --lr_scheduler_num_cycles="100" --no_half_vae --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="2000" --train_batch_size="5" --max_train_steps="1000" --save_every_n_epochs=100 --mixed_precision="no" --save_precision="float" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps="64" --bucket_no_upscale --noise_offset="0.0"

Now I get:

```
epoch 1/1

Traceback (most recent call last):
  File "/kohya_ss-master/./train_network.py", line 1009, in <module>
    trainer.train(args)
  File "/kohya_ss-master/./train_network.py", line 822, in train
    optimizer.step()
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/accelerate/optimizer.py", line 145, in step
    self.optimizer.step(closure)
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 269, in step
    self.update_step(group, p, gindex, pindex)
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 517, in update_step
    F.optimizer_update_8bit_blockwise(
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1290, in optimizer_update_8bit_blockwise
    prev_device = pre_call(g.device)
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 427, in pre_call
    prev_device = torch.cuda.current_device()
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 769, in current_device
    _lazy_init()
  File "/kohya_ss-master/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```

This can probably be fixed by enabling the MPS backend of PyTorch.
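
As a rough illustration of what "using the MPS backend" means in code: the traceback shows bitsandbytes calling `torch.cuda.current_device()`, which cannot work on a PyTorch build without CUDA. Below is a minimal sketch of device selection. `pick_device` is a hypothetical helper, and it only reports which backend PyTorch says is available. It does not make bitsandbytes work on a Mac.

```python
try:
    import torch
except ImportError:  # keep the sketch importable without PyTorch
    torch = None


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", depending on what PyTorch reports.

    Hypothetical helper for illustration only.
    """
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch versions, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

On an Apple Silicon Mac with a recent PyTorch, I would expect this to select `mps`, in which case anything that calls into `torch.cuda` directly will still fail.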

IMPORTANT NOTE: If --optimizer_type="", the training starts!

But this is above my capabilities ;)

Cheers

2 replies

mariokreutzfeldt Nov 4, 2023

Quick update: with --optimizer_type="Adafactor" the training starts.
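
The changes that got training to start in this thread (no mixed precision, float save precision, no `--xformers`, and Adafactor instead of the 8-bit optimizer) can be sketched as a small argument filter. This is only an illustration of those substitutions. `adapt_args_for_non_cuda` is a hypothetical helper and not part of kohya_ss, and whether each change is actually required on Apple Silicon is an assumption.

```python
def adapt_args_for_non_cuda(args: list[str]) -> list[str]:
    """Rewrite training arguments that assume a CUDA device.

    Hypothetical helper for illustration only.
    """
    adapted = []
    for arg in args:
        if arg == "--xformers":
            # Dropped: the command that started training in this thread did not use it.
            continue
        if arg.startswith("--optimizer_type=") and "8bit" in arg.lower():
            # 8-bit optimizers go through bitsandbytes, which called into torch.cuda.
            arg = "--optimizer_type=Adafactor"
        elif arg.startswith("--mixed_precision="):
            arg = "--mixed_precision=no"
        elif arg.startswith("--save_precision="):
            arg = "--save_precision=float"
        adapted.append(arg)
    return adapted


original = [
    "--mixed_precision=fp16",
    "--save_precision=fp16",
    "--optimizer_type=AdamW8bit",
    "--xformers",
    "--cache_latents",
]
print(adapt_args_for_non_cuda(original))
```

Arguments that do not match any of the rules, such as `--cache_latents`, pass through unchanged.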

mariokreutzfeldt Nov 4, 2023

Somehow the training loss stays around 0.14, regardless of whether max_train_steps is "100" or "1000".
The LoRA file is created successfully, but the style is not visible, as expected with a loss like this.

minhdtb · 2024-07-10T05:11:20Z


Same here, I'm using an M3 Mac. Does anyone know how to fix this?


noxiouscardiumdimidium · 2024-07-15T03:19:29Z


You're using a device that wants (or must) run autocast in full FP32 precision, like my Tesla M40. Try the following:

- Enable the "memory efficient attention" option.
- Disable the xformers attention option (set "cross attention" to none).
- Make sure save_precision and mixed_precision are FP16 or float: no bf16 and no 8-bit optimizer, except the AdamW8bit optimizer.
- Enable the "full fp16 training (experimental)" option.

Autocast can downcast to 16-bit as long as everything else is set correctly. That should allow AdamW8bit to run on the CPU without throwing that error, and it should keep disabled CPU autocast from raising the error caused by FP32 and AdamW8bit trying to run on the same device. MPS tries to emulate NVIDIA autocast, and this works for older NVIDIA cards that use autocast, so it's worth a try.

