Commit
docs: include docs on installing deepspeed w/ cpuadam
Signed-off-by: Oleg S <[email protected]>
Showing 1 changed file with 28 additions and 1 deletion.
@@ -121,7 +121,34 @@ allow you to customize aspects of the ZeRO stage 2 optimizer.

For more information about DeepSpeed, see [deepspeed.ai](https://www.deepspeed.ai/)
#### `FSDPOptions` (heading removed by this commit)
#### DeepSpeed with CPU Offloading

When you use DeepSpeed with CPU offloading, you'll usually hit an error indicating that the CPU implementation of the Adam optimizer (CPUAdam) is missing from your DeepSpeed install. To resolve this, follow these steps:

**Rebuild DeepSpeed with CPUAdam**:

You'll need to rebuild DeepSpeed so that the optimizer is included:
```bash
# Uninstall DeepSpeed, then reinstall it with the flags that build CPUAdam
pip uninstall deepspeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install deepspeed --no-deps
```
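After reinstalling, it can help to confirm that the CPUAdam op was actually pre-built. The sketch below is one way to check, assuming DeepSpeed's `ds_report` command is on your `PATH` and that its op table marks pre-built ops with `[YES]` in a `cpu_adam` row. The `cpu_adam_status` function name is made up for illustration, and the report format may differ between DeepSpeed versions:

```bash
#!/usr/bin/env bash
# Report whether the pre-built CPUAdam op appears to be installed.
# Assumption: `ds_report` prints a `cpu_adam` row and marks pre-built ops
# with [YES]; adjust the pattern if your DeepSpeed version prints something else.
cpu_adam_status() {
    if ! command -v ds_report >/dev/null 2>&1; then
        echo "ds_report-not-found"
        return 0
    fi

    # Keep only the first cpu_adam row of the op compatibility table.
    local row
    row="$(ds_report 2>/dev/null | grep -m1 'cpu_adam' || true)"

    case "$row" in
        *"[YES]"*) echo "prebuilt" ;;
        *)         echo "not-prebuilt" ;;
    esac
}

cpu_adam_status
```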
**Ensure `-lcurand` is linked correctly**:

A problem that we commonly encounter is that the linker cannot find the library for `-lcurand` when
DeepSpeed recompiles. To resolve this, find the location of the `libcurand.so` file on your machine and ensure it's present in `/usr/lib64`:
```bash
sudo ln -s /usr/local/cuda/lib64/libcurand.so.10 /usr/lib64/libcurand.so
```
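The symlink step above can also be wrapped in a small guard so that it is safe to re-run. This is only a sketch: `link_libcurand` is a hypothetical helper (not part of DeepSpeed or CUDA), and the source path and target directory are passed in as arguments because the library's location varies between machines:

```bash
#!/usr/bin/env bash
# Hypothetical helper: link an existing libcurand into a directory the linker searches.
# Usage: link_libcurand <path-to-libcurand.so.N> <target-dir>
link_libcurand() {
    local src="$1" dest_dir="$2"
    local dest="${dest_dir}/libcurand.so"

    if [ ! -e "$src" ]; then
        echo "source not found: $src" >&2
        return 1
    fi

    # Leave an existing file or symlink (even a dangling one) untouched.
    if [ -e "$dest" ] || [ -L "$dest" ]; then
        echo "already present: $dest"
        return 0
    fi

    ln -s "$src" "$dest" && echo "linked: $dest -> $src"
}

# Example, run as root because /usr/lib64 is normally not user-writable:
#   link_libcurand /usr/local/cuda/lib64/libcurand.so.10 /usr/lib64
```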
> [!NOTE]
> The libcurand file may be located elsewhere on your machine. To find it, you can use the following command:
> `find / -name 'libcurand*.so*' 2>/dev/null`

Check failure on line 149 in README.md (GitHub Actions / markdown-lint): Trailing spaces
### `FSDPOptions`

Like DeepSpeed, we expose only a limited set of parameters for you to modify with FSDP.
They are listed below: