From ea9748f0c35235c62a5f8fe12b525b1f641ae277 Mon Sep 17 00:00:00 2001
From: Oleg S <97077423+RobotSail@users.noreply.github.com>
Date: Tue, 12 Nov 2024 14:50:35 +0000
Subject: [PATCH] docs: include docs on installing deepspeed w/ cpuadam

Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
---
 README.md | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 27219af9..72ef56fa 100644
--- a/README.md
+++ b/README.md
@@ -121,7 +121,34 @@ allow you to customize aspects of the ZeRO stage 2 optimizer.
 
 For more information about DeepSpeed, see [deepspeed.ai](https://www.deepspeed.ai/)
 
-#### `FSDPOptions`
+#### DeepSpeed with CPU Offloading
+
+When you use DeepSpeed with CPU offloading, you'll usually encounter an error indicating that the optimizer needed to run Adam on CPU (CPUAdam) doesn't exist. To resolve this, follow these steps:
+
+**Rebuild DeepSpeed with CPUAdam**:
+
+You'll need to rebuild DeepSpeed so that the optimizer is present:
+
+```bash
+# uninstall deepspeed & reinstall with the flags for installing CPUAdam
+pip uninstall deepspeed
+DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install deepspeed --no-deps
+```
+
+**Ensure `-lcurand` is linked correctly**:
+
+A problem that we commonly encounter is that the linker fails to resolve `-lcurand` when
+DeepSpeed recompiles. To resolve this, you will need to find the location of the `libcurand.so` file on your machine and ensure it's present in `/usr/lib64`:
+
+```bash
+sudo ln -s /usr/local/cuda/lib64/libcurand.so.10 /usr/lib64/libcurand.so
+```
+
+> [!NOTE]
+> The libcurand file may be located elsewhere on your machine. To find it, you can use the following command:
+> `find / -name 'libcurand*.so*' 2>/dev/null`
+
+### `FSDPOptions`
 
 Like DeepSpeed, we only expose a number of parameters for you to modify with FSDP.
 They are listed below: