update ubuntu in ci #97

Merged: 14 commits, Feb 21, 2025
6 changes: 3 additions & 3 deletions .github/workflows/build.yml
@@ -12,7 +12,7 @@ concurrency:
jobs:
cmake-build:
name: Build FlexFlow Serve
-runs-on: ubuntu-20.04
+runs-on: ubuntu-22.04
defaults:
run:
shell: bash -l {0} # required to use an activated conda environment
@@ -68,14 +68,14 @@ jobs:
../config/config.linux
make -j $n_build_cores
make install
# sudo make install
# sudo ldconfig

- name: Check availability of flexflow modules in Python
run: |
if [[ "${FF_GPU_BACKEND}" == "cuda" ]]; then
export LD_LIBRARY_PATH="$CUDA_PATH/lib64/stubs:$LD_LIBRARY_PATH"
sudo ln -s $CUDA_PATH/lib64/stubs/libcuda.so $CUDA_PATH/lib64/stubs/libcuda.so.1
else
sudo ln -sf /usr/lib/x86_64-linux-gnu/libstdc++.so.6 $CONDA_PREFIX/lib/libstdc++.so.6
fi
# Remove build folder to check that the installed version can run independently of the build files
rm -rf build
2 changes: 1 addition & 1 deletion .github/workflows/clang-format-check.yml
@@ -3,7 +3,7 @@ on: [push, pull_request, workflow_dispatch]
jobs:
formatting-check:
name: Formatting Check
-runs-on: ubuntu-latest
+runs-on: ubuntu-22.04
strategy:
matrix:
path:
6 changes: 3 additions & 3 deletions .github/workflows/docker-build.yml
@@ -14,7 +14,7 @@ concurrency:
jobs:
docker-build-rocm:
name: Build and Install FlexFlow in a Docker Container (ROCm backend)
-runs-on: ubuntu-20.04
+runs-on: ubuntu-22.04
if: ${{ ( github.event_name != 'push' && github.event_name != 'schedule' && github.event_name != 'workflow_dispatch' ) || github.ref_name != 'inference' }}
env:
FF_GPU_BACKEND: "hip_rocm"
@@ -69,7 +69,7 @@ jobs:

docker-build-cuda:
name: Build and Install FlexFlow in a Docker Container (CUDA backend)
-runs-on: ubuntu-20.04
+runs-on: ubuntu-22.04
strategy:
matrix:
cuda_version: ["11.8", "12.0", "12.1", "12.2"]
@@ -119,7 +119,7 @@ jobs:

notify-slack:
name: Notify Slack in case of failure
-runs-on: ubuntu-20.04
+runs-on: ubuntu-22.04
needs: [docker-build-cuda, docker-build-and-publish-rocm]
if: ${{ failure() && github.repository_owner == 'flexflow' && ( github.event_name == 'push' || github.event_name == 'workflow_dispatch' ) && github.ref_name == 'inference' }}
steps:
1 change: 1 addition & 0 deletions .github/workflows/gpu-ci.yml
@@ -129,6 +129,7 @@ jobs:
CPP_INFERENCE_TESTS: ${{ vars.CPP_INFERENCE_TESTS }}
run: |
source ./build/set_python_envs.sh
./tests/fine_grained_alignment_test.sh
./tests/inference_tests.sh
- name: Run PEFT tests
75 changes: 7 additions & 68 deletions .github/workflows/helpers/install_cudnn.sh
@@ -8,72 +8,11 @@ cd "${BASH_SOURCE[0]%/*}"
ubuntu_version=$(lsb_release -rs)
ubuntu_version=${ubuntu_version//./}

-# Install CUDNN
-cuda_version=${1:-12.1.1}
-cuda_version=$(echo "${cuda_version}" | cut -f1,2 -d'.')
-echo "Installing CUDNN for CUDA version: ${cuda_version} ..."
-CUDNN_LINK=http://developer.download.nvidia.com/compute/redist/cudnn/v8.0.5/cudnn-11.1-linux-x64-v8.0.5.39.tgz
-CUDNN_TARBALL_NAME=cudnn-11.1-linux-x64-v8.0.5.39.tgz
-if [[ "$cuda_version" == "10.1" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.0.5/cudnn-10.1-linux-x64-v8.0.5.39.tgz
-CUDNN_TARBALL_NAME=cudnn-10.1-linux-x64-v8.0.5.39.tgz
-elif [[ "$cuda_version" == "10.2" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.0.5/cudnn-10.2-linux-x64-v8.0.5.39.tgz
-CUDNN_TARBALL_NAME=cudnn-10.2-linux-x64-v8.0.5.39.tgz
-elif [[ "$cuda_version" == "11.0" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.0.5/cudnn-11.0-linux-x64-v8.0.5.39.tgz
-CUDNN_TARBALL_NAME=cudnn-11.0-linux-x64-v8.0.5.39.tgz
-elif [[ "$cuda_version" == "11.1" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.0.5/cudnn-11.1-linux-x64-v8.0.5.39.tgz
-CUDNN_TARBALL_NAME=cudnn-11.1-linux-x64-v8.0.5.39.tgz
-elif [[ "$cuda_version" == "11.2" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.1.1/cudnn-11.2-linux-x64-v8.1.1.33.tgz
-CUDNN_TARBALL_NAME=cudnn-11.2-linux-x64-v8.1.1.33.tgz
-elif [[ "$cuda_version" == "11.3" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.2.1/cudnn-11.3-linux-x64-v8.2.1.32.tgz
-CUDNN_TARBALL_NAME=cudnn-11.3-linux-x64-v8.2.1.32.tgz
-elif [[ "$cuda_version" == "11.4" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.2.4/cudnn-11.4-linux-x64-v8.2.4.15.tgz
-CUDNN_TARBALL_NAME=cudnn-11.4-linux-x64-v8.2.4.15.tgz
-elif [[ "$cuda_version" == "11.5" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.0/cudnn-11.5-linux-x64-v8.3.0.98.tgz
-CUDNN_TARBALL_NAME=cudnn-11.5-linux-x64-v8.3.0.98.tgz
-elif [[ "$cuda_version" == "11.6" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.4.0/local_installers/11.6/cudnn-linux-x86_64-8.4.0.27_cuda11.6-archive.tar.xz
-CUDNN_TARBALL_NAME=cudnn-linux-x86_64-8.4.0.27_cuda11.6-archive.tar.xz
-elif [[ "$cuda_version" == "11.7" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
-CUDNN_TARBALL_NAME=cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
-elif [[ "$cuda_version" == "11.8" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz
-CUDNN_TARBALL_NAME=cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz
-elif [[ "$cuda_version" == "12.0" || "$cuda_version" == "12.1" || "$cuda_version" == "12.2" || "$cuda_version" == "12.3" || "$cuda_version" == "12.4" || "$cuda_version" == "12.5" ]]; then
-CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.8.0/local_installers/12.0/cudnn-local-repo-ubuntu2004-8.8.0.121_1.0-1_amd64.deb
-CUDNN_TARBALL_NAME=cudnn-local-repo-ubuntu2004-8.8.0.121_1.0-1_amd64.deb
-else
-echo "CUDNN support for CUDA version above 12.5 not yet added"
-exit 1
-fi
-wget -c -q $CUDNN_LINK
-if [[ "$cuda_version" == "11.6" || "$cuda_version" == "11.7" || "$cuda_version" == "11.8" ]]; then
-tar -xf $CUDNN_TARBALL_NAME -C ./
-CUDNN_EXTRACTED_TARBALL_NAME="${CUDNN_TARBALL_NAME::-7}"
-sudo cp -r "$CUDNN_EXTRACTED_TARBALL_NAME"/include/* /usr/local/include
-sudo cp -r "$CUDNN_EXTRACTED_TARBALL_NAME"/lib/* /usr/local/lib
-rm -rf "$CUDNN_EXTRACTED_TARBALL_NAME"
-elif [[ "$CUDNN_TARBALL_NAME" == *.deb ]]; then
-wget -c -q "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${ubuntu_version}/x86_64/cuda-keyring_1.1-1_all.deb"
-sudo dpkg -i cuda-keyring_1.1-1_all.deb
-sudo apt update -y
-rm -f cuda-keyring_1.1-1_all.deb
-sudo dpkg -i $CUDNN_TARBALL_NAME
-sudo cp /var/cudnn-local-repo-ubuntu2004-8.8.0.121/cudnn-local-A9E17745-keyring.gpg /usr/share/keyrings/
-sudo apt update -y
-sudo apt install -y libcudnn8
-sudo apt install -y libcudnn8-dev
-sudo apt install -y libcudnn8-samples
-else
-sudo tar -xzf $CUDNN_TARBALL_NAME -C /usr/local
-fi
-rm $CUDNN_TARBALL_NAME
+wget -c -q "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${ubuntu_version}/x86_64/cuda-keyring_1.1-1_all.deb"
+sudo dpkg -i cuda-keyring_1.1-1_all.deb
+sudo apt update -y
+rm -f cuda-keyring_1.1-1_all.deb
+sudo apt-get -y install libcudnn9-cuda-12
+sudo apt-get -y install libcudnn9-dev-cuda-12
+sudo apt-get -y install libcudnn9-samples
sudo ldconfig
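The new install_cudnn.sh no longer hard-codes `ubuntu2004`; it builds the NVIDIA repository path from the release reported by `lsb_release -rs`. A minimal sketch of that derivation follows. The function name `cuda_keyring_url` is hypothetical and not part of the PR, and the release is passed as an argument so the snippet runs without `lsb_release` or network access.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): build the cuda-keyring URL
# for a given Ubuntu release string such as "22.04".
cuda_keyring_url() {
  release="$1"
  # Same effect as ${ubuntu_version//./} in the script: "22.04" -> "2204"
  compact=$(printf '%s' "$release" | tr -d '.')
  printf '%s\n' "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${compact}/x86_64/cuda-keyring_1.1-1_all.deb"
}

# In CI the argument would come from `lsb_release -rs`.
cuda_keyring_url "22.04"
```

Keeping the URL construction in one place makes a runner image bump a one-line change in the workflow files rather than an edit to every helper script.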
43 changes: 28 additions & 15 deletions .github/workflows/helpers/install_dependencies.sh
@@ -40,28 +40,41 @@ if [[ "$FF_GPU_BACKEND" == "hip_cuda" || "$FF_GPU_BACKEND" = "hip_rocm" ]]; then
elif [ "$hip_version" = "5.5" ]; then
AMD_GPU_SCRIPT_NAME=amdgpu-install_5.5.50500-1_all.deb
fi
-AMD_GPU_SCRIPT_URL="https://repo.radeon.com/amdgpu-install/${hip_version}/ubuntu/focal/${AMD_GPU_SCRIPT_NAME}"
+# Detect Ubuntu version
+UBUNTU_VERSION=$(lsb_release -rs)
+if [[ "$UBUNTU_VERSION" == "20.04" ]]; then
+UBUNTU_CODENAME="focal"
+elif [[ "$UBUNTU_VERSION" == "22.04" ]]; then
+UBUNTU_CODENAME="jammy"
+elif [[ "$UBUNTU_VERSION" == "24.04" ]]; then
+UBUNTU_CODENAME="jammy"
+else
+echo "Unsupported Ubuntu version: $UBUNTU_VERSION"
+exit 1
+fi
+
+AMD_GPU_SCRIPT_URL="https://repo.radeon.com/amdgpu-install/${hip_version}/ubuntu/${UBUNTU_CODENAME}/${AMD_GPU_SCRIPT_NAME}"
# Download and install AMD GPU software with ROCM and HIP support
wget "$AMD_GPU_SCRIPT_URL"
sudo apt-get install -y ./${AMD_GPU_SCRIPT_NAME}
sudo rm ./${AMD_GPU_SCRIPT_NAME}
sudo amdgpu-install -y --usecase=hip,rocm --no-dkms
sudo apt-get install -y hip-dev hipblas miopen-hip rocm-hip-sdk rocm-device-libs

-# Install protobuf v3.20.x manually
-sudo apt-get update -y && sudo apt-get install -y pkg-config zip g++ zlib1g-dev unzip python autoconf automake libtool curl make
-git clone -b 3.20.x https://github.com/protocolbuffers/protobuf.git
-cd protobuf/
-git submodule update --init --recursive
-./autogen.sh
-./configure
-cores_available=$(nproc --all)
-n_build_cores=$(( cores_available -1 ))
-if (( n_build_cores < 1 )) ; then n_build_cores=1 ; fi
-make -j $n_build_cores
-sudo make install
-sudo ldconfig
-cd ..
+# # Install protobuf v3.20.x manually
+# sudo apt-get update -y && sudo apt-get install -y pkg-config zip g++ zlib1g-dev unzip python autoconf automake libtool curl make
+# git clone -b 3.20.x https://github.com/protocolbuffers/protobuf.git
+# cd protobuf/
+# git submodule update --init --recursive
+# ./autogen.sh
+# ./configure
+# cores_available=$(nproc --all)
+# n_build_cores=$(( cores_available -1 ))
+# if (( n_build_cores < 1 )) ; then n_build_cores=1 ; fi
+# make -j $n_build_cores
+# sudo make install
+# sudo ldconfig
+# cd ..
else
echo "FF_GPU_BACKEND: ${FF_GPU_BACKEND}. Skipping installing HIP dependencies"
fi
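The release-to-codename mapping in this hunk can also be written as a `case` statement, which keeps the table in one place. The sketch below mirrors the values in the diff exactly; the function name `amdgpu_repo_codename` is hypothetical. Note that Ubuntu 24.04 is itself codenamed "noble", while the diff points it at the "jammy" directory. The PR does not state why.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): map an Ubuntu release to the
# directory name used in the amdgpu-install URL, mirroring the diff.
amdgpu_repo_codename() {
  case "$1" in
    20.04) echo "focal" ;;
    22.04) echo "jammy" ;;
    # Ubuntu 24.04 is itself named "noble"; the diff maps it to "jammy".
    24.04) echo "jammy" ;;
    *)
      echo "Unsupported Ubuntu version: $1" >&2
      return 1
      ;;
  esac
}

# In CI the argument would come from `lsb_release -rs`.
amdgpu_repo_codename "22.04"
```

Returning a nonzero status for unknown releases lets the caller decide whether to abort, matching the `exit 1` branch in the script.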
43 changes: 3 additions & 40 deletions .github/workflows/helpers/install_nccl.sh
@@ -5,47 +5,10 @@ set -x
# Cd into directory holding this script
cd "${BASH_SOURCE[0]%/*}"

-# Add NCCL key ring
ubuntu_version=$(lsb_release -rs)
ubuntu_version=${ubuntu_version//./}
-wget "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${ubuntu_version}/x86_64/cuda-keyring_1.1-1_all.deb"
+wget -c -q "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${ubuntu_version}/x86_64/cuda-keyring_1.1-1_all.deb"
sudo dpkg -i cuda-keyring_1.1-1_all.deb
-sudo apt update -y
+sudo apt-get update -y --allow-change-held-packages
rm -f cuda-keyring_1.1-1_all.deb
-
-# Install NCCL
-cuda_version=${1:-12.1.1}
-cuda_version=$(echo "${cuda_version}" | cut -f1,2 -d'.')
-echo "Installing NCCL for CUDA version: ${cuda_version} ..."
-
-# We need to run a different install command based on the CUDA version, otherwise running `sudo apt install libnccl2 libnccl-dev`
-# will automatically upgrade CUDA to the latest version.
-
-if [[ "$cuda_version" == "11.0" ]]; then
-sudo apt install libnccl2=2.15.5-1+cuda11.0 libnccl-dev=2.15.5-1+cuda11.0
-elif [[ "$cuda_version" == "11.1" ]]; then
-sudo apt install libnccl2=2.8.4-1+cuda11.1 libnccl-dev=2.8.4-1+cuda11.1
-elif [[ "$cuda_version" == "11.2" ]]; then
-sudo apt install libnccl2=2.8.4-1+cuda11.2 libnccl-dev=2.8.4-1+cuda11.2
-elif [[ "$cuda_version" == "11.3" ]]; then
-sudo apt install libnccl2=2.9.9-1+cuda11.3 libnccl-dev=2.9.9-1+cuda11.3
-elif [[ "$cuda_version" == "11.4" ]]; then
-sudo apt install libnccl2=2.11.4-1+cuda11.4 libnccl-dev=2.11.4-1+cuda11.4
-elif [[ "$cuda_version" == "11.5" ]]; then
-sudo apt install libnccl2=2.11.4-1+cuda11.5 libnccl-dev=2.11.4-1+cuda11.5
-elif [[ "$cuda_version" == "11.6" ]]; then
-sudo apt install libnccl2=2.12.12-1+cuda11.6 libnccl-dev=2.12.12-1+cuda11.6
-elif [[ "$cuda_version" == "11.7" ]]; then
-sudo apt install libnccl2=2.14.3-1+cuda11.7 libnccl-dev=2.14.3-1+cuda11.7
-elif [[ "$cuda_version" == "11.8" ]]; then
-sudo apt install libnccl2=2.16.5-1+cuda11.8 libnccl-dev=2.16.5-1+cuda11.8
-elif [[ "$cuda_version" == "12.0" ]]; then
-sudo apt install libnccl2=2.18.3-1+cuda12.0 libnccl-dev=2.18.3-1+cuda12.0
-elif [[ "$cuda_version" == "12.1" ]]; then
-sudo apt install libnccl2=2.18.3-1+cuda12.1 libnccl-dev=2.18.3-1+cuda12.1
-elif [[ "$cuda_version" == "12.2" ]]; then
-sudo apt install libnccl2=2.18.3-1+cuda12.2 libnccl-dev=2.18.3-1+cuda12.2
-else
-echo "Installing NCCL for CUDA version ${cuda_version} is not supported"
-exit 1
-fi
+sudo apt install -y --allow-change-held-packages libnccl2 libnccl-dev
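The removed branch table pinned NCCL to a build matching the installed CUDA version because, per its own comment, an unpinned `apt install libnccl2 libnccl-dev` can upgrade CUDA. If pinning is ever needed again, the package arguments could be generated rather than enumerated. This is a sketch only: it assumes the `<nccl>-1+cuda<major.minor>` version format seen in the removed lines, and `nccl_pin_args` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): build pinned apt package
# arguments for NCCL, given an NCCL version and a CUDA version.
nccl_pin_args() {
  nccl_version="$1"   # e.g. "2.18.3"
  cuda_version="$2"   # e.g. "12.1.1" or "12.1"
  # Keep only major.minor, as the removed script did with `cut -f1,2 -d'.'`
  cuda_mm=$(printf '%s' "$cuda_version" | cut -f1,2 -d'.')
  suffix="${nccl_version}-1+cuda${cuda_mm}"
  printf '%s\n' "libnccl2=${suffix} libnccl-dev=${suffix}"
}

# Prints the same package arguments the removed CUDA 12.1 branch used.
nccl_pin_args "2.18.3" "12.1.1"
```

Whether a given NCCL/CUDA pair is actually published in the NVIDIA repository still has to be checked; the helper only formats the string.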
2 changes: 1 addition & 1 deletion .github/workflows/pip-deploy.yml
@@ -9,7 +9,7 @@ concurrency:
jobs:
build-n-publish:
name: Build and publish Python 🐍 distributions 📦 to PyPI and TestPyPI
-runs-on: ubuntu-20.04
+runs-on: ubuntu-22.04
permissions:
# IMPORTANT: this permission is mandatory for trusted publishing
id-token: write
2 changes: 1 addition & 1 deletion .github/workflows/pip-install.yml
@@ -13,7 +13,7 @@ concurrency:
jobs:
pip-install-flexflow:
name: Install FlexFlow with pip
-runs-on: ubuntu-20.04
+runs-on: ubuntu-22.04
defaults:
run:
shell: bash -l {0} # required to use an activated conda environment
2 changes: 1 addition & 1 deletion .github/workflows/shell-check.yml
@@ -3,7 +3,7 @@ on: [push, pull_request, workflow_dispatch]
jobs:
shellcheck:
name: Shellcheck
-runs-on: ubuntu-latest
+runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- name: Run ShellCheck
16 changes: 6 additions & 10 deletions conda/flexflow.yml
@@ -3,23 +3,19 @@ channels:
- defaults
- conda-forge
dependencies:
- python>=3.6,<3.12
- cffi>=1.11.0
- Pillow
- python
- cffi
- rust
- cmake-build-extension
- jq
- pytest
- pip
- pip:
- qualname>=0.1.0
- keras_preprocessing>=1.1.2
- numpy>=1.16.0
- torch>=1.13.1
- torchaudio>=0.13.1
- torchvision>=0.14.1
- numpy
- torch
- torchaudio
- torchvision
- regex
- onnx
- transformers>=4.47.1
- sentencepiece
- einops
2 changes: 1 addition & 1 deletion docker/build.sh
@@ -97,7 +97,7 @@ cores_available=$(nproc --all)
n_build_cores=$(( cores_available -1 ))

# check python_version
-if [[ "$python_version" != @(3.8|3.9|3.10|3.11|latest) ]]; then
+if [[ "$python_version" != @(3.8|3.9|3.10|3.11|3.12|latest) ]]; then
echo "python_version not supported!"
exit 0
fi
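The check in this hunk uses bash's `@(...)` extended pattern and exits with status 0 when the version is rejected. The sketch below expresses the same allow-list with `case` and returns nonzero on rejection, so a caller can detect the failure. `is_supported_python` is a hypothetical name, and whether docker/build.sh should exit nonzero here is a question the PR does not address.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): allow-list check for the
# python_version argument, using the values accepted after this change.
is_supported_python() {
  case "$1" in
    3.8|3.9|3.10|3.11|3.12|latest) return 0 ;;
    *) return 1 ;;
  esac
}

if is_supported_python "3.12"; then
  echo "python_version supported"
else
  echo "python_version not supported!" >&2
fi
```

Because `case` patterns match the whole string, "3.1" is rejected even though it is a prefix of "3.10".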