Currently RAPIDS CI jobs spend a significant amount of time constructing environments, whether they be pip or conda.
A meaningful chunk of this time is spent downloading packages from remote sources.
Aside from the inherent wastefulness in time and network bandwidth, these downloads also expose us to more network connectivity issues, which have plagued our CI in general.
We should investigate using GitHub's native dependency caching functionality.
GHA recommends caching for specific package managers via the corresponding setup-* actions, but those are more general tools that also handle installing the package managers themselves.
Since we will have those package managers installed into our base images, we will have to manage the caching directly.
That shouldn't be too difficult though; we simply need to construct a suitable cache key corresponding to the path to each package manager's local cache (e.g. /opt/conda/pkgs for conda).
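A minimal sketch of what managing the cache directly could look like, using the generic actions/cache action against conda's package directory. The key shown is illustrative, not a final choice, and the environment-file path is an assumption about repo layout:

```yaml
# Hypothetical job step: cache conda's local package cache directly,
# instead of letting a setup-* action manage it.
- name: Cache conda packages
  uses: actions/cache@v4
  with:
    path: /opt/conda/pkgs
    # Key on the environment specs so the cache invalidates when
    # dependencies change; fall back to any older conda-pkgs cache.
    key: conda-pkgs-${{ hashFiles('conda/environments/*.yaml') }}
    restore-keys: |
      conda-pkgs-
```

The same pattern would apply to pip by pointing `path` at pip's cache directory instead.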
We will need to figure out what makes the most sense to put into a cache key.
One option would be to use a single cache for all conda packages across our entire matrix of jobs, but that would mean sharing a cache between different architectures and CUDA versions, which may not be ideal.
The opposite alternative would be having a separate cache for every matrix entry in a job (e.g. arch/CUDA version/Python version).
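The two extremes above can be sketched as alternative `key` expressions; the matrix variable names here are assumptions about how the job matrix might be defined:

```yaml
# (a) One shared cache for all conda jobs: maximal reuse, but mixes
#     architectures and CUDA versions in a single cache.
key: conda-pkgs-${{ hashFiles('conda/environments/*.yaml') }}

# (b) One cache per matrix entry: no cross-entry sharing, but each
#     cache stays smaller and exactly matches the job that wrote it.
key: conda-pkgs-${{ runner.arch }}-cuda${{ matrix.cuda }}-py${{ matrix.py }}-${{ hashFiles('conda/environments/*.yaml') }}
```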
In general, we'll need to balance cache size (smaller caches upload and download faster), contention (I don't know how well GHA handles every PR in a repo trying to upload or download the exact same cache simultaneously; hopefully that's well optimized, but we'll have to test), and cache hit rate (if different jobs have partial overlap in their dependencies, then a shared cache will increase the hit rate).
Using GitHub's native caching feature may not work as expected because RAPIDS uses self-hosted runners. When caching is used with self-hosted runners, the cache is stored on GitHub-owned cloud storage, which means the runners still need to download the cache from that storage on every run.
From GH documentation:
We are investigating adding caching at the runner level for package managers like pip and conda.
Thanks Vyas for bringing this issue to my attention.
Jordan's comment is correct. Caching dependencies with GitHub's native solution doesn't really work for self-hosted runners. There is a community issue about it below:
We are working on an NGINX caching proxy that can cache pip and conda packages close to our self-hosted runners. We are still in the testing phase, but we will be sure to broadcast the feature when it's ready.
Until then, I would recommend that no one work on this issue.