[CI] Unpin Dask, adopt CUDA 12.8 and RAPIDS 24.12 #11194

Merged on Feb 8, 2025 (12 commits)

hcho3 (Collaborator) commented Jan 30, 2025

Related: dmlc/xgboost-devops#9, dmlc/xgboost-devops#11

Use latest Dask, CUDA 12.8, and RAPIDS 24.12.

Also it's now easy to update the image tag in all pipelines simultaneously.

@hcho3 hcho3 changed the title [Don't merge] [CI] Try unpinning Dask [Don't merge] [CI] Try unpinning Dask to support RAPIDS 24.12 Jan 30, 2025
hcho3 (Collaborator, Author) commented Jan 30, 2025

@trivialfis We have a problem here: Latest RAPIDS cannot be installed with Dask 2024.10.0, and yet our tests cannot pass with latest Dask. 😢

jakirkham (Contributor) commented:

Thanks Hyunsu! 🙏

What are the failures we are seeing?

hcho3 (Collaborator, Author) commented Jan 31, 2025

> What are the failures we are seeing?

See:

E distributed.client.FutureCancelledError: ('_argmax-16ef6ea6596412a85b175e90dab51ecb', 44) cancelled for reason: unknown.
/opt/miniforge/envs/gpu_test/lib/python3.10/site-packages/distributed/client.py:2427: FutureCancelledError

trivialfis (Member) commented:

> We have a problem here: Latest RAPIDS cannot be installed with Dask 2024.10.0, and yet our tests cannot pass with latest Dask.

I assume it's caused by Dask DataFrame. Can we disable some of the tests for now, or use arrays in place of DataFrames? I think it will take some time for Dask to work through all the issues, and we are unlikely to halt the CI for that.

hcho3 (Collaborator, Author) commented Feb 7, 2025

Yes, I'll probably skip the problematic test, as done in the 2.1 branch:

@pytest.mark.xfail(
    dask_version_ge110, reason="Test cannot pass with Dask 2024.11.0+"
)
@pytest.mark.skipif(**tm.no_cudf())
@pytest.mark.parametrize("model", ["boosting"])
def test_dask_classifier(self, model: str, local_cuda_client: Client) -> None:
    ...  # test body omitted in the quoted snippet
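
The definition of `dask_version_ge110` is not shown in this thread. As a rough sketch only, a flag like it could be derived by comparing version strings with `packaging.version.parse` (the import added in this PR's diff). The helper names `version_at_least` and `dask_at_least` below are hypothetical, and the fallback for environments without `packaging` is an assumption that handles plain `X.Y.Z` strings only.

```python
# Hypothetical sketch: one way a flag such as `dask_version_ge110` could be derived.
try:
    from packaging.version import parse as parse_version
except ImportError:
    # `packaging` is third-party; naive fallback for plain "X.Y.Z" strings only.
    def parse_version(version: str):
        return tuple(int(part) for part in version.split("."))


def version_at_least(installed: str, minimum: str) -> bool:
    """Return True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def dask_at_least(minimum: str) -> bool:
    """Check the installed Dask version; return False if Dask is not importable."""
    try:
        import dask
    except ImportError:
        return False
    return version_at_least(dask.__version__, minimum)
```

A value such as `dask_at_least("2024.11.0")` could then be passed as the condition to `pytest.mark.xfail`, so the test is only expected to fail on the affected Dask releases.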

@hcho3 hcho3 changed the title [Don't merge] [CI] Try unpinning Dask to support RAPIDS 24.12 [CI] Unpin Dask to support RAPIDS 24.12 Feb 8, 2025
@hcho3 hcho3 changed the title [CI] Unpin Dask to support RAPIDS 24.12 [CI] Unpin Dask, adopt CUDA 12.8 and RAPIDS 24.12 Feb 8, 2025
@hcho3 hcho3 changed the title [CI] Unpin Dask, adopt CUDA 12.8 and RAPIDS 24.12 [wip] [CI] Unpin Dask, adopt CUDA 12.8 and RAPIDS 24.12 Feb 8, 2025
@hcho3 hcho3 changed the title [wip] [CI] Unpin Dask, adopt CUDA 12.8 and RAPIDS 24.12 [CI] Unpin Dask, adopt CUDA 12.8 and RAPIDS 24.12 Feb 8, 2025
@hcho3 hcho3 requested review from trivialfis and removed request for trivialfis February 8, 2025 08:39
hcho3 (Collaborator, Author) commented Feb 8, 2025

@trivialfis I changed the CI scripts a bit to make it easier to test new CI images. Now we only have to change a single file, ops/pipeline/get-image-tag.sh. See the updated instructions in https://xgboost--11194.org.readthedocs.build/en/11194/contrib/ci.html#making-changes-to-ci-containers.

TODO:

  1. Wait until all the tests pass for this PR.
  2. Merge "Upgrade to CUDA 12.8" (xgboost-devops#11).
  3. Update this PR to set IMAGE_REPO=main in ops/pipeline/get-image-tag.sh.

@hcho3 hcho3 requested a review from trivialfis February 8, 2025 08:45
trivialfis (Member) left a comment:

Looks good to me, one minor question.

@@ -10,6 +10,7 @@
import pytest
from hypothesis import given, note, settings, strategies
from hypothesis._settings import duration
from packaging.version import parse as parse_version

This is not in the stdlib, do we need to specify it in our testing environments?

hcho3 (Collaborator, Author) replied Feb 8, 2025

Currently, our CI already has packaging installed (due to it being a dependency of other packages we use). However, if we were to add a [test] suite for optional dependencies in pyproject.toml, we should have packaging in that suite.

@hcho3 hcho3 requested a review from trivialfis February 8, 2025 09:21
@hcho3 hcho3 merged commit 88c8d1a into dmlc:master Feb 8, 2025
57 checks passed
@hcho3 hcho3 deleted the unpin_dask branch February 8, 2025 09:54