Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate replacing conda-build with rattler-build #47

Open
vyasr opened this issue Apr 19, 2024 · 11 comments
Open

Evaluate replacing conda-build with rattler-build #47

vyasr opened this issue Apr 19, 2024 · 11 comments
Assignees

Comments

@vyasr
Copy link
Contributor

vyasr commented Apr 19, 2024

RAPIDS currently builds conda packages in CI using conda-build. The rattler-build tool is a newer alternative. It is written in Rust, and should be faster than conda-build (I haven't seen any official benchmarks yet, though). It only supports a limited subset of the meta.yaml recipe format, but that subset is designed to still enable all the same features, just with a more limited syntax (see CEPS 13 and 14). conda-build overhead is nontrivial (I've never benchmarked it, but I know it can stretch into multiple minutes beyond the environment solve when doing local CI reproductions), and reducing that would be quite valuable for us in improving our CI turnaround. Moreover, switching to the more restricted syntax described in the above CEPs would be beneficial because it would convert our conda recipes into pure YAML rather than the extended YAML currently used by meta.yaml. That change is important because the YAML extensions currently in our recipe make it impossible to parse or write with standard YAML parsers, which is a big reason why we have struggled to do things like support meta.yaml files in rapids-dependency-file-generator.

We should do a PoC of replacing conda-build with rattler-build in one repo (preferably something reasonably complex like cudf or cugraph) to see what it would take to make this transition, and how much we would benefit.

@vyasr
Copy link
Contributor Author

vyasr commented Jul 22, 2024

I now have two POCs of using rattler-build

Some notes:

I've littered the PRs liberally with comments, so those should provide additional information.

@msarahan
Copy link

Nice! Looks like you got some pretty good speedup there - maybe around 5 minutes of speedup, from what I can see. Considering the whole build time was 8-10 minutes, saving 5 minutes is huge.

@vyasr
Copy link
Contributor Author

vyasr commented Jul 23, 2024

Yup, similar speedups in absolute numbers for cugraph (smaller percentage, but still not bad to go from 28 mins to 22 mins. Also there's probably an extra 30 s in there for installing rattler-build itself in each job since it's not yet preinstalled in the image (but it will be).

@bdice
Copy link
Contributor

bdice commented Jul 23, 2024

I looked at the two POCs for ucxx and cugraph and I am supportive of this effort. It seems like there are a lot of steps to take first (like fixing strict priority conda builds, maybe altering librmm run exports, and maybe some rattler-build issues -- basically all the things that are marked in the comments) but overall I think this is mature enough that we can invest some time in this direction.

@vyasr
Copy link
Contributor Author

vyasr commented Jul 24, 2024

FYI @wolfv @ruben-arts here's the follow-up to our discussions from SciPy 🙂

@vyasr
Copy link
Contributor Author

vyasr commented Aug 16, 2024

We're actively discussing #84, which is IMO a blocker to making this change. We can work around it, but it requires a lot of extra boilerplate in recipes that I don't think it's worth proliferating.

@msarahan
Copy link

Adding repo tracker for progress tracking:

@bdice
Copy link
Contributor

bdice commented Jan 21, 2025

We should re-evaluate this now that prefix-dev/rattler-build#343 is closed. I am not sure if strict channel priority is still a blocker for us to migrate to rattler-build. We need to see if prefix-dev/rattler-build#1211 solved the problems we were seeing.

cc: @msarahan @gforsyth @vyasr @jakirkham

@vyasr
Copy link
Contributor Author

vyasr commented Jan 22, 2025

prefix-dev/rattler-build#343 is indeed the issue that was blocking us, and it looks like it was resolved about a month ago which is great news. It should allow us to proceed on both fronts in parallel. I think enabling strict channel priority is worthwhile enough that we may as well move forward with that as well now given that we have enough information about how we can do it and approval to do so. It should give additional improvements to both user experiences and our solve times.

@bdice
Copy link
Contributor

bdice commented Feb 7, 2025

I want to tack on a task here while we refactor our conda recipes: remove the + environ.get('VERSION_SUFFIX', '') from all our package version strings. We never use a VERSION_SUFFIX, it is leftover from when we used versioneer long ago. See examples: https://github.com/search?q=org%3Arapidsai+%2F%28%3F-i%29%5C%2B+environ.get%5C%28%27VERSION_SUFFIX%27%2C+%27%27%5C%29%2F&type=code

We had this as a task item 2 years ago when we switched to GitHub Actions and we never got it cleaned up. Here is a list of related task items, which we should re-evaluate as we rewrite the conda recipes. Some may be irrelevant or already completed now. https://github.com/rapidsai/ops/issues/2447#issuecomment-1343422135

@vyasr
Copy link
Contributor Author

vyasr commented Feb 8, 2025

The items originally listed in https://github.com/rapidsai/ops/issues/2447#issuecomment-1343422135 were later migrated to https://github.com/rapidsai/ops/issues/2537, which is the more up-to-date list. Most of the remaining unchecked items probably just require a single pass through the repos to see where there might be lingering issues because I think we've largely addressed them:

  • Change meta.yaml source.git_url to source.path to allow for local builds
  • Removing extra sccache env vars (at least cuxfilter is done, but we might find other places where we could improve this still)
  • Replace environ.get('XXX', 'YY') w/ just environ['XXX'] for env vars RAPIDS_CUDA_VERSION and CONDA_PY (@bdice just made a couple of PRs to help this along, and I suspect a quick pass through can get the rest)
  • Ensure rapids-*-env packages don't exist in dependencies.yaml (I believe this is done now)
  • Update version pinnings for libcu* so that we use <NEXT instead of <=HIGHEST to ensure compatibility with potential releases of CUDA 11.8.X while keeping below the known CUDA 12 version numbers (this just seems outdated given that our modern cuda-version pinning strategies for compatibility take this into account)

I'm not sure the following are worthwhile:

  • junit output annotations (not sure what it buys us)
  • consolidating CBCs (I think we have generally moved away from centralizing versions towards instead performing synchronized updates with something like rapids-reviser; if we do decide to centralize, I would prefer that we handle this by getting rapids-dependency-file-generator to support recipe.yaml files, which should be feasible after the rattler-build migration, and then implementing a centralized template store in dfg)

Last, this item is a minor quality of life improvement that we might as well do during the rattler migration but isn't really important enough to block anything on.

  • For packages with multiple mambabuild invocations (typically multiple Python packages), separate the outputs with rapids-logger notices like "Building cudf", "Building dask-cudf", etc. Applies to cudf, raft, cugraph, others?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants