Split AI Lab Recipes from RHEL AI Images #771
Comments
There is talk about splitting the training section out into separate repositories. The question is whether there should be one or three, for example ai-training-amd or ai-training/amd.
Yes, I have started the exercise of moving the bootc images out of ai-lab-recipes. Here is an example GitHub org showing what it could look like: https://github.com/smgglrs-ai/. These images are meant to be used as base images on which to install AI Lab recipes, so they contain only the hardware enablement components, with no prebaked application container images or cloud-specific tools. In my opinion, the application container images should be added as we specialize the image for a given recipe. And if we want to ship to a specific cloud, we should add the relevant packages during the final image build (AMI, VHD, etc.), probably as an image builder feature.
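To illustrate the layering described in the comment above, here is a minimal Python sketch that renders a Containerfile specializing a hardware-enablement base image for one recipe. It is only an illustration under stated assumptions: the image reference `quay.io/example/bootc-nvidia-cuda:latest`, the package name `example-recipe-runtime`, and the helper `render_containerfile` are hypothetical and do not come from the thread or from any existing repository.

```python
from typing import Sequence


def render_containerfile(base_image: str, packages: Sequence[str] = ()) -> str:
    """Render a Containerfile that derives from a hardware-enablement base image.

    base_image: reference to a bootc base image (hypothetical in this sketch).
    packages:   recipe-specific RPM packages to layer on top (hypothetical).
    """
    # The base image carries only the hardware enablement components.
    lines = [f"FROM {base_image}"]
    if packages:
        # Recipe-specific content is added only when specializing the image.
        lines.append("RUN dnf install -y " + " ".join(packages) + " && dnf clean all")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical image and package names, used only for illustration.
    print(
        render_containerfile(
            "quay.io/example/bootc-nvidia-cuda:latest",
            ["example-recipe-runtime"],
        )
    )
```

Cloud-specific packages are deliberately left out of the sketch, since the comment suggests adding them at the final image build step instead.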
Here is a proposal for creating new repositories under https://github.com/containers:
driver-toolkit: This container image can be used by any stack to build out-of-tree drivers for a given kernel.
bootc-amd-rocm, bootc-intel-gaudi, bootc-nvidia-cuda: The bootc images are derived from the
Cleanup: The other folders under training could be removed at this stage. If we need more images for specific recipes, we can create new repositories or add them to the recipes folder, based on the level of dependency of their lifecycles.
Why such a huge proliferation of repos? Why not keep them under a bootc-ai repo, or something similarly named?
They have different lifecycles and require different expertise. An AMD stack contributor may not be relevant for NVIDIA code reviews. And we're currently talking about splitting the repository because of the proliferation of subfolders, which complicates the whole structure.
Another reason would be CI complexity. The more artifacts, the more complex the CI.
But there is also interaction between these repos. In some cases we want to share content, and not force people to open the same change in three different repositories. Finally, these repos are going to be fairly tiny. Just a couple of Containerfiles?
These repositories have a similar structure, but they don't really share much. The only thing that is identical is the update service, which could become an RPM to be shipped independently.
So my understanding is that this repo will keep model-servers and recipes, so this is good for us (the Podman AI Lab team).
Yes, just training is moving out.
Actually, we could keep the |
There are no "recipes" for training; this was just thrown there so that we could start the process of building an AI Training project. It can be moved out without affecting other uses of ai-lab-recipes.
Currently, it is very difficult to understand how to contribute new recipes to this repository, as it has grown to include additional things outside the scope of the Podman extension recipes. The idea would be to split the repositories so that the various stakeholders still have what they need, while making it easy for the community or RH contributors to add content to the individual pieces important to them.
/cc @sallyom @MichaelClifford @rhatdan @Gregory-Pereira