Releases: nod-ai/shark-ai

Release v3.1.0

08 Jan 22:40
v3.1.0
297d83d

This release brings support for Large Language Model (LLM) serving, starting with the Llama 3.1 family of models from Meta on AMD Instinct™ MI300X Accelerators.

The full vertically-integrated SHARK AI stack is now available for deploying machine learning models:

  • The sharktank package bridges popular machine learning models from repositories like Hugging Face and frameworks like llama.cpp to the IREE compiler (a sketch of this pipeline follows after this list). This model export and compilation pipeline features whole-program optimization and efficient cross-target code generation without depending on operator libraries.
  • The shortfin package provides serving applications built on top of the IREE runtime, with integration points to other ecosystem projects like the SGLang frontend. These applications are lightweight, portable, and packed with optimizations that improve serving efficiency.
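
To make that pipeline concrete, here is a minimal sketch of exporting and compiling an LLM checkpoint for an MI300X (gfx942) target. The module name and flags below are illustrative assumptions based on typical sharktank/IREE usage; consult the sharktank documentation for the authoritative commands:

# Export a GGUF checkpoint to MLIR plus a runtime config
# (module name and flag names are assumptions for illustration).
python -m sharktank.examples.export_paged_llm_v1 \
  --gguf-file=/path/to/model.gguf \
  --output-mlir=model.mlir \
  --output-config=config.json

# Compile the exported program with IREE for an AMD GPU target.
iree-compile model.mlir \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  -o model.vmfb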

Together, these packages simplify model deployment by eliminating the need for complex Docker containers or vendor-specific libraries while continuing to provide competitive performance and flexibility. Some numbers on footprint:

  • The native shortfin serving library, including a GPU runtime, fits in less than 2MB.
  • The self-contained compiler fits within 70MB. Once a model is compiled, it can be deployed using shortfin with no additional dependencies.

Highlights in this release

Llama 3.1 serving

Guides for serving Llama 3.1 models are available in the project documentation.

This release focuses on the 8B and 70B model sizes on a single GPU. Support for the 405B model size and for multi-GPU serving is currently experimental.
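
As a rough sketch, starting a shortfin LLM server against previously exported and compiled artifacts looks like the following. The flag names are assumptions and the paths are placeholders; see the serving guides for exact usage:

# Start the LLM server (flag names are assumptions; paths are placeholders).
python -m shortfin_apps.llm.server \
  --tokenizer_json=/path/to/tokenizer.json \
  --model_config=/path/to/config.json \
  --vmfb=/path/to/model.vmfb \
  --parameters=/path/to/model.gguf \
  --device=hip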

Stable Diffusion XL (SDXL) enhancements

The previous release added initial support for serving SDXL through shortfin. This release contains several performance improvements for the SDXL model and for shortfin serving.

sharktank

The sharktank sub-project is SHARK's model development toolkit, which is now available as part of the shark-ai Python package.

shortfin

Changelog

Full list of changes: v3.0.0...v3.1.0

What's up next?

As always, SHARK AI is fully open source - including import pipelines, compiler tools, runtime libraries, and serving layers. Future releases will continue to build on these foundational components: expanding model architecture support, improving performance, connecting to a broader set of ecosystem services, and streamlining deployment workflows.

Release v3.0.0

06 Nov 19:13
4770759

This release marks public availability for the SHARK AI project, with a focus on serving the Stable Diffusion XL model on AMD Instinct™ MI300X Accelerators.

Highlights

shark-ai

The shark-ai package is the recommended entry point to using the project. This meta package includes compatible versions of all relevant sub-projects.

shortfin

The shortfin sub-project is SHARK's high performance inference library and serving engine.

Key features:

  • Fast inference using ahead-of-time model compilation powered by IREE
  • Throughput optimization via request batching and support for flexible device topologies
  • Asynchronous execution and efficient threading
  • Example applications for supported models
  • APIs available in Python and C
  • Detailed profiling support

For this release, shortfin uses precompiled programs built by the SHARK team using the sharktank sub-project. Future releases will streamline the model conversion process, add user guides, and enable adventurous users to bring their own custom models.

Current shortfin system requirements:

Serving Stable Diffusion XL (SDXL) on MI300X

See the user guide for the latest instructions.

To serve the Stable Diffusion XL model, which generates output images given input text prompts:

# Set up a Python virtual environment.
python -m venv .venv
source .venv/bin/activate
# Optional: faster installation of torch with just CPU support.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install shark-ai, including extra dependencies for apps.
pip install "shark-ai[apps]"
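# Optional sanity check (illustrative addition): confirm the packages resolved.
pip list | grep -E "shark-ai|shortfin|sharktank"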

# Start the server then wait for it to download artifacts.
python -m shortfin_apps.sd.server \
  --device=amdgpu --device_ids=0 --topology="spx_single" \
  --build_preference=precompiled
# (wait for setup to complete)
# INFO - Application startup complete.
# INFO - Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

# Run the interactive client, sending text prompts and receiving generated images back.
python -m shortfin_apps.sd.simple_client --interactive
# Enter a prompt: a single cybernetic shark jumping out of the waves set against a technicolor sunset
# Sending request with prompt: ['a single cybernetic shark jumping out of the waves set against a technicolor sunset']
# Sending request batch # 0
# Saving response as image...
# Saved to gen_imgs/shortfin_sd_output_2024-11-15_16-30-30_0.png
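
The interactive client is the quickest way to experiment, but the server can also be driven directly over HTTP. Below is a minimal sketch assuming a JSON /generate endpoint on port 8000; the request field names are assumptions, so treat shortfin_apps.sd.simple_client as the authoritative reference for the request format:

# Send a single generation request over HTTP (field names are assumptions).
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": ["a cybernetic shark jumping out of the waves"], "steps": 20}'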

(Example output image: shortfin_sd_output_2024-11-18_11-49-24_0, resized to 300 px.)

Roadmap

This release is just the start of a longer journey. The SHARK platform is fully open source, so stay tuned for future developments. Here is a taste of what we have planned:

  • Support for a wider range of ML models, including popular LLMs
  • Performance improvements and optimized implementations for supported models across a wider range of devices
  • Integrations with other popular frameworks and APIs
  • General availability and user guides for the sharktank model development toolkit

dev-wheels

02 Oct 16:43
38e71df

Automatic snapshot release of shark-ai Python wheels.