SakuraLLM-Colab-Notebooks

Open in Colab


📑 Table of Contents

  1. ⚠️ Important Warning ⚠️
  2. 📖 Introduction
  3. 🚀 One-Click Launch in Colab
  4. 📊 Example Usage
  5. 🛠️ Prerequisites
  6. 📋 Step-by-Step Explanation
  7. 🔧 Testing the API
  8. ❓ FAQ
  9. 🙌 Acknowledgments

⚠️ Important Warning ⚠️

🚨 Kaggle Ban Notice: Kaggle has officially banned all SakuraLLM models. Using them on Kaggle will result in a permanent account ban.
👉 Alternative Options: Use GPU rental services or community compute-sharing platforms.
🔗 For more details, see the Issue Report.


📖 Introduction

This repository offers an easy-to-use Google Colab notebook to deploy the Sakura-14B-Qwen2.5-v1.0-GGUF model:

  • Backend: Uses vLLM for OpenAI-style API compatibility.
  • Applications: Translation tools, GPT-based custom bots, and other AI utilities.
  • Visualization: See example output below!

💡 Key Features

✔️ Easy one-click setup in Colab.
✔️ Supports OpenAI API for effortless integration.
✔️ Multiple API forwarding options (ngrok or Cloudflare Tunnel).
✔️ Beginner-friendly guidance, with rich examples and troubleshooting steps.


🚀 One-Click Launch in Colab

Open in Colab


📊 Example Usage

Below is a visualization of how the API is used. Enter any text in the prompt for dynamic results!

The vLLM backend supports serving multiple translators at the same time (request concurrency), as illustrated by the sketch below.
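As a rough illustration of request concurrency, the following sketch sends several translation requests in parallel from a thread pool. It is only a sketch: it assumes the server from Step 4 is already running, and the placeholder URL and key must be replaced with your own values from Steps 3 and 4.

import requests
from concurrent.futures import ThreadPoolExecutor

API_ENDPOINT = "<YOUR_API_URL>"  # public URL from Step 3
API_KEY = "token-abc123"         # must match --api-key from Step 4

def translate(text):
    # Each call is an independent completion request; vLLM batches them server-side.
    response = requests.post(
        f"{API_ENDPOINT}/v1/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "Qwen2.5-14B-Instruct",  # must match --served-model-name
            "prompt": f"Translate this sentence to Japanese: '{text}'",
            "max_tokens": 50,
        },
    )
    return response.json()

texts = ["Good morning!", "How are you?", "See you tomorrow."]
with ThreadPoolExecutor(max_workers=3) as pool:
    for result in pool.map(translate, texts):
        print(result)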


🛠️ Prerequisites

To get started, you’ll need:

  1. Google Account: Access Colab.
  2. Python Libraries (installed automatically in the notebook).
    • transformers, tokenizers, vLLM, huggingface-hub, flask, pyngrok, triton, torch.
  3. ngrok Token (Highly Suggested): Sign up at ngrok, log in, and copy your token for API forwarding.

📋 Step-by-Step Explanation

Open in Colab

Step 1: Mount Google Drive (Highly Suggested)

Mounting Google Drive ensures you can save models persistently across sessions.
Paste this code into a Colab cell:

from google.colab import drive
drive.mount('/content/gdrive')
ROOT_PATH = "/content/gdrive/MyDrive"

⚠️ Skipping This Step: Set the working directory to /content if you choose not to mount:

ROOT_PATH = "/content"

Step 2: Install Required Libraries

To install all necessary libraries, run the following command in your Colab notebook:

!pip install --upgrade pip
!pip install transformers tokenizers vllm huggingface-hub flask pyngrok triton torch

Important Note:

Sometimes, during the installation process, you may encounter errors or warnings similar to the one shown below:

Installation Warning Example

What to do:
This is a normal occurrence, especially in Colab environments. Simply follow the instructions provided in the error message. In most cases, you can resolve the issue by clicking:

  1. Runtime (in the top Colab menu).
  2. Select "Run all" to restart the setup process.

If it prompts you to re-enable certain permissions or runtime settings, accept those changes.


For further reassurance, here's another example of such an error you might see:

Runtime Restart Example

By following these simple steps, you'll ensure that all dependencies are properly installed and your environment is correctly configured. If you continue to face issues, consider resetting the Colab runtime and re-running the commands.
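If you want extra reassurance that the environment survived the restart, a quick sanity check (a minimal sketch; exact version numbers will vary) can be run in a fresh cell:

import torch
import transformers
import vllm

# Confirm the key libraries import and report their versions.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("vllm:", vllm.__version__)
print("CUDA available:", torch.cuda.is_available())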


Step 3: API Forwarding Configuration Using Ngrok or Cloudflare

This step helps you configure API forwarding so that your application can be accessed over the internet, even if you don't have a static domain. We'll guide you through two options:

  1. Using Ngrok - For persistent forwarding through a free static domain.
  2. Using Cloudflare Tunnel - For temporary API access without account setup.

Option 1: Using Ngrok (Static Domain)

Ngrok simplifies API forwarding by providing a secure and reliable tunnel. If you need a static domain for consistent access, follow these steps:


Steps to Set Up Ngrok
  1. Get Your Ngrok Authentication Token

    • Visit Ngrok's Dashboard.
    • Sign up or log in to your account.
    • Copy your authentication token from the page.
  2. Obtain a Free Static Domain

    • Navigate to the Domains section on the Ngrok Dashboard.
    • Click "New Domain" to create a free static domain for your API.

Python Code for Ngrok Configuration

Add the following Python code to your project to set up Ngrok with your static domain:

# Set the ngrok authentication token (get it from https://dashboard.ngrok.com/get-started/your-authtoken)
ngrokToken = ""  # Add your token here

if ngrokToken:
    from pyngrok import conf, ngrok
    
    # Configure the ngrok authentication token
    conf.get_default().auth_token = ngrokToken
    conf.get_default().monitor_thread = False

    # Start ngrok tunnel with the custom domain
    try:
        # Set the ngrok free static domain (from https://dashboard.ngrok.com/domains)
        ssh_tunnel = ngrok.connect(8001, bind_tls=True, hostname="")
        public_url = ssh_tunnel.public_url
        print('Custom Domain Address: ' + public_url)
    except Exception as e:
        print(f"Error starting ngrok tunnel: {e}")

Expected Output
  • The script will display your public API URL on the first line of the output.
  • Save this URL; you’ll use it for making API requests.

Option 2: Using Cloudflare Tunnel (Temporary URL)

If you don’t have a static domain or an Ngrok authentication token, Cloudflare’s cloudflared provides a quick, temporary URL for API access.


Steps to Set Up Cloudflare Tunnel
  1. Download the cloudflared Binary
    Run the following commands to download and prepare cloudflared for use:

    # Navigate to the root path of your project
    %cd $ROOT_PATH
    
    # Download the latest Cloudflare binary
    !wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -O cloudflared
    !chmod a+x cloudflared
  2. Start the Tunnel
    Execute this command to open a Cloudflare tunnel:

    !./cloudflared tunnel --url localhost:8001

Expected Output
  • A temporary Cloudflare URL will appear in the output.
  • Use this URL for testing API requests during your development session.

Tips for Beginners

  • Ngrok vs. Cloudflare:

    • Use Ngrok if you need a persistent, custom domain for reliable API access.
    • Use Cloudflare for a quick, temporary solution without account setup.
  • Best Practices:

    • Save the displayed URLs immediately after running the commands.
    • Test the connection by accessing the URL in your browser or API client (see the sketch at the end of this step).

By following these steps, you’ll successfully set up API forwarding for your project using either Ngrok or Cloudflare.
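As a concrete connectivity check, the sketch below queries the server's OpenAI-compatible /v1/models endpoint through the tunnel. It assumes the backend from Step 4 is already running and that the placeholder URL and key are replaced with your own:

import requests

API_ENDPOINT = "<YOUR_API_URL>"  # ngrok or Cloudflare URL from above
API_KEY = "token-abc123"         # must match --api-key from Step 4

response = requests.get(
    f"{API_ENDPOINT}/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.status_code)  # 200 means the tunnel and server are reachable
print(response.json())       # should list the served model name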


Step 4: Downloading the Model and Starting the API Server

Once the tunnel is set up, proceed to download the model and run the API server:

# Navigate to the root path
%cd $ROOT_PATH

# Download the model from Hugging Face
!HF_ENDPOINT=https://huggingface.co huggingface-cli download SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF --local-dir models --include sakura-14b-qwen2.5-v1.0-q6k.gguf

# Start the API server with vLLM
!RAY_memory_monitor_refresh_ms="0" HF_ENDPOINT=https://huggingface.co OMP_NUM_THREADS=36 \
VLLM_ATTENTION_BACKEND=XFORMERS vllm serve ./models/sakura-14b-qwen2.5-v1.0-q6k.gguf \
--tokenizer Qwen/Qwen2.5-14B-Instruct --dtype float16 --api-key token-abc123 \
--kv-cache-dtype auto --max-model-len 4096 --tensor-parallel-size 1 \
--gpu-memory-utilization 0.99 --disable-custom-all-reduce --enforce-eager \
--use-v2-block-manager --disable-log-requests --host 0.0.0.0 --port 8001 \
--served-model-name "Qwen2.5-14B-Instruct" &

Output: Your API backend will start, and you can use the displayed public URL from the previous step for making requests.
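Because the server is launched in the background and downloading plus loading the model can take several minutes, it helps to poll vLLM's /health endpoint until the backend reports ready (a minimal sketch):

import time
import requests

# /health returns HTTP 200 once the vLLM server has finished loading the model.
while True:
    try:
        if requests.get("http://localhost:8001/health", timeout=5).status_code == 200:
            print("vLLM server is ready.")
            break
    except requests.exceptions.RequestException:
        pass  # server not up yet; keep waiting
    time.sleep(10)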


🔧 Testing the API

Python Example Code

Use the following script to send requests to the API. Replace <YOUR_API_URL> with the public API endpoint.

import requests

API_ENDPOINT = "<YOUR_API_URL>"  # Replace with your API URL
API_KEY = "token-abc123"  # Replace with your API key

headers = {"Authorization": f"Bearer {API_KEY}"}
data = {
    "model": "Qwen2.5-14B-Instruct",  # must match --served-model-name from Step 4
    "prompt": "Translate this sentence to Japanese: 'Good morning!'",
    "max_tokens": 50
}

response = requests.post(f"{API_ENDPOINT}/v1/completions", json=data, headers=headers)
print(response.json())
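Alternatively, since the backend speaks the OpenAI API, you can use the official openai Python client (a sketch, assuming the 1.x client is installed via pip install openai):

from openai import OpenAI

API_ENDPOINT = "<YOUR_API_URL>"  # Replace with your API URL
API_KEY = "token-abc123"         # Replace with your API key

client = OpenAI(base_url=f"{API_ENDPOINT}/v1", api_key=API_KEY)
completion = client.completions.create(
    model="Qwen2.5-14B-Instruct",  # must match --served-model-name
    prompt="Translate this sentence to Japanese: 'Good morning!'",
    max_tokens=50,
)
print(completion.choices[0].text)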

Open in Colab

❓ FAQ

Q1: What is the purpose of mounting Google Drive?

Mounting Drive ensures that all downloaded models and intermediate data are preserved between sessions. Without this, data is erased when Colab disconnects.

Q2: Do I need a powerful GPU?

No, the notebook is designed for use with free-tier Colab GPUs. However, performance and speed may improve if you use Colab Pro, Colab Pro+, or rented GPUs with higher memory capacity.

Q3: I encountered the error "Failed to infer device type." What does it mean?

This typically happens when:

  1. Insufficient GPU memory is available for the model.
  2. The notebook is configured to use a GPU, but Colab assigned a CPU-only runtime.
  3. There is a dependency conflict in the environment.

Solutions:

  • Check that the assigned runtime is GPU-enabled (Runtime > Change runtime type > Hardware accelerator > GPU); the snippet below automates this check.
  • Verify that the model's size fits within your GPU’s memory capacity (15 GB is usually sufficient for most tasks).
  • Restart the runtime, clear storage if needed, and re-run the notebook.
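To confirm from inside the notebook that a GPU was actually assigned, here is a minimal sketch:

import torch

# Report the GPU Colab assigned to this runtime, if any.
if torch.cuda.is_available():
    gpu = torch.cuda.get_device_properties(0)
    print(f"GPU: {gpu.name}, {gpu.total_memory / 1024**3:.1f} GB")
else:
    print("No GPU detected; change the runtime type to GPU and restart.")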

Q4: How do I troubleshoot API connection issues?

If you're using ngrok:

  • Ensure your ngrokToken is correctly set and matches your ngrok account.
  • Confirm that the generated public URL appears in the output after Step 3.
  • Verify the connection by opening the public URL in a browser to check the API's availability.

If the public URL works in a browser but not in the Sakura workspace:

  • Ensure the API Key specified during setup (e.g., --api-key token-abc123) matches the key used in Sakura.
  • Double-check the model name (e.g., --served-model-name "Qwen2.5-14B-Instruct") for consistency.

For Cloudflare users, remember that the URL is temporary and may require reconfiguration between sessions.


Q5: Can I modify the model?

Yes! The notebook supports fine-tuning or modifications to the model using tools like Hugging Face's transformers library. For large models, ensure you have sufficient GPU resources. Cloud environments may not always support extensive fine-tuning, but you can use your local machine or rented servers for advanced customization.


Q6: What is the API Key and model name?

  • The API Key is manually set during the vLLM backend setup. For example, in the code:

    --api-key token-abc123

    Use the same API Key (token-abc123) in the Sakura workspace or other applications.

  • The model name is defined in the setup as:

    --served-model-name "Qwen2.5-14B-Instruct"

    Ensure you input this exact model name in the application using the API.


Q7: Why doesn’t the public API URL work with the Sakura workspace?

This is commonly due to:

  1. Incorrect API Key: Ensure the key matches what was set during the vLLM server startup.
  2. Model name mismatch: The model name used in the backend must match the one configured in the Sakura workspace.
  3. Dynamic URLs: If you use a non-static URL (e.g., without a custom ngrok domain), the URL changes with each session.

Suggested Solution:

  • Double-check the API Key and model name used in the workspace.
  • For persistent URLs, use a static ngrok domain or migrate to a local hosting setup for stability.

Q8: Is it safe to use Colab for this project?

Using Colab for personal or research projects is generally acceptable. However, avoid engaging in activities that violate Google's usage policies, such as excessive resource usage or sharing Colab-generated APIs publicly without restrictions. Always respect the terms of service to prevent potential limitations or bans.


Q9: Is there a one-click setup?

Yes! The Colab notebook includes a one-click launcher link for streamlined operation. Simply open the link and run all cells without modifying the code. This setup is tested to work seamlessly as long as runtime conditions are met.


🙌 Acknowledgments