- Actively submit optimized checkpoints of the specified model or pipeline. Continuous operation is not required; you can wait for results after submitting.
- Use custom algorithms or tools to enhance model performance.
- Aim to produce the most generically optimized version of the model.
- Pick an active contest and clone the baseline repository, for example:

  ```sh
  git clone --depth 1 https://github.com/womboai/flux-schnell-edge-inference
  ```
- Make your own repository on a git provider such as GitHub or HuggingFace to optimize in. Ensure the repository is public.
- Edit the `src/pipeline.py` file to include any loading or inference optimizations, and commit when finished. See Proposals for Optimizations for ideas.
- Ensure the repository follows the Submission Requirements.
- Clone the EdgeMaxxing repository:

  ```sh
  git clone --depth 1 https://github.com/womboai/edge-maxxing
  cd edge-maxxing/miner
  ```
- Install pipx
- Install `uv`:

  ```sh
  pipx ensurepath
  pipx install uv
  ```
- Run the submission script and follow the interactive prompts:

  ```sh
  uv run submit_model \
    --netuid 39 \
    --subtensor.network finney \
    --wallet.name {wallet} \
    --wallet.hotkey {hotkey}
  ```
- Optionally, benchmark your model locally before submitting:

  ```sh
  pipx install huggingface-hub[cli,hf_transfer]
  uv run submit_model {existing args} --benchmarking.on
  ```
- Requires Ubuntu 22.04
- Ensure you are on the correct hardware for the contest, e.g. an NVIDIA GeForce RTX 4090
- Must be a public repository
- All code within the repository must be under 16MB. Include large Huggingface files and models in the `models` array in the `pyproject.toml`.
- The size of the repository + all Huggingface models must be under 100GB.
- The pipeline must load within 240s on the target hardware.
- Must be offline (network access is disabled during benchmarking).
- Submission code must not be obfuscated. Obfuscation libraries like `pyarmor` are not allowed.
- You are free to build on top of other miners' work; however, submitting copied submissions is not allowed and may result in your coldkey being blacklisted!
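As a sketch of the `models` array mentioned above, large Huggingface models are declared in the submission's `pyproject.toml` rather than committed, so they do not count against the 16MB code limit. The table name and model ID below are illustrative assumptions; check the baseline repository's `pyproject.toml` for the authoritative layout:

```toml
# Hypothetical pyproject.toml excerpt — declare large Huggingface
# models here instead of committing the files to the repository.
[tool.edge-maxxing]
models = [
    "black-forest-labs/FLUX.1-schnell",
]
```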
There are several effective techniques to explore when optimizing machine learning models for edge devices. Here are some key approaches to consider:
- Knowledge Distillation: Train a smaller, more efficient model to mimic a larger, more complex one. This technique is particularly useful for deploying models on devices with limited computational resources.
- Quantization: Reduce the precision of the model's weights and activations, typically from 32-bit floating-point to 8-bit integers. This decreases memory usage and computational requirements, making it possible to run models on edge devices. Even when the actual compute stays in higher precision (e.g., 32-bit), storing weights in a low-precision representation (e.g., 8-bit integers) can reduce memory bandwidth usage for memory-bound models.
- TensorRT and Hardware-Specific Optimizations: Utilize NVIDIA's TensorRT to optimize deep learning models for inference on NVIDIA GPUs. This involves more than just layer fusion; it includes optimizing assembly, identifying prefetch opportunities, optimizing L2 memory allocation, writing specialized kernels, and performing graph optimizations. These techniques enhance performance and reduce latency by tailoring the model to the specific hardware configuration.
- Hyperparameter Tuning: Optimize the model's configuration settings to improve its performance, either manually or through automated methods such as grid search or Bayesian optimization. While not a direct edge optimization, it is an essential step in the overall optimization process.
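The knowledge-distillation idea above can be sketched with the classic soft-target loss: the student is trained to match the teacher's temperature-softened output distribution. This is a minimal dependency-free illustration of the loss term only, not a full training loop:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Minimizing this trains the student to mimic the teacher; in practice
    it is combined with the usual cross-entropy loss on ground-truth labels.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that matches the teacher exactly incurs zero loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))      # 0.0
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)  # True
```

A higher temperature spreads probability mass over the "wrong" classes, exposing more of the teacher's learned similarity structure to the student.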
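The quantization arithmetic described above can be shown concretely with a toy affine (asymmetric) scheme mapping floats to 8-bit unsigned integers. Real frameworks handle this per-tensor or per-channel with calibration; this is only the core scale/zero-point math:

```python
def quantize(values, num_bits=8):
    """Affine quantization: map floats onto [0, 2^bits - 1] integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against constant input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.5, -0.2, 0.0, 0.7, 1.5]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
# Each reconstructed weight lies within half a quantization step of the original.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

The stored tensor shrinks 4x (int8 vs. float32), and for memory-bound models that bandwidth saving alone can speed up inference even if the multiply-accumulates are still done in float.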
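TensorRT itself is a compiled NVIDIA library and needs GPU hardware, but one of the graph optimizations it performs, layer fusion, can be illustrated in plain Python: folding a batch-normalization layer's affine transform into the preceding linear layer's weights, so two layers become one at inference time. This toy sketch is the fusion algebra only, not TensorRT's actual implementation:

```python
import math

def linear(W, b, x):
    """y_i = sum_j W_ij * x_j + b_i."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Per-channel normalization with learned scale/shift."""
    return [g * (yi - m) / math.sqrt(v + eps) + be
            for yi, g, be, m, v in zip(y, gamma, beta, mean, var)]

def fuse(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold the batchnorm's affine transform into the linear layer."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_fused = [[s * wij for wij in row] for s, row in zip(scale, W)]
    b_fused = [s * (bi - m) + be
               for s, bi, m, be in zip(scale, b, mean, beta)]
    return W_fused, b_fused

W, b = [[0.5, -1.0], [2.0, 0.3]], [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.0, 0.5]
mean, var = [0.3, -0.1], [1.0, 4.0]
x = [0.7, -0.4]

W_f, b_f = fuse(W, b, gamma, beta, mean, var)
unfused = batchnorm(linear(W, b, x), gamma, beta, mean, var)
fused = linear(W_f, b_f, x)  # one layer, same result
assert all(abs(u - f) < 1e-9 for u, f in zip(unfused, fused))
```

The fused network does half the memory traffic for this pair of layers, which is the same motivation behind TensorRT's kernel fusion on real GPUs.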
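The grid-search approach to hyperparameter tuning mentioned above can be sketched in a few lines: evaluate every combination of candidate values and keep the best. The objective here is a toy stand-in for a real validation-loss measurement:

```python
from itertools import product

def grid_search(objective, grid):
    """Exhaustively evaluate every combination; return the lowest-scoring one."""
    best_params, best_score = None, float("inf")
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for an actual train-and-validate run.
def validation_loss(params):
    return (params["lr"] - 0.01) ** 2 + (params["batch_size"] - 32) ** 2 / 1e4

grid = {"lr": [0.001, 0.01, 0.1], "batch_size": [16, 32, 64]}
best, score = grid_search(validation_loss, grid)
print(best)  # {'batch_size': 32, 'lr': 0.01}
```

Grid search scales exponentially with the number of hyperparameters, which is why Bayesian optimization or random search is usually preferred once the search space grows beyond a handful of dimensions.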
We encourage developers to explore these optimization techniques or develop other approaches to enhance model performance and efficiency specifically for edge devices.