TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios
Daniel Rossi, Guido Borghi, Roberto Vezzani
University of Modena and Reggio Emilia, Italy
TakuNet is a convolutional architecture designed to be extremely efficient when deployed on embedded systems. Extensive experiments on AIDER and AIDERV2 demonstrate that TakuNet achieves near-state-of-the-art accuracy while being extremely efficient, compared with its competitors, in terms of number of parameters, memory footprint, and FLOPs.
A deeper inspection of model performance on embedded devices such as Raspberry Pis and the NVIDIA Jetson Orin Nano shows that TakuNet achieves a large speedup in frames per second (FPS) over its competitors, especially on recent embedded architectures. Since TakuNet is trained with float-16 precision, optimizing it with TensorRT on NVIDIA hardware accelerators does not introduce any further approximation of the model weights.
The TakuNet code uses a Docker container to simplify distribution and execution across different devices and hardware architectures. If Docker is already installed on your machine, you can skip the Docker setup step.
This work was developed on an Ubuntu 24.04.1 LTS system with NVIDIA driver 560.35.03, equipped with an Intel i5 8600K, 16GB DDR4 2666MHz RAM, and an NVIDIA RTX 3090 24GB. Training was performed on a different machine with an Intel i7 12700F and an NVIDIA RTX 4070 Ti Super. Experiments on Raspberry Pi devices were conducted through the Docker container running on Raspbian Bookworm, while NVIDIA JetPack 6.1 was installed on the Jetson Orin Nano Developer Kit.
- Install Docker, preferably on a Linux-based machine:
wget https://get.docker.com/ -O get_docker.sh
chmod +x get_docker.sh
bash get_docker.sh
- Once Docker is installed, install the NVIDIA Container Toolkit (nvidia-docker) for GPU support:
# add the NVIDIA Container Toolkit repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# update and install the NVIDIA Container Toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# configure the NVIDIA Container Toolkit for Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
- Grant your user the permissions required to manage Docker containers:
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
- Clone the repository:
git clone https://github.com/DanielRossi1/TakuNet.git
- Build the Docker container:
cd TakuNet/docker
./build.sh
- Once the container has finished building, run it. The run script supports mounting directories through arguments; directories are mounted under /home/user/...
# just run the container
./run
# run the container and mount a directory (e.g., the one that contains the dataset); here AIDER will be available in /home/user/AIDER
./run -d /home/your-username/path-to-data/AIDER
The execution interface is simple: a bash script launches the main.py script, automatically loading the arguments and configuration specified in TakuNet's configuration file, configs/TakuNet.yml.
cd src
./launch.sh TakuNet
Base Settings
- num_epochs (int): Total number of training epochs
- batch_size (int): Batch size used for training
- seed (int): Random seed set for training and testing
- experiment_name (str): Name of the folder created for training, or read for testing. It is created in src/runs/. Multiple runs with the same experiment name overwrite the logs.
- main_runs_folder (path): Output path for training and testing
- pin_memory (bool): PyTorch DataLoader pin_memory option
- mode (Train/Test/Export): You can choose to train, test, or export the model in ONNX format
Logging
- tensorboard (bool): Whether to use TensorBoard for logging
- wandb (bool): Whether to use Weights and Biases (wandb) for logging
- gradcam (bool): Enables Grad-CAM to inspect, through gradients, which image regions drive the model's predictions
Dataset and Data loading
- num_workers (int): Number of worker processes used by the DataLoader
- persistent_workers (bool): PyTorch DataLoader persistent_workers option
- dataset (AIDER/AIDERV2): Specifies the dataset, and in particular the dataloader, to be used
- data_path (path): Path where the dataset is stored inside your Docker container (e.g. /home/user/Data/AIDER)
- num_classes (int): Number of output classes of the model. AIDER has 5 classes, while AIDERV2 has 4.
- img_height (int): Images are resized by default; this sets the target height.
- img_width (int): Images are resized by default; this sets the target width.
- augment (bool): Enables or disables data augmentation
- k_fold (int): Number of folds for k-fold cross-validation. Works only on AIDER since AIDERV2 has its own stand-alone validation set.
- split (proportional/exact): Defines how to split the AIDER dataset. proportional follows the same proportions used in the EmergencyNet paper, while exact creates a test set of the same size as the one used in that paper.
- no_validation (bool): If set to false, does not create a validation set for AIDER
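For reference, the kind of partition the k_fold option controls can be sketched as follows. This is a minimal, stdlib-only illustration assuming a plain shuffled split into k nearly equal folds, seeded like the seed option; the helper name kfold_indices is hypothetical, and the repository's AIDER dataloader may split differently (for example, per class).

```python
import random
from typing import List, Tuple


def kfold_indices(n_samples: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, val_idx) pairs that together cover all indices.

    Hypothetical helper: indices are shuffled with a fixed seed and
    partitioned into k folds of nearly equal size.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be in [2, n_samples]")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i, val_idx in enumerate(folds):
        # Every fold except the i-th one becomes training data.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train_idx, val_idx))
    return splits


if __name__ == "__main__":
    for train_idx, val_idx in kfold_indices(n_samples=10, k=5, seed=42):
        print(len(train_idx), len(val_idx))  # each fold: 8 train, 2 validation
```

Each sample appears in exactly one validation fold, which is what makes averaging metrics across the k runs meaningful.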
PyTorch Lightning Precision
- lightning_precision (16-mixed/32-true): 16-bit floating point mixed precision or 32-bit floating point precision
Model settings
- network (str): 'TakuNet' is the only available model
- input_channels (int): Number of channels of the input images, default is 3 for RGB
- dense (bool): Enables or disables dense connections in TakuNet
- ckpts_path (str): Path of the checkpoints to be used in inference (filename included)
Optimization parameters
- optimizer (str): Optimizer used for training (available: adam, adamw, sgd, rmsprop)
- scheduler (str): Learning rate scheduler used for training (available: cosine, cyclic, step, lambda). These are set in src/networks/LightningNet.py
- scheduler_per_epoch (bool): Update the learning rate at the end of each epoch
- learning_rate (float): Initial learning rate
- learning_rate_decay (float): Decay used by schedulers
- learning_rate_decay_steps (float): Decay steps used by schedulers
- min_learning_rate (float): Minimum learning rate value
- warmup_epochs (int): Learning rate warmup epochs
- warmup_steps (int): Learning rate warmup steps
- weight_decay (float): Weight decay used by the optimizer
- weight_decay_end (float): Final weight decay value; weight decay follows the same scheduler as the learning rate, so this sets its minimum value
- update_freq (int): Update frequency for training steps
- label_smoothing (float [0,1]): Sets label smoothing for cross-entropy loss
- model_ema (bool): Whether to keep an exponential moving average (EMA) of the model weights
- alpha (float): Alpha value for RMSprop
- momentum (float): Momentum value for RMSprop and SGD
- class_weights (list of float): Class weights to be used in cross-entropy loss
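To illustrate how learning_rate, min_learning_rate, and warmup_epochs interact, here is a sketch of a linear warmup followed by cosine decay. It assumes the common warmup-plus-cosine formulation; the function name warmup_cosine is hypothetical, and the schedulers actually used are defined in src/networks/LightningNet.py and may differ. Since weight_decay_end is described as following the same scheduler, the same function could be called with weight_decay and weight_decay_end as start and end values.

```python
import math


def warmup_cosine(epoch: int, total_epochs: int, base_value: float,
                  final_value: float, warmup_epochs: int = 0) -> float:
    """Linear warmup to base_value, then cosine decay to final_value.

    Assumed formulation for illustration, not necessarily the one in
    LightningNet.py. Epochs are 0-indexed.
    """
    if warmup_epochs > 0 and epoch < warmup_epochs:
        # Linear ramp: base_value / warmup_epochs, ..., base_value.
        return base_value * (epoch + 1) / warmup_epochs
    # Decay progress in [0, 1] over the remaining epochs.
    decay_epochs = max(1, total_epochs - warmup_epochs - 1)
    progress = min(1.0, (epoch - warmup_epochs) / decay_epochs)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_value + (base_value - final_value) * cosine


if __name__ == "__main__":
    # Hypothetical values: 10 epochs, 2 warmup epochs, decay from 1e-3 to 1e-5.
    for epoch in range(10):
        lr = warmup_cosine(epoch, total_epochs=10, base_value=1e-3,
                           final_value=1e-5, warmup_epochs=2)
        print(f"epoch {epoch}: lr={lr:.2e}")
```

The warmup phase avoids large, noisy updates while early statistics are unstable; the cosine phase then reaches min_learning_rate exactly at the last epoch.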
Export
- onnx_opset_version (int): ONNX opset version used for the exported model
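Several of the options above constrain each other: num_classes must match the dataset, k_fold applies only to AIDER, and label_smoothing must lie in [0, 1]. The following is a minimal validation sketch, assuming the YAML configuration has already been loaded into a plain Python dict and using only the allowed values listed in this README; check_config is a hypothetical helper, not part of the repository, and it assumes that k_fold values of 1 or less mean no cross-validation.

```python
from typing import Any, Dict, List

# Allowed values, as listed in the configuration reference above.
ALLOWED_VALUES = {
    "mode": {"Train", "Test", "Export"},
    "dataset": {"AIDER", "AIDERV2"},
    "split": {"proportional", "exact"},
    "lightning_precision": {"16-mixed", "32-true"},
    "optimizer": {"adam", "adamw", "sgd", "rmsprop"},
    "scheduler": {"cosine", "cyclic", "step", "lambda"},
}
# Output classes per dataset: AIDER has 5, AIDERV2 has 4.
CLASSES_PER_DATASET = {"AIDER": 5, "AIDERV2": 4}


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in cfg (empty list if none)."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in cfg and cfg[key] not in allowed:
            problems.append(f"{key}={cfg[key]!r} is not one of {sorted(allowed)}")

    dataset = cfg.get("dataset")
    expected = CLASSES_PER_DATASET.get(dataset)
    if expected is not None and cfg.get("num_classes") != expected:
        problems.append(f"num_classes should be {expected} for {dataset}")

    # Assumption: k_fold values <= 1 mean "no cross-validation".
    if dataset == "AIDERV2" and (cfg.get("k_fold") or 0) > 1:
        problems.append("k_fold works only on AIDER; AIDERV2 has its own validation set")

    smoothing = cfg.get("label_smoothing") or 0.0
    if not 0.0 <= smoothing <= 1.0:
        problems.append("label_smoothing must be in [0, 1]")

    weights = cfg.get("class_weights")
    if weights and len(weights) != cfg.get("num_classes"):
        problems.append("class_weights needs exactly one weight per class")
    return problems


if __name__ == "__main__":
    example = {"mode": "Train", "dataset": "AIDERV2", "num_classes": 5, "k_fold": 5}
    for problem in check_config(example):
        print(problem)
```

Running such a check before launching training surfaces mismatches (for example, reusing an AIDER configuration on AIDERV2) before any GPU time is spent.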
Embedded device inference scripts are located in the embedded folder and require a proper configuration for each target device. The main configuration file is embedded/configs/TakuNet.yml.
cd src
python3 embedded/main.py --cfg-path embedded/configs/TakuNet.yml
Inference configuration parameters ⚙️
The inference script has to be adapted to the target execution device, so you need to set a few parameters before launching the main script.
- onnx_model_path: where the exported ONNX file is located
- engine_model_path: where to store the TensorRT engine
- use_tensorrt: whether to enable TensorRT; use it only on Jetson devices (set to false on Raspberry Pi)
- fp16_mode: true if your ONNX model is half precision, false if it was exported with float-32 precision
- dataset_size: tests are conducted on randomly generated images (Torchvision FakeData), since we only want to measure inference speed. We set it to 2600 images and drop the first 100 to compensate for the warm-up time
- img_size: specifies the image shape (AIDER and AIDERV2 have different image shapes)
- num_classes: must match the number of classes used during model training
- batch_size: size of the batch of images to be processed in parallel (default 1)
- old_jetpack: This option allows the model to be optimized using TensorRT on older Jetson devices. Since this process is not straightforward, it may still encounter issues or errors. However, we encourage you to try it and report any issues you encounter.
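The dataset_size setting implies a simple measurement procedure: time each inference, discard the warm-up iterations, and convert the remaining latencies into frames per second. The sketch below assumes latencies measured in seconds and one batch per iteration; run_inference is a hypothetical stand-in for whatever ONNX or TensorRT call the embedded scripts actually make, and both helper names are illustrative only.

```python
import time
from statistics import mean
from typing import Callable, List, Sequence


def measure_latencies(run_inference: Callable[[object], None],
                      inputs: Sequence[object]) -> List[float]:
    """Time each call of run_inference (a hypothetical model wrapper), in seconds."""
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        run_inference(x)
        latencies.append(time.perf_counter() - start)
    return latencies


def fps_after_warmup(latencies: Sequence[float], warmup: int = 100,
                     batch_size: int = 1) -> float:
    """Images per second, ignoring the first `warmup` iterations."""
    steady = latencies[warmup:]
    if not steady:
        raise ValueError("not enough iterations after warm-up")
    # Each iteration processes batch_size images, so FPS = batch_size / mean latency.
    return batch_size / mean(steady)


if __name__ == "__main__":
    # Synthetic example mirroring dataset_size: 2600 iterations, the first 100
    # of which are slow warm-up runs.
    fake_latencies = [0.5] * 100 + [0.01] * 2500
    print(f"{fps_after_warmup(fake_latencies):.1f} FPS")
```

Dropping the warm-up iterations matters because the first runs typically include one-off costs such as memory allocation and kernel selection, which would otherwise bias the average.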
TensorRT Export
To export a model with TensorRT optimization, set use_tensorrt: true in embedded/configs/TakuNet.yml. The optimization must take place on the target hardware device and requires the ONNX checkpoint to have already been exported.
You may face issues when compressing the model with TensorRT on older Jetson devices such as the NVIDIA Jetson Nano (Maxwell) or NVIDIA Jetson TX1. In such cases, we suggest lowering the ONNX opset version (onnx_opset_version) and setting old_jetpack: true during inference.
Performance on Edge Devices
Embedded devices require a stable input voltage to operate effectively. Improper use of power supplies, including unsuitable cables, may result in degraded and unstable performance, and in some cases could cause permanent damage to the devices.
To maximize the performance of embedded devices, we recommend stopping any application or service that may interfere with their operation, since these can introduce unnecessary overhead or cause resource contention that reduces the efficiency and responsiveness of the devices.
For optimal performance with TakuNet, we recommend a fresh OS installation. Furthermore, active cooling should be installed (if not already present) to avoid thermal throttling.
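To check whether thermal throttling may be affecting a benchmark, one option is to log the SoC temperature while the inference script runs. A minimal sketch, assuming the standard Linux sysfs thermal interface (/sys/class/thermal/thermal_zone*/temp, reported in millidegrees Celsius), which Raspberry Pi OS and Jetson Linux both expose; zone names and numbering vary between boards, and read_temperatures is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Dict, Tuple


def read_temperatures(root: str = "/sys/class/thermal") -> Dict[str, Tuple[str, float]]:
    """Map each thermal zone directory to (zone type, temperature in degrees Celsius)."""
    readings = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            millidegrees = int((zone / "temp").read_text().strip())
            type_file = zone / "type"
            zone_type = type_file.read_text().strip() if type_file.exists() else "unknown"
        except (OSError, ValueError):
            continue  # skip zones that are unreadable or report no valid value
        # sysfs reports millidegrees Celsius.
        readings[zone.name] = (zone_type, millidegrees / 1000.0)
    return readings


if __name__ == "__main__":
    for zone, (zone_type, celsius) in read_temperatures().items():
        print(f"{zone} ({zone_type}): {celsius:.1f} C")
```

Sampling this periodically during the 2600-image run and comparing the temperature trend with the per-iteration latencies helps tell a cooling problem apart from a genuine model slowdown.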
If you find this code useful for your research, please consider citing:
WARNING! The paper has been accepted at WACVW 2025, but the official proceedings paper is not available yet. The following is an example of what the BibTeX citation would look like. Do not use it; wait for the official one. The conference starts on 28 February 2025.
@InProceedings{Rossi_2025_WACV,
author = {Rossi, Daniel and Borghi, Guido and Vezzani, Roberto},
title = {TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
month = {January},
year = {2025},
pages = {}
}
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
- Attribution (BY): You must give appropriate credit to the original author(s), provide a link to the license, and indicate if changes were made.
- NonCommercial (NC): This work may not be used for commercial purposes.
- ShareAlike (SA): If you remix, transform, or build upon this work, you must distribute your contributions under the same license as the original.
For the full legal text of the license, please refer to https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.
If you are interested in using this work for commercial purposes, please contact us.