
TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios

Daniel Rossi, Guido Borghi, Roberto Vezzani

University of Modena and Reggio Emilia, Italy

TakuNet Architecture

Table of Contents 🔑

  1. Introduction
  2. Installation
  3. Usage
  4. Additional Information
  5. Citation
  6. License

Introduction 🎙

TakuNet is a convolutional architecture designed to be extremely efficient when deployed on embedded systems. Extensive experiments on AIDER and AIDERV2 demonstrate that TakuNet achieves near-state-of-the-art accuracy while being extremely efficient in terms of number of parameters, memory footprint, and FLOPs compared to its competitors.

Model performance evaluation on embedded platforms. FPS values were computed from the mean latency over multiple runs of the model with batch size 1. TakuNet's FPS on the Jetson Orin are obtained after TensorRT optimization.

A deeper inspection of model performance on embedded devices such as Raspberry Pi boards and the NVIDIA Jetson Orin Nano shows that TakuNet achieves a very large speedup in frames per second over its competitors, especially on recent embedded architectures. Since TakuNet is trained in float-16 precision, its optimization through TensorRT on NVIDIA hardware accelerators does not approximate the model weights.

Installation ⌨️

TakuNet's code uses Docker containers to simplify distribution and execution on different devices and hardware architectures. If Docker is already installed on your machine, you can skip the Docker setup step.

This work was developed on an Ubuntu 24.04.1 LTS system with NVIDIA drivers 560.35.03, equipped with an Intel i5-8600K, 16 GB DDR4 2666 MHz, and an NVIDIA RTX 3090 24 GB. Training was performed on a different machine with an Intel i7-12700F and an NVIDIA RTX 4070 Ti Super. Experiments on the Raspberry Pi boards were conducted through the Docker container running on Raspbian Bookworm, while NVIDIA JetPack 6.1 was installed on the Jetson Orin Nano Devkit.

Docker setup 🚢

  1. Install Docker, preferably on a Linux-based machine
    wget https://get.docker.com/ -O get_docker.sh
    chmod +x get_docker.sh
    bash get_docker.sh
    
  2. Once Docker has been installed, install the NVIDIA Container Toolkit for GPU support
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    
    Update the package index and install the NVIDIA Container Toolkit
    sudo apt-get update
    sudo apt-get install -y nvidia-container-toolkit
    
    Configure the NVIDIA Container Toolkit runtime for Docker
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    
  3. Add your user to the docker group so that you can manage containers without sudo
    sudo groupadd docker
    sudo usermod -aG docker $USER
    newgrp docker
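
Optionally, you can check that containers can access the GPU; the CUDA base image tag below is only an example and may need to match your driver version:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi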
    

Repository setup 📂

  1. Clone the repository
    git clone https://github.com/DanielRossi1/TakuNet.git
    
  2. Build the docker container
    cd TakuNet/docker
    ./build.sh
    
  3. Once the image has finished building, run the container. The run script supports mounting directories through arguments; mounted directories appear under /home/user/...
    # Just run the container
    ./run
    
    # run the container and mount a directory (e.g. the one which contains the dataset). Here you will find AIDER in /home/user/AIDER
    ./run -d /home/your-username/path-to-data/AIDER
    

Usage 🧰

The execution interface is simple: a bash script launches main.py and automatically loads the arguments and configuration specified in TakuNet's configuration file, configs/TakuNet.yml.

cd src
./launch.sh TakuNet

Configuration file parameters ⚙️:

Base Settings
  • num_epochs (int): Total number of training epochs
  • batch_size (int): Batch size used for training
  • seed (int): Random seed set for training and testing
  • experiment_name (str): Name of the folder that will be created for the training, or sourced for testing. It will be created in src/runs/. Multiple runs over the same experiment name will overwrite logs.
  • main_runs_folder (path): Output path for training and test runs
  • pin_memory (bool): Enables pin_memory in the PyTorch DataLoader
  • mode (Train/Test/Export): You can choose to train, test, or export the model in ONNX format
Logging
  • tensorboard (bool): Whether to use TensorBoard for logging
  • wandb (bool): Whether to use Weights and Biases (wandb) for logging
  • gradcam (bool): Enables Grad-CAM for gradient flow inspection
Dataset and Data loading
  • num_workers (int): Number of threads used by the dataloader
  • persistent_workers (bool): Enables persistent_workers in the PyTorch DataLoader
  • dataset (AIDER/AIDERV2): Specifies the dataset, and in particular the dataloader, to be used
  • data_path (path): The path where the actual dataset is stored on your docker container (e.g. /home/user/Data/AIDER)
  • num_classes (int): Number of output classes of the model. AIDER has 5 classes, while AIDERV2 has 4.
  • img_height (int): Images are resized by default, this sets the height.
  • img_width (int): Images are resized by default, this sets the width.
  • augment (bool): Enables or disables data augmentation
  • k_fold (int): Number of folds for k-fold cross-validation. Works only on AIDER since AIDERV2 has its own stand-alone validation set.
  • split (proportional/exact): Defines how to split the AIDER dataset. proportional follows the same proportions used in the EmergencyNet paper, while exact creates a test set of the same size as the one used in that paper.
  • no_validation (bool): If set to true, no validation set is created for AIDER
Pytorch Lightning Precision
  • lightning_precision (16-mixed/32-true): 16-bit floating point mixed precision or 32-bit floating point precision
Model settings
  • network (str): 'TakuNet' is the only available model
  • input_channels (int): Number of channels of the input images, default is 3 for RGB
  • dense (bool): Enables or disables dense connections in TakuNet
  • ckpts_path (str): Path of the checkpoints to be used in inference (filename included)
Optimization parameters
  • optimizer (str): Optimizer used for training (available: adam, adamw, sgd, rmsprop)
  • scheduler (str): Learning rate scheduler used in training (available: cosine, cyclic, step, lambda). These are set in src/networks/LightningNet.py
  • scheduler_per_epoch (bool): Update the learning rate at the end of each epoch
  • learning_rate (float): Initial learning rate
  • learning_rate_decay (float): Decay used by schedulers
  • learning_rate_decay_steps (float): Decay steps used by schedulers
  • min_learning_rate (float): Minimum learning rate value
  • warmup_epochs (int): Learning rate warmup epochs
  • warmup_steps (int): Learning rate warmup steps
  • weight_decay (float): Weight decay used by the optimizer
  • weight_decay_end (float): The weight decay follows the same schedule as the learning rate; this sets its final (minimum) value
  • update_freq (int): Update frequency for training steps
  • label_smoothing (float [0,1]): Sets label smoothing for cross-entropy loss
  • model_ema (bool): Whether to use model exponential moving average
  • alpha (float): Alpha value for RMSprop
  • momentum (float): Momentum value for RMSprop and SGD
  • class_weights (list of float): Class weights to be used in cross-entropy loss
Export
  • onnx_opset_version (int): set onnx opset version for exported model
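
For reference, below is a minimal sketch of what configs/TakuNet.yml could look like; the flat key layout and the values are purely illustrative and do not reproduce the training recipe used in the paper.

# illustrative excerpt: keys follow the list above, values are examples only
num_epochs: 100
batch_size: 32
seed: 42
experiment_name: takunet_aider
main_runs_folder: runs
mode: Train
tensorboard: true
wandb: false
dataset: AIDER
data_path: /home/user/AIDER
num_classes: 5
img_height: 224
img_width: 224
augment: true
lightning_precision: 16-mixed
network: TakuNet
input_channels: 3
dense: true
optimizer: adamw
scheduler: cosine
learning_rate: 0.001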

Inference on Edge Devices 🔋

Embedded device inference scripts are located in the embedded folder, and require a proper configuration for each specific target device. The main configuration file is located in embedded/configs/TakuNet.yml.

cd src
python3 embedded/main.py --cfg-path embedded/configs/TakuNet.yml
Inference configuration parameters ⚙️:
The inference script must be adapted to the target execution device, so you need to set a few parameters properly before launching the main script.
  • onnx_model_path: where the exported onnx file is located
  • engine_model_path: where to store the TensorRT engine
  • use_tensorrt: whether to enable TensorRT; to be used only on Jetson devices (set to false on Raspberry Pi)
  • fp16_mode: true if your ONNX model was exported in half precision, false if it was exported in float-32 precision
  • dataset_size: tests are conducted on randomly generated images (Torchvision FakeData) since we only want to measure inference speed. We use 2600 images and drop the first 100 to compensate for warm-up time
  • img_size: specifies the shape of the input images (AIDER and AIDERV2 have different image shapes)
  • num_classes: must match the number of classes used during model training
  • batch_size: size of the batch of images to be processed in parallel (default 1)
  • old_jetpack: This option allows the model to be optimized using TensorRT on older Jetson devices. Since this process is not straightforward, it may still encounter issues or errors. However, we encourage you to try it and report any issues you encounter.
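
For reference, an illustrative excerpt of embedded/configs/TakuNet.yml for a Jetson Orin Nano could look like the following; the paths, image size and exact key format are assumptions and must be adapted to your setup.

# illustrative excerpt: paths and values are examples only
onnx_model_path: /home/user/checkpoints/takunet.onnx
engine_model_path: /home/user/checkpoints/takunet.engine
use_tensorrt: true        # Jetson only; set to false on Raspberry Pi
fp16_mode: true           # the ONNX model was exported in half precision
dataset_size: 2600        # FakeData images; the first 100 are dropped as warm-up
img_size: [240, 240]      # assumption: must match the resolution used at training time
num_classes: 5            # must match the number of classes used during training
batch_size: 1
old_jetpack: false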

Additional Information 🔍

TensorRT Export

To properly export a model exploiting TensorRT optimization, you need to set use_tensorrt: true in embedded/configs/TakuNet.yml. The optimization must take place on the target hardware device and requires the ONNX checkpoint to have already been exported.

You may face some issues when trying to compress the model through TensorRT on older Jetson devices such as the NVIDIA Jetson Nano (Maxwell) or NVIDIA Jetson TX1. In such cases, we suggest lowering the ONNX opset version and setting old_jetpack: true during inference.
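
Putting the steps together, a typical export-and-optimize workflow could look like the sketch below; the ordering is an assumption based on the description above, and file locations depend on your configuration.

# 1. On the training machine: export the trained model to ONNX (mode: Export in configs/TakuNet.yml)
cd src
./launch.sh TakuNet

# 2. Copy the exported .onnx file to the Jetson, point onnx_model_path at it, set use_tensorrt: true,
#    then build and benchmark the TensorRT engine directly on the device
python3 embedded/main.py --cfg-path embedded/configs/TakuNet.yml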

Performance on Edge Devices

Embedded devices require a stable input voltage to operate effectively. Improper use of power supplies, including unsuitable cables, may result in degraded and unstable performance. In some cases, such misuse could potentially cause permanent damage to the devices.

To maximize the performance of embedded devices, it is recommended to stop any application or service that may interfere with their operation. These can introduce unnecessary overhead or cause resource contention, potentially impacting the efficiency and responsiveness of the devices.
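
For example, on a Jetson device one might switch to a console-only target and pin the clocks before benchmarking; these commands are only a suggestion and depend on your system setup.

# stop the graphical session and related services (standard systemd target change)
sudo systemctl isolate multi-user.target
# Jetson-specific utility that locks clocks at their maximum supported frequency
sudo jetson_clocks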

For optimal performance with TakuNet, we recommend performing a fresh OS installation. Furthermore, active thermal cooling should be installed (if not already present) to avoid thermal throttling.

Citation 📝

If you find this code useful for your research, please consider citing:

WARNING! The paper has been accepted at WACVW 2025, but the official proceedings paper is not available yet! The following is an example of how the BibTeX citation will look. Do not use it yet; wait for the official one. The conference starts on 28 February 2025.

@InProceedings{Rossi_2025_WACV,
    author    = {Rossi, Daniel and Borghi, Guido and Vezzani, Roberto},
    title     = {TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2025},
    pages     = {}
}

License 📜

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Summary of Terms

  • Attribution (BY): You must give appropriate credit to the original author(s), provide a link to the license, and indicate if changes were made.
  • NonCommercial (NC): This work may not be used for commercial purposes.
  • ShareAlike (SA): If you remix, transform, or build upon this work, you must distribute your contributions under the same license as the original.

For the full legal text of the license, please refer to https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.

Commercial Use

If you are interested in using this work for commercial purposes, please contact us.
