
TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios

Daniel Rossi, Guido Borghi, Roberto Vezzani

University of Modena and Reggio Emilia, Italy

TakuNet Architecture

Table of Contents 🔑

  1. Introduction
  2. Installation
  3. Usage
  4. Additional Information
  5. Citation
  6. License

Introduction 🎙

TakuNet is a convolutional architecture designed to be extremely efficient when deployed on embedded systems. Extensive experiments on AIDER and AIDERV2 demonstrate that TakuNet achieves near-state-of-the-art accuracy while being extremely efficient in terms of number of parameters, memory footprint, and FLOPs compared to its competitors.

Model performance evaluation on embedded platforms. FPS values were computed from the mean latency over multiple runs of the model with batch size 1. TakuNet's FPS on the Jetson Orin are obtained after TensorRT optimization.

A deeper inspection of model performance on embedded devices such as Raspberry Pi boards and the NVIDIA Jetson Orin Nano shows that TakuNet achieves a very large speedup in frames per second over its competitors, especially on recent embedded architectures. Since TakuNet is trained in float-16 precision, its optimization through TensorRT on NVIDIA hardware accelerators does not approximate the model weights.

Installation ⌨️

TakuNet's code uses Docker containers to simplify distribution and execution on different devices and hardware architectures. If Docker is already installed on your machine, you can skip the Docker setup step.

This work was developed on an Ubuntu 24.04.1 LTS system with NVIDIA drivers 560.35.03, equipped with an Intel i5-8600K, 16 GB DDR4 2666 MHz, and an NVIDIA RTX 3090 24 GB. Training was performed on a different machine with an Intel i7-12700F and an NVIDIA RTX 4070 Ti Super. Experiments on the Raspberry Pi boards were conducted through the Docker container running on Raspbian Bookworm, while NVIDIA JetPack 6.1 was installed on the Jetson Orin Nano Devkit.

Docker setup 🚢

  1. Install Docker, preferably on a Linux-based machine
    wget https://get.docker.com/ -O get_docker.sh
    chmod +x get_docker.sh
    bash get_docker.sh
    
  2. Once Docker has been installed, install the NVIDIA Container Toolkit for GPU support
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    
    Update the package index and install the NVIDIA Container Toolkit
    sudo apt-get update
    sudo apt-get install -y nvidia-container-toolkit
    
    Configure the NVIDIA Container Toolkit runtime for Docker
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    
  3. Add your user to the docker group so that you can manage containers without sudo
    sudo groupadd docker
    sudo usermod -aG docker $USER
    newgrp docker
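
Optionally, you can check that containers can access the GPU; the CUDA base image tag below is only an example and may need to match your driver version:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi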
    

Repository setup 📂

  1. Clone the repository
    git clone https://github.com/DanielRossi1/TakuNet.git
    
  2. Build the docker container
    cd TakuNet/docker
    ./build.sh
    
  3. Once the image has finished building, run the container. The run script supports mounting directories through arguments; mounted directories appear under /home/user/...
    # Just run the container
    ./run
    
    # run the container and mount a directory (e.g. the one which contains the dataset). Here you will find AIDER in /home/user/AIDER
    ./run -d /home/your-username/path-to-data/AIDER
    

Usage 🧰

The execution interface is simple: a bash script launches main.py and automatically loads the arguments and configuration specified in TakuNet's configuration file, configs/TakuNet.yml.

cd src
./launch.sh TakuNet

Configuration file parameters ⚙️:

Base Settings
  • num_epochs (int): Total number of training epochs
  • batch_size (int): Batch size used for training
  • seed (int): Random seed set for training and testing
  • experiment_name (str): Name of the folder that will be created for the training, or sourced for testing. It will be created in src/runs/. Multiple runs over the same experiment name will overwrite logs.
  • main_runs_folder (path): Output path for training and test runs
  • pin_memory (bool): Enables pin_memory in the PyTorch DataLoader
  • mode (Train/Test/Export): You can choose to train, test, or export the model in ONNX format
Logging
  • tensorboard (bool): Whether to use TensorBoard for logging
  • wandb (bool): Whether to use Weights and Biases (wandb) for logging
  • gradcam (bool): Enables Grad-CAM for gradient flow inspection
Dataset and Data loading
  • num_workers (int): Number of threads used by the dataloader
  • persistent_workers (bool): Enables persistent_workers in the PyTorch DataLoader
  • dataset (AIDER/AIDERV2): Specifies the dataset, and in particular the dataloader, to be used
  • data_path (path): The path where the actual dataset is stored on your docker container (e.g. /home/user/Data/AIDER)
  • num_classes (int): Number of output classes of the model. AIDER has 5 classes, while AIDERV2 has 4.
  • img_height (int): Images are resized by default, this sets the height.
  • img_width (int): Images are resized by default, this sets the width.
  • augment (bool): Enables or disables data augmentation
  • k_fold (int): Number of folds for k-fold cross-validation. Works only on AIDER since AIDERV2 has its own stand-alone validation set.
  • split (proportional/exact): Defines how to split the AIDER dataset. proportional follows the same proportions used in the EmergencyNet paper, while exact creates a test set of the same size as the one used in that paper.
  • no_validation (bool): If set to true, no validation set is created for AIDER
Pytorch Lightning Precision
  • lightning_precision (16-mixed/32-true): 16-bit floating point mixed precision or 32-bit floating point precision
Model settings
  • network (str): 'TakuNet' is the only available model
  • input_channels (int): Number of channels of the input images, default is 3 for RGB
  • dense (bool): Enables or disables dense connections in TakuNet
  • ckpts_path (str): Path of the checkpoints to be used in inference (filename included)
Optimization parameters
  • optimizer (str): Optimizer used for training (available: adam, adamw, sgd, rmsprop)
  • scheduler (str): Learning rate scheduler used in training (available: cosine, cyclic, step, lambda). These are set in src/networks/LightningNet.py
  • scheduler_per_epoch (bool): Update the learning rate at the end of each epoch
  • learning_rate (float): Initial learning rate
  • learning_rate_decay (float): Decay used by schedulers
  • learning_rate_decay_steps (float): Decay steps used by schedulers
  • min_learning_rate (float): Minimum learning rate value
  • warmup_epochs (int): Learning rate warmup epochs
  • warmup_steps (int): Learning rate warmup steps
  • weight_decay (float): Weight decay used by the optimizer
  • weight_decay_end (float): The weight decay follows the same schedule as the learning rate; this sets its final (minimum) value
  • update_freq (int): Update frequency for training steps
  • label_smoothing (float [0,1]): Sets label smoothing for cross-entropy loss
  • model_ema (bool): Whether to use model exponential moving average
  • alpha (float): Alpha value for RMSprop
  • momentum (float): Momentum value for RMSprop and SGD
  • class_weights (list of float): Class weights to be used in cross-entropy loss
Export
  • onnx_opset_version (int): set onnx opset version for exported model
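
For reference, below is a minimal sketch of what configs/TakuNet.yml could look like; the flat key layout and the values are purely illustrative and do not reproduce the training recipe used in the paper.

# illustrative excerpt: keys follow the list above, values are examples only
num_epochs: 100
batch_size: 32
seed: 42
experiment_name: takunet_aider
main_runs_folder: runs
mode: Train
tensorboard: true
wandb: false
dataset: AIDER
data_path: /home/user/AIDER
num_classes: 5
img_height: 224
img_width: 224
augment: true
lightning_precision: 16-mixed
network: TakuNet
input_channels: 3
dense: true
optimizer: adamw
scheduler: cosine
learning_rate: 0.001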

Inference on Edge Devices 🔋

Embedded device inference scripts are located in the embedded folder, and require a proper configuration for each specific target device. The main configuration file is located in embedded/configs/TakuNet.yml.

cd src
python3 embedded/main.py --cfg-path embedded/configs/TakuNet.yml
Inference configuration parameters ⚙️:
The inference script must be adapted to the target execution device, so you need to set a few parameters properly before launching the main script.
  • onnx_model_path: where the exported onnx file is located
  • engine_model_path: where to store the TensorRT engine
  • use_tensorrt: whether to enable TensorRT; to be used only on Jetson devices (set to false on Raspberry Pi)
  • fp16_mode: true if your ONNX model was exported in half precision, false if it was exported in float-32 precision
  • dataset_size: tests are conducted on randomly generated images (Torchvision FakeData) since we only want to measure inference speed. We use 2600 images and drop the first 100 to compensate for warm-up time
  • img_size: specifies the shape of the input images (AIDER and AIDERV2 have different image shapes)
  • num_classes: must match the number of classes used during model training
  • batch_size: size of the batch of images to be processed in parallel (default 1)
  • old_jetpack: This option allows the model to be optimized using TensorRT on older Jetson devices. Since this process is not straightforward, it may still encounter issues or errors. However, we encourage you to try it and report any issues you encounter.
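
For reference, an illustrative excerpt of embedded/configs/TakuNet.yml for a Jetson Orin Nano could look like the following; the paths, image size and exact key format are assumptions and must be adapted to your setup.

# illustrative excerpt: paths and values are examples only
onnx_model_path: /home/user/checkpoints/takunet.onnx
engine_model_path: /home/user/checkpoints/takunet.engine
use_tensorrt: true        # Jetson only; set to false on Raspberry Pi
fp16_mode: true           # the ONNX model was exported in half precision
dataset_size: 2600        # FakeData images; the first 100 are dropped as warm-up
img_size: [240, 240]      # assumption: must match the resolution used at training time
num_classes: 5            # must match the number of classes used during training
batch_size: 1
old_jetpack: false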

Additional Information 🔍

TensorRT Export

To properly export a model exploiting TensorRT optimization, you need to set use_tensorrt: true in embedded/configs/TakuNet.yml. The optimization must take place on the target hardware device and requires the ONNX checkpoint to have already been exported.

You may face some issues when trying to compress the model through TensorRT on older Jetson devices such as the NVIDIA Jetson Nano (Maxwell) or NVIDIA Jetson TX1. In such cases, we suggest lowering the ONNX opset version and setting old_jetpack: true during inference.
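
Putting the steps together, a typical export-and-optimize workflow could look like the sketch below; the ordering is an assumption based on the description above, and file locations depend on your configuration.

# 1. On the training machine: export the trained model to ONNX (mode: Export in configs/TakuNet.yml)
cd src
./launch.sh TakuNet

# 2. Copy the exported .onnx file to the Jetson, point onnx_model_path at it, set use_tensorrt: true,
#    then build and benchmark the TensorRT engine directly on the device
python3 embedded/main.py --cfg-path embedded/configs/TakuNet.yml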

Performance on Edge Devices

Embedded devices require a stable input voltage to operate effectively. Improper use of power supplies, including unsuitable cables, may result in degraded and unstable performance. In some cases, such misuse could potentially cause permanent damage to the devices.

To maximize the performance of embedded devices, it is recommended to stop any application or service that may interfere with their operation. These can introduce unnecessary overhead or cause resource contention, potentially impacting the efficiency and responsiveness of the devices.
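
For example, on a Jetson device one might switch to a console-only target and pin the clocks before benchmarking; these commands are only a suggestion and depend on your system setup.

# stop the graphical session and related services (standard systemd target change)
sudo systemctl isolate multi-user.target
# Jetson-specific utility that locks clocks at their maximum supported frequency
sudo jetson_clocks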

For optimal performance with TakuNet, we recommend performing a fresh OS installation. Furthermore, active thermal cooling should be installed (if not already present) to avoid thermal throttling.

Citation 📝

If you find this code useful for your research, please consider citing:

WARNING! The paper has been accepted at WACVW 2025, but the official proceedings paper is not available yet! The following is an example of how the BibTeX citation will look. Do not use it yet; wait for the official one. The conference starts on 28 February 2025.

@InProceedings{Rossi_2025_WACV,
    author    = {Rossi, Daniel and Borghi, Guido and Vezzani, Roberto},
    title     = {TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2025},
    pages     = {}
}

License 📜

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Summary of Terms

  • Attribution (BY): You must give appropriate credit to the original author(s), provide a link to the license, and indicate if changes were made.
  • NonCommercial (NC): This work may not be used for commercial purposes.
  • ShareAlike (SA): If you remix, transform, or build upon this work, you must distribute your contributions under the same license as the original.

For the full legal text of the license, please refer to https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.

Commercial Use

If you are interested in using this work for commercial purposes, please contact us.
