Tensor Core Float16 Precision (#24)
* option to set torch matmul precision for tensor cores

* updated readme
kozlov721 authored May 8, 2024
1 parent ca57063 commit d1d71f0
Showing 3 changed files with 23 additions and 17 deletions.
35 changes: 18 additions & 17 deletions configs/README.md
@@ -142,23 +142,24 @@ To store and load the data we use LuxonisDataset and LuxonisLoader. For specific

Here you can change everything related to actual training of the model.

| Key                     | Type                                           | Default value | Description                                                                                                                                      |
| ----------------------- | ---------------------------------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| batch_size              | int                                            | 32            | batch size used for training                                                                                                                     |
| accumulate_grad_batches | int                                            | 1             | number of batches for gradient accumulation                                                                                                      |
| use_weighted_sampler    | bool                                           | False         | whether to use WeightedRandomSampler for training; only works with classification tasks                                                          |
| epochs                  | int                                            | 100           | number of training epochs                                                                                                                        |
| num_workers             | int                                            | 2             | number of workers for data loading                                                                                                               |
| train_metrics_interval  | int                                            | -1            | frequency of computing metrics on train data; -1 to disable                                                                                      |
| validation_interval     | int                                            | 1             | frequency of computing metrics on validation data                                                                                                |
| num_log_images          | int                                            | 4             | maximum number of images to visualize and log                                                                                                    |
| skip_last_batch         | bool                                           | True          | whether to skip the last batch during training                                                                                                   |
| accelerator             | Literal\["auto", "cpu", "gpu"\]                | "auto"        | which accelerator to use for training                                                                                                            |
| devices                 | int \| list\[int\] \| str                      | "auto"        | how many devices to use (int), a list of specific devices, or "auto" for automatic configuration based on the selected accelerator               |
| matmul_precision        | Literal\["medium", "high", "highest"\] \| None | None          | internal precision of float32 matrix multiplications; None keeps the PyTorch default                                                             |
| strategy                | Literal\["auto", "ddp"\]                       | "auto"        | which strategy to use for training                                                                                                               |
| num_sanity_val_steps    | int                                            | 2             | number of sanity validation steps performed before training                                                                                      |
| profiler                | Literal\["simple", "advanced"\] \| None        | None          | PyTorch Lightning profiler for GPU/CPU/RAM utilization analysis                                                                                  |
| verbose                 | bool                                           | True          | whether to print all intermediate results to the console                                                                                         |
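A minimal `trainer` section using the new option might look like the sketch below; all keys other than `matmul_precision` show their documented defaults, and the exact config layout is assumed from the `self.cfg.trainer.*` access in `trainer.py`:

```yaml
trainer:
  batch_size: 32
  epochs: 100
  # Trade some float32 matmul accuracy for tensor-core speed;
  # omit the key (or leave it null) to keep PyTorch's default ("highest").
  matmul_precision: medium
```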

### Preprocessing

4 changes: 4 additions & 0 deletions luxonis_train/core/trainer.py
@@ -4,6 +4,7 @@
from logging import getLogger
from typing import Any, Literal

import torch
from lightning.pytorch.utilities import rank_zero_only # type: ignore
from luxonis_ml.utils import LuxonisFileSystem

@@ -39,6 +40,9 @@ def __init__(
"""
super().__init__(cfg, opts)

if self.cfg.trainer.matmul_precision is not None:
torch.set_float32_matmul_precision(self.cfg.trainer.matmul_precision)

if resume is not None:
self.resume = str(LuxonisFileSystem.download(resume, self.run_save_dir))
else:
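The effect of the hook added to the trainer can be reproduced with plain PyTorch; this standalone sketch (independent of `luxonis_train`) shows the setter together with its matching getter:

```python
import torch

# PyTorch defaults to "highest" (full float32 precision).
# "high" and "medium" permit TensorFloat32 / bfloat16-based
# tensor-core kernels, trading accuracy for throughput on supported GPUs.
torch.set_float32_matmul_precision("medium")
print(torch.get_float32_matmul_precision())  # -> medium
```

The setting is process-global, which is why the trainer applies it once in `__init__` rather than per-model.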
1 change: 1 addition & 0 deletions luxonis_train/utils/config.py
@@ -203,6 +203,7 @@ class TrainerConfig(CustomBaseModel):
strategy: Literal["auto", "ddp"] = "auto"
num_sanity_val_steps: int = 2
profiler: Literal["simple", "advanced"] | None = None
matmul_precision: Literal["medium", "high", "highest"] | None = None
verbose: bool = True

batch_size: int = 32
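Because `TrainerConfig` is a Pydantic model, the `Literal` annotation validates the new field for free: any value outside the three allowed strings (or `None`) is rejected at config-parse time. A minimal sketch, assuming a plain `pydantic.BaseModel` in place of the project's `CustomBaseModel`:

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class TrainerConfig(BaseModel):
    # Mirrors the field added in luxonis_train/utils/config.py.
    matmul_precision: Optional[Literal["medium", "high", "highest"]] = None


assert TrainerConfig().matmul_precision is None
assert TrainerConfig(matmul_precision="high").matmul_precision == "high"

try:
    TrainerConfig(matmul_precision="low")  # not one of the allowed literals
except ValidationError:
    print("invalid precision rejected")
```

Defaulting to `None` (rather than "highest") means the trainer leaves PyTorch's global setting untouched unless the user opts in.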
