SIGTERM Handling #21
Conversation
☂️ Python Coverage
Overall Coverage
New Files: No new covered files...
Modified Files:
Test Results: 6 files, 6 suites, 1h 3m 56s ⏱️. Results for commit 1659d57.
```diff
@@ -56,7 +90,7 @@ def _upload_logs(self) -> None:

     def _trainer_fit(self, *args, **kwargs):
         try:
-            self.pl_trainer.fit(*args, **kwargs)
+            self.pl_trainer.fit(*args, ckpt_path=self.resume, **kwargs)
```
Unimportant for this PR, but when we resume from a checkpoint, should we also consider resuming the state of some hyperparameters? I'm not sure whether that is already abstracted and/or a feature of PyTorch Lightning.
Yes, we should probably store all states (optimizer, scheduler, ...) so we can truly continue with the training.
Yeah, PyTorch Lightning does this automatically.
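For context, here is a minimal sketch of how that automatic restoration looks when `ckpt_path` is passed to PyTorch Lightning's `Trainer.fit`; the module, data, and checkpoint path below are hypothetical placeholders, not code from this repository:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class TinyModel(pl.LightningModule):
    """Placeholder module used only to illustrate resuming; not from this repo."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)


loader = DataLoader(TensorDataset(torch.randn(32, 4), torch.randn(32, 1)), batch_size=8)
trainer = pl.Trainer(max_epochs=10)

# Passing ckpt_path restores the model weights, optimizer and LR-scheduler
# state, and the epoch/step counters, so training continues where it left off.
trainer.fit(TinyModel(), loader, ckpt_path="path/to/last.ckpt")  # placeholder path
```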
* handling SIGTERM signal
* resume argument takes path
Added handling of the SIGTERM signal. The current state of the training is saved and can later be resumed using the --resume flag.
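A rough sketch of this kind of SIGTERM handling, written as a standalone PyTorch Lightning callback; the callback name, checkpoint filename, and wiring are assumptions for illustration and not the code in this PR:

```python
import signal

import pytorch_lightning as pl


class SaveOnSIGTERM(pl.Callback):
    """Hypothetical sketch: persist training state when SIGTERM arrives."""

    def __init__(self, ckpt_path: str = "sigterm.ckpt"):  # placeholder filename
        self.ckpt_path = ckpt_path
        self._sigterm_received = False

    def setup(self, trainer, pl_module, stage=None):
        # Only record that the signal arrived; defer the actual save to a
        # safe point between batches.
        signal.signal(
            signal.SIGTERM,
            lambda signum, frame: setattr(self, "_sigterm_received", True),
        )

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        if self._sigterm_received:
            # Save the full state (weights, optimizer, scheduler, counters) so
            # training can be resumed later, then stop the Trainer cleanly.
            trainer.save_checkpoint(self.ckpt_path)
            trainer.should_stop = True
```

On restart, the saved checkpoint can then be handed back through the --resume flag, which this PR forwards to `ckpt_path` in `_trainer_fit`.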