SIGTERM Handling #21

Merged
kozlov721 merged 2 commits into dev from feature/system-signals-handling
Apr 15, 2024

Conversation

kozlov721
Collaborator

Adds handling of the SIGTERM signal. When the signal is received, the current training state is saved, and training can later be resumed from it using the --resume flag.
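To illustrate the idea in the description, here is a minimal, standard-library-only sketch of the pattern: a SIGTERM handler records the stop request, the loop saves its state once the current epoch has finished, and a later run resumes from that state. `ToyTrainer`, its methods, the JSON checkpoint, and `demo` are hypothetical names made up for this illustration; this is not the code from this PR and not the luxonis_train API.

```python
import json
import signal
import tempfile
from pathlib import Path


class ToyTrainer:
    """Hypothetical stand-in for a trainer; not the luxonis_train API."""

    def __init__(self, checkpoint_path: Path) -> None:
        self.checkpoint_path = checkpoint_path
        self.epoch = 0
        self.stop_requested = False

    def _handle_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: only record the request. The training loop
        # saves the state once the current epoch has finished.
        self.stop_requested = True

    def save_state(self) -> None:
        self.checkpoint_path.write_text(json.dumps({"epoch": self.epoch}))

    def load_state(self) -> None:
        self.epoch = json.loads(self.checkpoint_path.read_text())["epoch"]

    def fit(self, max_epochs: int, resume: bool = False, on_epoch_end=None) -> None:
        if resume:
            self.load_state()
        self.stop_requested = False
        # signal.signal must be called from the main thread.
        previous = signal.signal(signal.SIGTERM, self._handle_sigterm)
        try:
            while self.epoch < max_epochs and not self.stop_requested:
                self.epoch += 1  # stands in for one epoch of training
                if on_epoch_end is not None:
                    on_epoch_end(self.epoch)
            if self.stop_requested:
                self.save_state()
        finally:
            # Restore whatever handler was installed before fit() was called.
            signal.signal(
                signal.SIGTERM,
                previous if previous is not None else signal.SIG_DFL,
            )


def demo() -> tuple[int, int]:
    """Interrupt a run with SIGTERM, then resume it from the saved state."""
    checkpoint = Path(tempfile.mkdtemp()) / "resume.json"

    def terminate_after_third_epoch(epoch: int) -> None:
        if epoch == 3:
            # Simulates an external `kill <pid>` by signalling this process.
            signal.raise_signal(signal.SIGTERM)

    first = ToyTrainer(checkpoint)
    first.fit(max_epochs=10, on_epoch_end=terminate_after_third_epoch)
    saved_epoch = json.loads(checkpoint.read_text())["epoch"]

    second = ToyTrainer(checkpoint)
    second.fit(max_epochs=10, resume=True)
    return saved_epoch, second.epoch
```

Calling `demo()` returns the epoch stored in the checkpoint and the epoch reached after resuming. Setting a flag in the handler and saving from the loop is one design choice among several; saving directly inside the handler is also possible, but it risks writing a checkpoint in the middle of an update.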

kozlov721 added the enhancement (New feature or request) label on Apr 11, 2024
kozlov721 self-assigned this on Apr 11, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
| --- | --- | --- | --- | --- |
| 4675 | 3751 | 80% | 0% | 🟢 |

New Files

No new covered files...

Modified Files

| File | Coverage | Status |
| --- | --- | --- |
| luxonis_train/main.py | 50% | 🟢 |
| luxonis_train/callbacks/luxonis_progress_bar.py | 88% | 🟢 |
| luxonis_train/core/trainer.py | 61% | 🟢 |
| TOTAL | 66% | 🟢 |

updated for commit: 1659d57 by action🐍


Test Results

  6 files    6 suites   1h 3m 56s ⏱️
 57 tests  57 ✅ 0 💤 0 ❌
342 runs  342 ✅ 0 💤 0 ❌

Results for commit 1659d57.

@@ -56,7 +90,7 @@ def _upload_logs(self) -> None:

     def _trainer_fit(self, *args, **kwargs):
         try:
-            self.pl_trainer.fit(*args, **kwargs)
+            self.pl_trainer.fit(*args, ckpt_path=self.resume, **kwargs)
Collaborator

Unimportant for this PR, but when we resume from a checkpoint, should we also consider resuming the state of some hyperparameters? I'm not sure whether that is already abstracted away and/or a feature of PyTorch Lightning.

Collaborator

Yes, we should probably store all states (optimizer, scheduler, ...) so we can truly continue with the training.

Collaborator Author

Yeah, PyTorch Lightning does this automatically.
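To make the point of this thread concrete, here is a small pure-Python sketch of why a checkpoint should carry optimizer and scheduler state in addition to the model weights. `ToyRun` and `train` are hypothetical names invented for this illustration and have nothing to do with PyTorch Lightning internals; the sketch uses a one-parameter "model" trained with SGD with momentum and a step-decay learning rate.

```python
class ToyRun:
    """Toy 1-D training run: SGD with momentum plus a step-decay LR schedule."""

    def __init__(self) -> None:
        self.weight = 5.0     # "model" state
        self.velocity = 0.0   # "optimizer" state (momentum buffer)
        self.lr = 0.1         # "scheduler" state (current learning rate)
        self.steps = 0        # global step counter

    def step(self) -> None:
        grad = 2.0 * self.weight                 # gradient of weight ** 2
        self.velocity = 0.9 * self.velocity + grad
        self.weight -= self.lr * self.velocity
        self.steps += 1
        if self.steps % 5 == 0:                  # halve the LR every 5 steps
            self.lr *= 0.5

    def state_dict(self) -> dict:
        # Snapshot of everything needed to continue the run.
        return dict(vars(self))

    def load_state_dict(self, state: dict) -> None:
        vars(self).update(state)


def train(run: ToyRun, n_steps: int) -> ToyRun:
    for _ in range(n_steps):
        run.step()
    return run


# Uninterrupted reference run: 20 steps.
reference = train(ToyRun(), 20)

# Interrupted run: checkpoint after 8 steps.
checkpoint = train(ToyRun(), 8).state_dict()

# Resume with the full state (weights, momentum, LR, step counter).
full_resume = ToyRun()
full_resume.load_state_dict(checkpoint)
train(full_resume, 12)

# Resume with the weights only; optimizer and scheduler start from scratch.
weights_only = ToyRun()
weights_only.weight = checkpoint["weight"]
train(weights_only, 12)
```

The full-state resume replays exactly the same arithmetic as the uninterrupted run, so it ends with the same weight and learning rate. The weights-only resume restarts the momentum buffer and the schedule, so it follows a different trajectory.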

kozlov721 merged commit f425fdb into dev on Apr 15, 2024
10 checks passed
kozlov721 deleted the feature/system-signals-handling branch on April 15, 2024, 18:22
kozlov721 added a commit that referenced this pull request Oct 9, 2024
* handling SIGTERM signal

* resume argument takes path
Labels
enhancement New feature or request
3 participants