
Fix/initsteps race condition #8145

Open: wants to merge 14 commits into master


@ak88 (Contributor) commented Jan 31, 2025

Fixes #8122

Changes

When an IStep fails, it can cause the main process to hang waiting on an AutoResetEvent.
I have refactored the flow in EthereumStepsManager to be based on awaitable tasks instead, avoiding any race conditions and greatly simplifying it.

  • Steps and their dependencies are handled with Task instead of a reset event.
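The task-based flow described above can be sketched with Python's asyncio, using a `Future` per step as a stand-in for a C# `Task`. This is a minimal illustration under stated assumptions, not the PR's EthereumStepsManager code: `run_steps`, the step names, and the `(dependencies, body)` table are hypothetical. The property it demonstrates is that every step's future is always completed (a result on success, a cancellation on failure), so a failing step wakes its dependents instead of leaving them blocked the way an unsignalled AutoResetEvent would.

```python
import asyncio


# Hypothetical sketch: one Future per step, always completed, so no dependent
# can wait forever. Not Nethermind code; names are illustrative only.
async def run_steps(steps):
    loop = asyncio.get_running_loop()
    done = {name: loop.create_future() for name in steps}
    log = []

    async def run_one(name):
        deps, body = steps[name]
        if deps:
            # Wait until every dependency has finished, successfully or not.
            await asyncio.wait([done[d] for d in deps])
        if any(done[d].cancelled() for d in deps):
            done[name].cancel()  # a dependency failed: skip this step
            log.append(f"{name}: skipped")
            return
        try:
            await body()
        except Exception as exc:
            done[name].cancel()  # still completes the future, waking dependents
            log.append(f"{name}: failed ({exc})")
            return
        done[name].set_result(None)
        log.append(f"{name}: ok")

    await asyncio.gather(*(run_one(name) for name in steps))
    return log


async def ok():
    await asyncio.sleep(0)


async def failing_plugin():
    raise RuntimeError("plugin failed to load")


# Illustrative step graph: name -> (dependency names, async body).
steps = {
    "InitDatabase": ([], ok),
    "LoadPlugin": (["InitDatabase"], failing_plugin),
    "StartRpc": (["LoadPlugin"], ok),
    "StartSync": (["InitDatabase"], ok),
}
log = asyncio.run(run_steps(steps))
print(log)
```

The run terminates even though `LoadPlugin` fails: `StartRpc` is skipped, while the independent `StartSync` still completes.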

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

@ak88 marked this pull request as ready for review on January 31, 2025 12:38
@LukaszRozmej requested a review from asdacap on January 31, 2025 15:06

@LukaszRozmej (Member) left a comment:

I like moving to tasks. I think we can avoid changing the IStep interface and make InitStep a wrapper (it can be a private class of StepManager) rather than a base class.

src/Nethermind/Nethermind.Init/Steps/InitStep.cs (outdated; three resolved review threads)
    }
    catch
    {
        _taskCompletedSource.SetCanceled();

Contributor:

I think this should be SetException?

Contributor (author):

It would create a cascade effect where every dependent step would write an error to the log.

Member:

Can you add a comment explaining why it is that way, then?
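To illustrate the trade-off discussed in this thread, here is a small asyncio sketch (an analogy, not Nethermind code) in which a `Future` plays the role of `_taskCompletedSource`: `set_exception` mirrors `SetException` and `cancel` mirrors `SetCanceled`. The chain `A`, `B`, `C` and the `run_chain` helper are hypothetical. With `set_exception`, every dependent re-observes the same failure and logs it; with `cancel`, only the failing step logs and its dependents are skipped quietly.

```python
import asyncio


# Hypothetical chain A <- B <- C (B depends on A, C depends on B); A fails.
# Compares completing the failed step's future with set_exception vs cancel.
async def run_chain(use_set_exception):
    loop = asyncio.get_running_loop()
    done = {name: loop.create_future() for name in ("A", "B", "C")}
    deps = {"A": None, "B": "A", "C": "B"}
    error_log = []

    async def step(name):
        dep = deps[name]
        if dep is not None:
            await asyncio.wait([done[dep]])  # returns once the dependency is done
            if done[dep].cancelled():
                done[name].cancel()  # quiet skip: nothing is logged
                return
            exc = done[dep].exception()
            if exc is not None:
                # The dependency's exception surfaces here and is logged again.
                error_log.append(f"{name}: {exc}")
                done[name].set_exception(exc)
                return
        if name == "A":
            exc = RuntimeError("A failed")  # the one real failure
            error_log.append(f"{name}: {exc}")
            if use_set_exception:
                done[name].set_exception(exc)
            else:
                done[name].cancel()
            return
        done[name].set_result(None)

    await asyncio.gather(*(step(name) for name in done))
    # Observe every stored exception so asyncio does not warn about it on exit.
    for fut in done.values():
        if not fut.cancelled():
            fut.exception()
    return error_log


cascade = asyncio.run(run_chain(use_set_exception=True))
quiet = asyncio.run(run_chain(use_set_exception=False))
print(cascade)
print(quiet)
```

With `set_exception`, the single failure in `A` yields three log lines (one per step in the chain); with `cancel`, it yields one.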

        _allPending.Enqueue(task);
    }
    else
        createdSteps.Add(step.GetType(), new StepWrapper(step));

Contributor:

The dependency is mapped by StepBaseType, not by the concrete step type.

@@ -153,7 +153,7 @@ public StepA(NethermindApi runnerContext)
 }
 }

-[RunnerStepDependencies(typeof(StepC))]
+[RunnerStepDependencies(typeof(StepCStandard))]

Contributor:

I think the previous test case is correct: the dependency is declared by the base type, not the subtype. It's a subtle thing that plugins rely on.

Contributor (author):

OK, I see. If that is the intended behavior, there can be an issue if a step inherits from another step, like InitDatabaseSnapshot, and something then uses that step as a dependency.

Contributor:

Many scary issues that I don't want to bring into my sleep.

@LukaszRozmej (Member), Feb 4, 2025:

In my original implementation that wasn't an issue: steps were grouped by base step for dependency resolution. That is why _allStepsByBaseType existed.
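The grouping described here can be sketched in Python as a hypothetical illustration, not the actual `_allStepsByBaseType` code. The rule used for a step's base type (the class directly below a common `Step` root) is an assumption, as are `StepB`, `StepD`, and the helper names; `StepA`, `StepC`, and `StepCStandard` follow the test case above. Because both the registered steps and the declared dependencies are normalized to their base type, a dependency on `StepC` still resolves when only the subtype `StepCStandard` is registered, and so does a dependency declared on the subtype.

```python
from collections import defaultdict


class Step:
    """Hypothetical stand-in for IStep; subclasses list dependencies by type."""
    dependencies = ()


def step_base_type(cls):
    """Assumed rule: the base type is the class directly below Step in the MRO."""
    mro = cls.__mro__
    return mro[mro.index(Step) - 1]


def group_by_base_type(step_classes):
    """Group registered concrete steps by base type (cf. _allStepsByBaseType)."""
    groups = defaultdict(list)
    for cls in step_classes:
        groups[step_base_type(cls)].append(cls)
    return groups


def resolve_dependencies(cls, groups):
    """Map each declared dependency to its base type, then to registered steps."""
    resolved = []
    for dep in cls.dependencies:
        resolved.extend(groups.get(step_base_type(dep), []))
    return resolved


class StepA(Step):
    pass


class StepC(Step):
    pass


class StepCStandard(StepC):  # the subtype that is actually registered
    pass


class StepB(Step):
    dependencies = (StepC,)  # dependency declared by base type


class StepD(Step):
    dependencies = (StepCStandard,)  # dependency declared by subtype


groups = group_by_base_type([StepA, StepB, StepCStandard, StepD])
print(resolve_dependencies(StepB, groups))
print(resolve_dependencies(StepD, groups))
```

Keying both sides by base type is what lets a plugin swap in `StepCStandard` without touching steps that depend on `StepC`; a lookup keyed only by the concrete registered type would not find a `StepC` dependency.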

Successfully merging this pull request may close these issues:

Sometimes client hangs in startup, when a required plugin fails to load.
3 participants