-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul service lifecycle for Structured Concurrency #130
Conversation
695cde5
to
5d4f7f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm extremely excited about this and a big +1. Thanks for driving this @FranzBusch! However there are some details that we should discuss:
thanks for putting together @FranzBusch looks very exciting and welcome advancement. will take a deeper look in a week or so, but please continue to iterate with the team until then |
@swift-server-bot add to allowlist |
.package(url: "https://github.com/apple/swift-metrics.git", "1.0.0" ..< "3.0.0"), | ||
.package(url: "https://github.com/swift-server/swift-backtrace.git", from: "1.1.1"), | ||
.package(url: "https://github.com/apple/swift-nio.git", from: "2.0.0"), // used in tests | ||
.package(url: "https://github.com/apple/swift-docc-plugin", from: "1.0.0"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emitting metric on application start and shutdown has been very valuable in real world production services, the canonical example is a monitoring this to detect an a spike in application restart because of crashes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I agree. But ServiceLifecycle created the weird situation in the past where ServiceLifecycle depended on a MetricSystem that adopters wanted to setup with ServiceLifecycle. Basically the first use occurred even before the system was actually ready to create Counters. A good example for this are MetricSystems that are sending their data to a server themselves.
Because of this, I'm a big fan of dropping the swift-metrics dependency here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see your point. Maybe the solution is for this library to take an abstract handler that can emit the specific metrics, or even something like a delegate so that the application / service can do the bridging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I agree. But ServiceLifecycle created the weird situation in the past where ServiceLifecycle depended on a MetricSystem that adopters wanted to setup with ServiceLifecycle. Basically the first use occurred even before the system was actually ready to create Counters. A good example for this are MetricSystems that are sending their data to a server themselves.
I don't think this applies anymore and we can safely take the dependency on swift-metrics
. A potential metric backend should expose itself as a Service
with a run()
method. Furthermore, it should handle receiving metrics before the run()
method is called and buffer them.
So the setup that I would recommend here is the following
// In Bootstrap.swift
import FooMetricsBackend
@main
struct Bootstrap {
static func main() async throws {
let fooMetricsBackend = FooMetricsBackend()
MetricsSystem.bootstrap(fooMetricsBackend)
let someService = SomeService()
let serviceRunner = ServiceRunner(services: [fooMetricsBackend, someService], configuration: ...)
try await serviceRunner.run()
}
}
The thing we should discuss is if we want to provide these metrics out of the box or not. A user could just wrap the call to serviceRunner.run()
inside metric counters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. i tend to think we do want to provide these metrics out of the box, same as we provide logging information and depend on swift-log. could be convinced otherwise if there is strong reason not to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up on this, I would like to get the PR merged without metrics and then add it on top since the PR is getting quite big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this looks great.
Slightly concerned about how this will be released. If it is just tagged as another alpha release it will most likely break a bunch of libraries.
Thanks for the great feedback already. I just pushed a new change that removes the await withShutdownGracefulHandler {
// Your work here
} onGracefulShutdown: {
// Your graceful shutdown logic here
} The great thing is that this automatically propagates to child tasks since it is backed by a task local. This means that server authors mostly just have to start the quiescing on their NIO based channel and any work that happens in for example a gRPC streaming request will be able to listen to graceful shutdown without any manual work! I think this works out amazingly! I would love to get feedback on this. The other thing where I am still a bit torn is the long running vs non long running work. Using a protocol approach also feels brittle because once you conform to the |
Nit on the name: |
6e675b3
to
754bf5c
Compare
6f3bbee
to
2efa2a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much nicer already! Will go in depth tomorrow.
/// - Note: This method is mostly relevant for testing graceful shutdown. In your application, the ``ServiceRunner`` | ||
/// should be the instance that triggers the graceful shutdown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I use this in a test case?
Sources/ServiceLifecycle/Docs.docc/How to adopt ServiceLifecycle in libraries.md
Outdated
Show resolved
Hide resolved
Just discussed this some more with @fabianfett.
|
Services don't have an initialising state, and could therefore all be initialising at the same time. As soon as one service initialisation hits a suspension point another service can start initialising. So being careful about shutdown ordering doesn't seem that important. |
You are correct that services don't have initialization stages because that is happening before they are passed to the runner. Simple example, you have an HTTPclient and a gRPC server. In your server you are using the http client to make requests on a RPC call. We need to make sure the client is running until the server is done. Otherwise, you might not be able to handle a gRPC request because the client is already shutdown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great!
Sources/ServiceLifecycle/Docs.docc/How to adopt ServiceLifecycle in libraries.md
Outdated
Show resolved
Hide resolved
Sources/ServiceLifecycle/Docs.docc/How to adopt ServiceLifecycle in libraries.md
Outdated
Show resolved
Hide resolved
One of the nice things that the original service-lifecycle had was you could create a child service-lifecycle |
# Motivation # Modification # Result
…nested service runner graceful shutdown
816a92f
to
c9f8424
Compare
Got around to address all the outstanding comments. Thanks again for the amazing feedback so far!
@sliemeobn I added two new APIs in the latest commit @ktoso @tomerd @glbrntt @fabianfett I would like to move forward with merging this PR soonish and tagging a new 2.0.0-alpha for this. The current releases are 1.x.x-alpha and I think using a new SemVer major is good here to imply the significant changes of API. There are three open things: Support of short running servicesThis has come up multiple times during the review but so far we haven't found a compelling use-case for this that isn't handled with structured concurrency itself. I would like to defer solving this until we get more real world experience and concrete needs emerge. We have room to add this to the API by simply extending the Metrics@tomerd You brought up built-in metrics for startup and shutdown. In general, I like the idea but would like to defer this to keep the PR smaller. Failing checksThis package is only supporting 5.7+ so the checks for previous Swift versions are failing. Once we merge this PR, we should remove those checks. |
8c9eb36
to
e59ed5c
Compare
Going ahead with merging this PR and solving the outstanding questions in follow-up PRs |
Motivation
Since the release of Swift Concurrency the server ecosystem has adopted
async
/await
in a lot of APIs making the user experience amazing. Another thing that Swift Concurrency introduced is Structured Concurrency which allows to create a task tree with parent to child relations. Structured Concurrency has many benefits such as automatic cancellation and task local propagation. The current state of this library predates the introduction of Swift Concurrency and is not inline anymore with how applications are structured that want to leverage Concurrency.Modification
This PR overhauls the implementation of service lifecycle completely. The reason for this is that the current implementation is very focused on how NIO based applications work before the introduction of Concurrency. Furthermore, Structured Concurrency is actually replacing part of the current functionality by providing new scoping mechanisms. The overhauled implementation provides two primitive types. Anew
Service
protocol and aServiceRunner
. The former is providing a clear API which services have to conform to so that they can be run by theServiceRunner
.An example usage of the types looks like this
Result
We now have a service lifecycle library that integrates nicely with Structured Concurrency. Its value add is that it solves to complex setup of the task group with the signal handling. Furthermore, it provides a currency type
Service
that can be passed to inject services.