Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul service lifecycle for Structured Concurrency #130

Merged
merged 27 commits into from
Apr 27, 2023

Conversation

FranzBusch
Copy link
Contributor

@FranzBusch FranzBusch commented Mar 10, 2023

Motivation

Since the release of Swift Concurrency the server ecosystem has adopted async/await in a lot of APIs making the user experience amazing. Another thing that Swift Concurrency introduced is Structured Concurrency which allows to create a task tree with parent to child relations. Structured Concurrency has many benefits such as automatic cancellation and task local propagation. The current state of this library predates the introduction of Swift Concurrency and is not inline anymore with how applications are structured that want to leverage Concurrency.

Modification

This PR overhauls the implementation of service lifecycle completely. The reason for this is that the current implementation is very focused on how NIO based applications work before the introduction of Concurrency. Furthermore, Structured Concurrency is actually replacing part of the current functionality by providing new scoping mechanisms. The overhauled implementation provides two primitive types. Anew Service protocol and a ServiceRunner. The former is providing a clear API which services have to conform to so that they can be run by the ServiceRunner.

An example usage of the types looks like this

actor FooService: Service {
    func run() async throws {}
}
actor BarService: Service {
    func run() async throws {}
}

let fooService = FooService()
let barService = BarService()

let runner = ServiceRunner(
   services: [fooService, barService],
   configuration: .init(gracefulShutdownSignals: [.sigterm]),
   logger: logger
)

try await runner.run()

Result

We now have a service lifecycle library that integrates nicely with Structured Concurrency. Its value add is that it solves to complex setup of the task group with the signal handling. Furthermore, it provides a currency type Service that can be passed to inject services.

@FranzBusch FranzBusch force-pushed the fb-structured-service-lifecycle branch from 695cde5 to 5d4f7f3 Compare March 10, 2023 15:21
Copy link
Member

@fabianfett fabianfett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm extremely excited about this and a big +1. Thanks for driving this @FranzBusch! However there are some details that we should discuss:

Sources/ServiceLifecycle/CancellableContinuation.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/Service.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/Service.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/CancellableContinuation.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunner.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunner.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunner.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunner.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunner.swift Outdated Show resolved Hide resolved
@tomerd
Copy link
Contributor

tomerd commented Mar 12, 2023

thanks for putting together @FranzBusch looks very exciting and welcome advancement. will take a deeper look in a week or so, but please continue to iterate with the team until then

@tomerd
Copy link
Contributor

tomerd commented Mar 12, 2023

@swift-server-bot add to allowlist

.package(url: "https://github.com/apple/swift-metrics.git", "1.0.0" ..< "3.0.0"),
.package(url: "https://github.com/swift-server/swift-backtrace.git", from: "1.1.1"),
.package(url: "https://github.com/apple/swift-nio.git", from: "2.0.0"), // used in tests
.package(url: "https://github.com/apple/swift-docc-plugin", from: "1.0.0"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emitting metric on application start and shutdown has been very valuable in real world production services, the canonical example is a monitoring this to detect an a spike in application restart because of crashes

Copy link
Member

@fabianfett fabianfett Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I agree. But ServiceLifecycle created the weird situation in the past where ServiceLifecycle depended on a MetricSystem that adopters wanted to setup with ServiceLifecycle. Basically the first use occurred even before the system was actually ready to create Counters. A good example for this are MetricSystems that are sending their data to a server themselves.

Because of this, I'm a big fan of dropping the swift-metrics dependency here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see your point. Maybe the solution is for this library to take an abstract handler that can emit the specific metrics, or even something like a delegate so that the application / service can do the bridging

Copy link
Contributor Author

@FranzBusch FranzBusch Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I agree. But ServiceLifecycle created the weird situation in the past where ServiceLifecycle depended on a MetricSystem that adopters wanted to setup with ServiceLifecycle. Basically the first use occurred even before the system was actually ready to create Counters. A good example for this are MetricSystems that are sending their data to a server themselves.

I don't think this applies anymore and we can safely take the dependency on swift-metrics. A potential metric backend should expose itself as a Service with a run() method. Furthermore, it should handle receiving metrics before the run() method is called and buffer them.

So the setup that I would recommend here is the following

// In Bootstrap.swift
import FooMetricsBackend

@main
struct Bootstrap {
    static func main() async throws {
        let fooMetricsBackend = FooMetricsBackend()
        MetricsSystem.bootstrap(fooMetricsBackend)
        
        let someService = SomeService()

        let serviceRunner = ServiceRunner(services: [fooMetricsBackend, someService], configuration: ...)
        try await serviceRunner.run()
    }
}

The thing we should discuss is if we want to provide these metrics out of the box or not. A user could just wrap the call to serviceRunner.run() inside metric counters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. i tend to think we do want to provide these metrics out of the box, same as we provide logging information and depend on swift-log. could be convinced otherwise if there is strong reason not to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on this, I would like to get the PR merged without metrics and then add it on top since the PR is getting quite big.

Copy link
Member

@adam-fowler adam-fowler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks great.
Slightly concerned about how this will be released. If it is just tagged as another alpha release it will most likely break a bunch of libraries.

Sources/ServiceLifecycle/Service.swift Outdated Show resolved Hide resolved
@FranzBusch
Copy link
Contributor Author

Thanks for the great feedback already. I just pushed a new change that removes the shutdownGracefully method and the default implementations. The new approach is very similar to how task cancellation listening works.

await withShutdownGracefulHandler {
    // Your work here
} onGracefulShutdown: {
    // Your graceful shutdown logic here
}

The great thing is that this automatically propagates to child tasks since it is backed by a task local. This means that server authors mostly just have to start the quiescing on their NIO based channel and any work that happens in for example a gRPC streaming request will be able to listen to graceful shutdown without any manual work! I think this works out amazingly!

I would love to get feedback on this. The other thing where I am still a bit torn is the long running vs non long running work. Using a protocol approach also feels brittle because once you conform to the LongRunningService protocol you cannot remove it anymore without breaking API. My current proposal would be to just remove that flag altogether and expect every run() method to be long running. I haven't come across a need for it myself and we can always add it in later.

@Lukasa
Copy link

Lukasa commented Mar 13, 2023

Nit on the name: withGracefulShutdownHandler.

@FranzBusch FranzBusch force-pushed the fb-structured-service-lifecycle branch from 6e675b3 to 754bf5c Compare March 13, 2023 17:05
Package.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/GracefulShutdown.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/GracefulShutdown.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/GracefulShutdown.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/GracefulShutdown.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunnerConfiguration.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunnerConfiguration.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunnerError.swift Outdated Show resolved Hide resolved
Sources/UnixSignals/UnixSignalsSequence.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/Service.swift Outdated Show resolved Hide resolved
@FranzBusch FranzBusch force-pushed the fb-structured-service-lifecycle branch from 6f3bbee to 2efa2a0 Compare March 14, 2023 13:50
Copy link
Member

@fabianfett fabianfett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much nicer already! Will go in depth tomorrow.

Comment on lines 52 to 53
/// - Note: This method is mostly relevant for testing graceful shutdown. In your application, the ``ServiceRunner``
/// should be the instance that triggers the graceful shutdown.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I use this in a test case?

@FranzBusch
Copy link
Contributor Author

Just discussed this some more with @fabianfett.

  • With the new withGracefulShutdownHandler approach I missed making sure all services are shutting down in order. To fix this we should send the graceful shutdown signal down each service child task and wait for the respective run method to finish before sending it to next child task.
  • The testing interface that I created is not really fitting. I will experiment with a with based API for testing instead
  • Oftentimes we have services that have run() methods where they want to treat graceful shutdown similar to cancellation. These methods are often iterating an AsyncSequence. We should provide convenience APIs here to make this work.

@adam-fowler
Copy link
Member

Just discussed this some more with @fabianfett.

  • With the new withGracefulShutdownHandler approach I missed making sure all services are shutting down in order. To fix this we should send the graceful shutdown signal down each service child task and wait for the respective run method to finish before sending it to next child task.

Services don't have an initialising state, and could therefore all be initialising at the same time. As soon as one service initialisation hits a suspension point another service can start initialising. So being careful about shutdown ordering doesn't seem that important.

@FranzBusch
Copy link
Contributor Author

Services don't have an initialising state, and could therefore all be initialising at the same time. As soon as one service initialisation hits a suspension point another service can start initialising. So being careful about shutdown ordering doesn't seem that important.

You are correct that services don't have initialization stages because that is happening before they are passed to the runner.
However, it is super important that we call the shutdown handlers in the reverse order and make sure a service is shutdown before we go to the next one.

Simple example, you have an HTTPclient and a gRPC server. In your server you are using the http client to make requests on a RPC call. We need to make sure the client is running until the server is done. Otherwise, you might not be able to handle a gRPC request because the client is already shutdown.

Copy link
Collaborator

@gjcairo gjcairo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

Sources/ServiceLifecycle/GracefulShutdown.swift Outdated Show resolved Hide resolved
Sources/UnixSignals/UnixSignal.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunner.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunnerConfiguration.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/ServiceRunnerError.swift Outdated Show resolved Hide resolved
Sources/ServiceLifecycle/Docs.docc/index.md Outdated Show resolved Hide resolved
@adam-fowler
Copy link
Member

One of the nice things that the original service-lifecycle had was you could create a child service-lifecycle ComponentLifecycle consisting of multiple services and register that with your main service-lifecycle. The main service-lifecycle would then manage the lifecycle of these services in the child thus creating a hierarchy of services instead of a flat array. I'm not sure how you would do that with the new implementation.

@FranzBusch FranzBusch force-pushed the fb-structured-service-lifecycle branch from 816a92f to c9f8424 Compare April 25, 2023 15:43
@FranzBusch
Copy link
Contributor Author

FranzBusch commented Apr 25, 2023

Got around to address all the outstanding comments. Thanks again for the amazing feedback so far!

For shutting down while in retry I'd need two things:

  • a "Task.sleep" that cancels itself on shutdown (or better a "generic block" that cancels itself on shutdown)
  • a way to know "is this task requested to shut down"

Both are are harder to do than it seems (at least for me ; ) and require a lot of ceremony at the moment.

So, I am suggesting to add something along the lines of:

  • Task.isGracefulShutdownRequested
  • withCancellationOnGracefulShutdown { try await Task.sleep() }

@sliemeobn I added two new APIs in the latest commit cancelOnGracefulShutdown and Task.isShuttingDownGracefully this should make it easier to implement your use-case and we saw in other projects that they are in general helpful. I also changed the default behaviours to tolerate being run outside of a ServiceGroup and it will just run the operation closures. I think this is the right thing because graceful shutdown is something totally optional and it might never happen.

@ktoso @tomerd @glbrntt @fabianfett I would like to move forward with merging this PR soonish and tagging a new 2.0.0-alpha for this. The current releases are 1.x.x-alpha and I think using a new SemVer major is good here to imply the significant changes of API. There are three open things:

Support of short running services

This has come up multiple times during the review but so far we haven't found a compelling use-case for this that isn't handled with structured concurrency itself. I would like to defer solving this until we get more real world experience and concrete needs emerge. We have room to add this to the API by simply extending the ServiceGroupConfiguration.

Metrics

@tomerd You brought up built-in metrics for startup and shutdown. In general, I like the idea but would like to defer this to keep the PR smaller.

Failing checks

This package is only supporting 5.7+ so the checks for previous Swift versions are failing. Once we merge this PR, we should remove those checks.

@FranzBusch FranzBusch force-pushed the fb-structured-service-lifecycle branch from 8c9eb36 to e59ed5c Compare April 25, 2023 16:18
@FranzBusch
Copy link
Contributor Author

Going ahead with merging this PR and solving the outstanding questions in follow-up PRs

@FranzBusch FranzBusch merged commit fafab8f into main Apr 27, 2023
@FranzBusch FranzBusch deleted the fb-structured-service-lifecycle branch April 27, 2023 10:57
@FranzBusch FranzBusch added the ⚠️ semver/major Breaks existing public API. label Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⚠️ semver/major Breaks existing public API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.