Overhaul service lifecycle for Structured Concurrency #130

FranzBusch · 2023-03-10T15:20:03Z

Motivation

Since the release of Swift Concurrency, the server ecosystem has adopted async/await in a lot of APIs, making the user experience amazing. Another thing that Swift Concurrency introduced is Structured Concurrency, which allows creating a task tree with parent-to-child relations. Structured Concurrency has many benefits, such as automatic cancellation and task-local propagation. The current state of this library predates the introduction of Swift Concurrency and is no longer in line with how applications that want to leverage Concurrency are structured.

Modification

This PR overhauls the implementation of service lifecycle completely. The reason for this is that the current implementation is very focused on how NIO-based applications worked before the introduction of Concurrency. Furthermore, Structured Concurrency actually replaces part of the current functionality by providing new scoping mechanisms. The overhauled implementation provides two primitive types: a new Service protocol and a ServiceRunner. The former provides a clear API that services have to conform to so that they can be run by the ServiceRunner.

An example usage of the types looks like this

actor FooService: Service {
    func run() async throws {}
}
actor BarService: Service {
    func run() async throws {}
}

let fooService = FooService()
let barService = BarService()

let runner = ServiceRunner(
   services: [fooService, barService],
   configuration: .init(gracefulShutdownSignals: [.sigterm]),
   logger: logger
)

try await runner.run()

Result

We now have a service lifecycle library that integrates nicely with Structured Concurrency. Its value add is that it solves the complex setup of the task group with the signal handling. Furthermore, it provides a currency type, Service, that can be passed around to inject services.

fabianfett

I'm extremely excited about this and a big +1. Thanks for driving this @FranzBusch! However there are some details that we should discuss:

Sources/ServiceLifecycle/CancellableContinuation.swift

Sources/ServiceLifecycle/Service.swift

Sources/ServiceLifecycle/CancellableContinuation.swift

Sources/ServiceLifecycle/ServiceRunner.swift

Sources/ServiceLifecycle/Service.swift

tomerd · 2023-03-12T18:03:23Z

thanks for putting this together @FranzBusch, looks like a very exciting and welcome advancement. will take a deeper look in a week or so, but please continue to iterate with the team until then

tomerd · 2023-03-12T18:03:34Z

@swift-server-bot add to allowlist

tomerd · 2023-03-12T18:06:14Z

Package.swift

-        .package(url: "https://github.com/apple/swift-metrics.git", "1.0.0" ..< "3.0.0"),
-        .package(url: "https://github.com/swift-server/swift-backtrace.git", from: "1.1.1"),
-        .package(url: "https://github.com/apple/swift-nio.git", from: "2.0.0"), // used in tests
-        .package(url: "https://github.com/apple/swift-docc-plugin", from: "1.0.0"),


emitting a metric on application start and shutdown has been very valuable in real world production services, the canonical example is monitoring this to detect a spike in application restarts caused by crashes

Generally I agree. But ServiceLifecycle created a weird situation in the past where ServiceLifecycle depended on a MetricSystem that adopters wanted to set up with ServiceLifecycle. Basically, the first use occurred even before the system was actually ready to create Counters. A good example of this is MetricSystems that send their data to a server themselves.

Because of this, I'm a big fan of dropping the swift-metrics dependency here.

I can see your point. Maybe the solution is for this library to take an abstract handler that can emit the specific metrics, or even something like a delegate so that the application / service can do the bridging

I don't think this applies anymore and we can safely take the dependency on swift-metrics. A potential metric backend should expose itself as a Service with a run() method. Furthermore, it should handle receiving metrics before the run() method is called and buffer them.

So the setup that I would recommend here is the following

// In Bootstrap.swift
import FooMetricsBackend

@main
struct Bootstrap {
    static func main() async throws {
        let fooMetricsBackend = FooMetricsBackend()
        MetricsSystem.bootstrap(fooMetricsBackend)

        let someService = SomeService()
        let serviceRunner = ServiceRunner(services: [fooMetricsBackend, someService], configuration: ...)
        try await serviceRunner.run()
    }
}

The thing we should discuss is if we want to provide these metrics out of the box or not. A user could just wrap the call to serviceRunner.run() inside metric counters.
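As a rough illustration of that last option, the wrapping could look like the sketch below. `runWithLifecycleMetrics` and its `onStart`/`onStop` hooks are hypothetical names, not part of ServiceLifecycle or swift-metrics. The sketch only assumes that the caller supplies two closures that record a start and a stop event, for example by incrementing counters in whichever metrics backend has been bootstrapped.

```swift
// Hypothetical helper, not part of ServiceLifecycle or swift-metrics.
// It records a "start" event, runs the given operation, and records a
// "stop" event on the way out, whether the operation returns or throws.
func runWithLifecycleMetrics(
    onStart: () -> Void,
    onStop: () -> Void,
    run: () async throws -> Void
) async rethrows {
    onStart()
    // `defer` guarantees the stop hook fires on both success and failure.
    defer { onStop() }
    try await run()
}
```

A caller would pass something like `{ try await serviceRunner.run() }` as `run`. Keeping the hooks as plain closures avoids tying the sketch to any particular metrics library.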

right. i tend to think we do want to provide these metrics out of the box, same as we provide logging information and depend on swift-log. could be convinced otherwise if there is a strong reason not to.

Following up on this, I would like to get the PR merged without metrics and then add it on top since the PR is getting quite big.

adam-fowler

In general this looks great.
Slightly concerned about how this will be released. If it is just tagged as another alpha release it will most likely break a bunch of libraries.

Sources/ServiceLifecycle/Service.swift

FranzBusch · 2023-03-13T16:19:01Z

Thanks for the great feedback already. I just pushed a new change that removes the shutdownGracefully method and the default implementations. The new approach is very similar to how task cancellation listening works.

await withShutdownGracefulHandler {
    // Your work here
} onGracefulShutdown: {
    // Your graceful shutdown logic here
}

The great thing is that this automatically propagates to child tasks since it is backed by a task local. This means that server authors mostly just have to start the quiescing on their NIO based channel and any work that happens in for example a gRPC streaming request will be able to listen to graceful shutdown without any manual work! I think this works out amazingly!
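The propagation mentioned here relies on a general property of task locals: a value bound in a parent task is visible in its structured child tasks. The following standard-library-only sketch shows that property in isolation. `ShutdownScope`, `scopeSeenByChildTask`, and `demoPropagation` are hypothetical names chosen for illustration, and this is not how the library itself is implemented.

```swift
// Hypothetical names throughout; this is not the library's implementation.
enum ShutdownScope {
    // Child tasks inherit whatever value the parent task has bound here.
    @TaskLocal static var current: String = "unbound"
}

// Reads the task-local from inside a child task of a task group.
func scopeSeenByChildTask() async -> String {
    await withTaskGroup(of: String.self, returning: String.self) { group in
        group.addTask { ShutdownScope.current }
        return await group.next() ?? "unbound"
    }
}

// Binds the value in the parent, then lets a child task observe it.
func demoPropagation() async -> String {
    await ShutdownScope.$current.withValue("runner-1") {
        await scopeSeenByChildTask()
    }
}
```

Because the binding travels down the task tree automatically, code deep inside a request handler can observe state set by the runner without anything being passed through by hand.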

I would love to get feedback on this. The other thing where I am still a bit torn is the long running vs non long running work. Using a protocol approach also feels brittle because once you conform to the LongRunningService protocol you cannot remove it anymore without breaking API. My current proposal would be to just remove that flag altogether and expect every run() method to be long running. I haven't come across a need for it myself and we can always add it in later.

Lukasa · 2023-03-13T16:22:33Z

Nit on the name: withGracefulShutdownHandler.

Package.swift

Sources/ServiceLifecycle/GracefulShutdown.swift

Sources/ServiceLifecycle/ServiceRunnerConfiguration.swift

Sources/ServiceLifecycle/ServiceRunnerError.swift

Sources/UnixSignals/UnixSignalsSequence.swift

Sources/ServiceLifecycle/Service.swift

fabianfett

Looks much nicer already! Will go in depth tomorrow.

fabianfett · 2023-03-14T21:29:43Z

Sources/ServiceLifecycle/GracefulShutdown.swift

+    /// - Note: This method is mostly relevant for testing graceful shutdown. In your application, the ``ServiceRunner``
+    /// should be the instance that triggers the graceful shutdown.


How do I use this in a test case?

Sources/ServiceLifecycle/Docs.docc/How to adopt ServiceLifecycle in libraries.md

Sources/ServiceLifecycle/ServiceRunner.swift

FranzBusch · 2023-03-15T12:49:12Z

Just discussed this some more with @fabianfett.

With the new withGracefulShutdownHandler approach I missed making sure all services are shutting down in order. To fix this we should send the graceful shutdown signal down each service child task and wait for the respective run method to finish before sending it to next child task.
The testing interface that I created is not really a good fit. I will experiment with a with-based API for testing instead.
Oftentimes we have services that have run() methods where they want to treat graceful shutdown similar to cancellation. These methods are often iterating an AsyncSequence. We should provide convenience APIs here to make this work.

adam-fowler · 2023-03-15T16:41:47Z

Services don't have an initialising state, and could therefore all be initialising at the same time. As soon as one service initialisation hits a suspension point another service can start initialising. So being careful about shutdown ordering doesn't seem that important.

Sources/ServiceLifecycle/GracefulShutdown.swift

FranzBusch · 2023-03-15T17:54:04Z

You are correct that services don't have initialization stages, because initialization happens before they are passed to the runner.
However, it is super important that we call the shutdown handlers in reverse order and make sure a service is shut down before we go to the next one.

Simple example: you have an HTTP client and a gRPC server. In your server you are using the HTTP client to make requests during an RPC call. We need to make sure the client keeps running until the server is done. Otherwise, you might not be able to handle a gRPC request because the client is already shut down.
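The ordering requirement can be sketched with the standard library alone. In the sketch below, `DemoService`, `ShutdownLog`, and `shutdownInReverseOrder` are hypothetical stand-ins rather than the library's `Service` protocol or runner. The only assumption is that services are listed in dependency order, so the last one in the array depends on the ones before it.

```swift
// Records the order in which services finish shutting down.
actor ShutdownLog {
    private(set) var entries: [String] = []
    func record(_ name: String) { entries.append(name) }
}

// Hypothetical stand-in for a service; not the library's `Service` protocol.
struct DemoService {
    let name: String

    func shutdown(log: ShutdownLog) async {
        await Task.yield() // simulate asynchronous cleanup work
        await log.record(name)
    }
}

// Shuts services down one at a time, last-listed first, and waits for each
// one to finish before moving on to the next.
func shutdownInReverseOrder(_ services: [DemoService], log: ShutdownLog) async {
    for service in services.reversed() {
        await service.shutdown(log: log)
    }
}
```

With the services listed as `httpClient` then `grpcServer`, the server finishes shutting down before the client is asked to stop, which matches the requirement in the example above.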

gjcairo

Looking great!

Sources/ServiceLifecycle/Docs.docc/How to adopt ServiceLifecycle in libraries.md

Sources/ServiceLifecycle/GracefulShutdown.swift

Sources/UnixSignals/UnixSignal.swift

Sources/ServiceLifecycle/ServiceRunner.swift

Sources/ServiceLifecycle/ServiceRunnerConfiguration.swift

Sources/ServiceLifecycle/ServiceRunnerError.swift

Sources/ServiceLifecycle/Docs.docc/index.md

Tests/ServiceLifecycleTests/GracefulShutdownTests.swift

adam-fowler · 2023-03-20T11:34:51Z

One of the nice things that the original service-lifecycle had was you could create a child service-lifecycle ComponentLifecycle consisting of multiple services and register that with your main service-lifecycle. The main service-lifecycle would then manage the lifecycle of these services in the child thus creating a hierarchy of services instead of a flat array. I'm not sure how you would do that with the new implementation.

FranzBusch · 2023-04-25T16:13:13Z

Got around to addressing all the outstanding comments. Thanks again for the amazing feedback so far!

For shutting down while in retry I'd need two things:

a "Task.sleep" that cancels itself on shutdown (or better a "generic block" that cancels itself on shutdown)

a way to know "is this task requested to shut down"

Both are harder to do than it seems (at least for me ;) and require a lot of ceremony at the moment.

So, I am suggesting to add something along the lines of:

Task.isGracefulShutdownRequested

withCancellationOnGracefulShutdown { try await Task.sleep() }

@sliemeobn I added two new APIs in the latest commit, cancelOnGracefulShutdown and Task.isShuttingDownGracefully. These should make it easier to implement your use-case, and we saw in other projects that they are helpful in general. I also changed the default behaviours to tolerate being run outside of a ServiceGroup; in that case they just run the operation closures. I think this is the right thing because graceful shutdown is totally optional and might never happen.
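The general idea behind a block that cancels itself on shutdown can be sketched with a task group that races the operation against a shutdown signal. This is a standard-library-only illustration, not the library's implementation of cancelOnGracefulShutdown. `cancelOnSignal` is a hypothetical name, and modelling the shutdown signal as an `AsyncStream<Void>` is an assumption made purely for the example.

```swift
// Hypothetical helper: runs `operation` and cancels it as soon as `signal`
// yields an element. Returns nil when the signal arrived first.
func cancelOnSignal<T: Sendable>(
    signal: AsyncStream<Void>,
    operation: @escaping @Sendable () async throws -> T
) async throws -> T? {
    try await withThrowingTaskGroup(of: T?.self, returning: T?.self) { group in
        // Child task 1: the actual work.
        group.addTask { try await operation() }
        // Child task 2: waits for the shutdown signal (or its own cancellation).
        group.addTask {
            for await _ in signal { break }
            return nil
        }
        // Whichever child finishes first decides the result.
        let first: T? = try await group.next() ?? nil
        // Cancel the loser; the group waits for it before returning.
        group.cancelAll()
        return first
    }
}
```

An operation such as `try await Task.sleep(nanoseconds:)` reacts to cancellation promptly, so a retry loop that sleeps between attempts would stop waiting as soon as the signal fires.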

@ktoso @tomerd @glbrntt @fabianfett I would like to move forward with merging this PR soonish and tagging a new 2.0.0-alpha for this. The current releases are 1.x.x-alpha and I think using a new SemVer major is good here to imply the significant changes of API. There are three open things:

Support of short running services

This has come up multiple times during the review but so far we haven't found a compelling use-case for this that isn't handled with structured concurrency itself. I would like to defer solving this until we get more real world experience and concrete needs emerge. We have room to add this to the API by simply extending the ServiceGroupConfiguration.

Metrics

@tomerd You brought up built-in metrics for startup and shutdown. In general, I like the idea but would like to defer this to keep the PR smaller.

Failing checks

This package is only supporting 5.7+ so the checks for previous Swift versions are failing. Once we merge this PR, we should remove those checks.

FranzBusch · 2023-04-27T10:55:57Z

Going ahead with merging this PR and solving the outstanding questions in follow-up PRs

FranzBusch force-pushed the fb-structured-service-lifecycle branch from 695cde5 to 5d4f7f3 March 10, 2023 15:21

FranzBusch requested review from tomerd, yim-lee, glbrntt, ktoso and Lukasa March 10, 2023 15:39

fabianfett reviewed Mar 12, 2023

FranzBusch commented Mar 12, 2023

Sources/ServiceLifecycle/Service.swift

tomerd reviewed Mar 12, 2023

adam-fowler reviewed Mar 13, 2023

Sources/ServiceLifecycle/Service.swift

FranzBusch requested review from tomerd, fabianfett, adam-fowler and PeterAdams-A March 13, 2023 16:19

FranzBusch force-pushed the fb-structured-service-lifecycle branch from 6e675b3 to 754bf5c March 13, 2023 17:05

glbrntt reviewed Mar 13, 2023

FranzBusch force-pushed the fb-structured-service-lifecycle branch from 6f3bbee to 2efa2a0 March 14, 2023 13:50

fabianfett reviewed Mar 14, 2023

tachyonics reviewed Mar 14, 2023

Sources/ServiceLifecycle/Docs.docc/How to adopt ServiceLifecycle in libraries.md

tachyonics reviewed Mar 14, 2023

Sources/ServiceLifecycle/ServiceRunner.swift

adam-fowler reviewed Mar 15, 2023

Sources/ServiceLifecycle/GracefulShutdown.swift

gjcairo reviewed Mar 16, 2023

FranzBusch added 19 commits April 25, 2023 16:43

Remove long running property

97de525

Documentation

e1c3013

Fix tests

364e85d

Ensure ordering of shutdown

020fbfa

Test

d1e717f

Introduce ServiceLifecycleTestKit

68aaebb

Add AsyncCancelOngracefulShutdownSequence

be3e460

Tests and docs for graceful shutdown

bcfccdf

Remove unsafe flags

8c5c58a

Add article for application authors

590ea1e

Add correct licence and notice information for the things we copied from swift-nio

a530153

Code review

2e22daa

Remove @_unsafeInheritExectuor

9797c76

Code review

35af255

Change log levels, rename to withGracefulShutdownHandler and support nested service runner graceful shutdown

d7a56c3

Add public shutdownGracefully method to the service runner

7cab86b

Code review and expose new cancelOnGracefulShutdown method

7e412d6

Extend graceful shutdown APIs and tolerate running outside of a `ServiceRunner`

b6834b3

Rename to ServiceGroup

c9f8424

FranzBusch force-pushed the fb-structured-service-lifecycle branch from 816a92f to c9f8424 April 25, 2023 15:43

FranzBusch added 3 commits April 25, 2023 16:49

Enable Swift 5.6 support

c8d0efb

Update soundess dockerfile

8dfbcdd

Update Readme

e82d688

Make 5.6 build work

e59ed5c

FranzBusch force-pushed the fb-structured-service-lifecycle branch from 8c9eb36 to e59ed5c April 25, 2023 16:18

FranzBusch merged commit fafab8f into main Apr 27, 2023

FranzBusch deleted the fb-structured-service-lifecycle branch April 27, 2023 10:57

FranzBusch added the ⚠️ semver/major Breaks existing public API. label Apr 27, 2023
