Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Thrift and HTTP client-side fault injection. #666

Open
wants to merge 33 commits into
base: master
Choose a base branch
from

Conversation

HiramSilvey
Copy link

@HiramSilvey HiramSilvey commented Dec 6, 2024

💸 TL;DR

This PR introduces client-side fault injection capability to HTTP and Thrift clients. The behavior is controlled via new X-Bp-Fault headers, and does nothing if they aren't present, are invalid, or don't match the outgoing request.

📜 Details

If clients are currently passing matching X-Bp-Fault headers around and they happen to be valid according to this change, then unintended fault injection could take place. This is extremely unlikely to occur given how specific the headers and their valid values are, but I wanted to call it out as technically it could affect requests unintentionally.

🧪 Testing Steps / Validation

Thorough unit tests were added across the common library and protocol-specific code.

✅ Checks

  • CI tests (if present) are passing
  • Adheres to code style for repo
  • Contributor License Agreement (CLA) completed if not a Reddit employee

TODO: Determine how to short-circuit Thrift requests.
TODO: Refactor into a common shared function with func parameters.
Need to add logging, tests, and potentially special Thrift error logic.
Need to adjust server name matching either for tests or
permanently. The issue with using server prefix instead of the
<service>.<namespace> paradigm is that care needs to be taken to
ensure the prefix doesn't match too many destinations. Blast radius
can explode in that model.

Prometheus exports and logging are next after that.
@HiramSilvey HiramSilvey marked this pull request as ready for review December 6, 2024 22:24
@HiramSilvey HiramSilvey requested a review from a team as a code owner December 6, 2024 22:24
@HiramSilvey HiramSilvey requested review from fishy, kylelemons and pacejackson and removed request for a team December 6, 2024 22:24
faults/common.go Outdated Show resolved Hide resolved
faults/common.go Outdated Show resolved Hide resolved
faults/common.go Outdated Show resolved Hide resolved
faults/common.go Outdated Show resolved Hide resolved
faults/common.go Outdated Show resolved Hide resolved
httpbp/client_middlewares.go Outdated Show resolved Hide resolved
httpbp/client_middlewares.go Outdated Show resolved Hide resolved
httpbp/client_middlewares_test.go Outdated Show resolved Hide resolved
httpbp/client_middlewares.go Outdated Show resolved Hide resolved
thriftbp/client_middlewares.go Show resolved Hide resolved
@HiramSilvey HiramSilvey requested a review from fishy December 10, 2024 16:53
faults/common.go Outdated Show resolved Hide resolved
faults/common.go Outdated Show resolved Hide resolved
faults/common.go Outdated Show resolved Hide resolved
@HiramSilvey HiramSilvey requested a review from fishy December 10, 2024 20:29
internal/faults/common.go Outdated Show resolved Hide resolved
internal/faults/common.go Outdated Show resolved Hide resolved
internal/faults/common.go Outdated Show resolved Hide resolved
thriftbp/client_middlewares.go Outdated Show resolved Hide resolved
@HiramSilvey HiramSilvey requested a review from fishy December 11, 2024 14:28
@HiramSilvey HiramSilvey requested a review from fishy December 11, 2024 17:29
@HiramSilvey
Copy link
Author

HiramSilvey commented Dec 11, 2024

Hmmm requested feedback too soon. Fixing now that I see failures.

Edit: OK, good to go.

@HiramSilvey
Copy link
Author

FYI I just updated to remove the single random integer implementation altogether as per https://github.com/grpc/proposal/blob/master/A33-Fault-Injection.md#evaluate-possibility-fraction.

Copy link
Member

@fishy fishy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but please wait for others' reviews.

httpbp/client_middlewares.go Show resolved Hide resolved
httpbp/client_middlewares_test.go Outdated Show resolved Hide resolved
@HiramSilvey
Copy link
Author

@fishy @pacejackson thanks for the reviews. I just made 1 small change based on the GRPC fault injection implementation: short circuit the injected delay if the request context is cancelled. Ref: https://github.com/grpc/grpc-go/blob/38a8b9a7057218c210a599d79f3e1d07f84d18c5/xds/internal/httpfilter/fault/fault.go#L196

I changed it to exit the fault injection middleware and continue processing as per normal. If you could take a quick look to just make sure that's OK that would be appreciated. Thanks!

@HiramSilvey
Copy link
Author

lgtm but please wait for others' reviews.

@fishy it looks like @kylelemons won't be available to review this PR until the new year -- should this be blocked until then or is it OK to proceed with the 2 approvals I have now? If there is someone else I should get a sooner review from instead, let me know. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants