feat(debug): Log a privacy preserving hash of IP and UserAgent to assist in rate limiting debugging #148
📝 Summary
We are seeing abuse of our endpoint that is bypassing our firewall rules. To help determine how they are getting past the firewall we need to log the "source" of the traffic in a privacy-preserving way.
What I came up with is `xxhash( x-forwarded-for + user-agent )`, or more specifically the hash of the left-most non-private IP in the `X-Forwarded-For` header and the `User-Agent` header.
However, after discussing w/ @0x416e746f6e he made the good point that `User-Agent` can be gamed by randomly generating a new UA w/ each request. Furthermore, the fingerprint should not be long-term stable, as that would allow a malicious actor with access to Flashbot logs to track user behavior long-term. So now the hash is salted w/ the current timestamp truncated to the hour, and the UA is removed. Finally, the truncated timestamp is XOR'ed w/ a random `uint64` generated at startup to prevent "rainbow table" attacks, where a malicious RPC operator exhaustively hashes all IP/timestamp combinations to determine the source IP of a fingerprint.
In addition, we currently use `proxyd` as our upstream, which by default uses the `X-Forwarded-For` header for rate limiting. Since the `xxhash` output is a `uint64`, it fits easily in an IPv6 address, so we convert the fingerprint to a fake IPv6 address in the reserved "example address" documentation prefix from https://datatracker.ietf.org/doc/html/rfc3849 and insert this "IP" as the `X-Forwarded-For` field.
⛱ Motivation and Context
Give us the means to perform log aggregation to see whether the offending traffic is coming from a single source or from multiple sources.
📚 References
✅ I have run these commands
make lint
make test
go mod tidy