[Feature] Make it possible to attribute S3 GET/List requests to specific queries #686

Open
1 task done
antonysouthworth-halter opened this issue Jul 11, 2024 · 2 comments
Labels
pkg:dbt-athena Issue affects dbt-athena type:enhancement New feature or request

Comments

@antonysouthworth-halter
Contributor

antonysouthworth-halter commented Jul 11, 2024

Is this your first time submitting a feature request?

  • I have searched the existing issues, and I could not find an existing issue for this feature

Describe the feature

When trying to track down the origin of S3 GET requests, it is currently quite difficult to attribute a given GET request to a specific Athena query.

We already set the User-Agent for all requests (#49), which is awesome and means you can tell that they are coming from the adapter. My proposal is to take that a step further and include some kind of identifier for the exact query being run. Unfortunately this can't be the query execution ID from Athena, since that ID only exists once StartQueryExecution has returned, but perhaps we could inject the ClientRequestToken into the User-Agent header value for the StartQueryExecution call?

That way the ClientRequestToken of the original StartQueryExecution call will show up in CloudTrail logs for the S3 GET requests and therefore provide a lineage chain from StartQueryExecution call to GetObject request and back again (check the responseElements of the CloudTrail log for the StartQueryExecution request to get the queryExecutionId).
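The StartQueryExecution half of that chain (token to queryExecutionId) can be sketched as below. This is a minimal sketch, not adapter code: it assumes the CloudTrail records are already loaded as Python dicts, and that the token appears under `requestParameters.clientRequestToken`, which I have not verified against real logs. The function name and the sample records are hypothetical and hand-written for illustration.

```python
def map_tokens_to_query_ids(records):
    """Map clientRequestToken -> queryExecutionId using StartQueryExecution records.

    Assumes each record is a CloudTrail event dict. The location of the token
    (requestParameters.clientRequestToken) is an assumption.
    """
    mapping = {}
    for record in records:
        if record.get("eventSource") != "athena.amazonaws.com":
            continue
        if record.get("eventName") != "StartQueryExecution":
            continue
        # Either field can be null in a CloudTrail record, hence the "or {}".
        request = record.get("requestParameters") or {}
        response = record.get("responseElements") or {}
        token = request.get("clientRequestToken")
        query_id = response.get("queryExecutionId")
        if token and query_id:
            mapping[token] = query_id
    return mapping


# Hypothetical records, not taken from real logs.
sample_records = [
    {
        "eventSource": "athena.amazonaws.com",
        "eventName": "StartQueryExecution",
        "requestParameters": {"clientRequestToken": "token-aaa"},
        "responseElements": {"queryExecutionId": "query-111"},
    },
    {
        # A failed call has no responseElements and is skipped.
        "eventSource": "athena.amazonaws.com",
        "eventName": "StartQueryExecution",
        "requestParameters": {"clientRequestToken": "token-bbb"},
        "responseElements": None,
    },
    {
        "eventSource": "s3.amazonaws.com",
        "eventName": "GetObject",
        "requestParameters": {"bucketName": "example-bucket"},
        "responseElements": None,
    },
]

token_to_query_id = map_tokens_to_query_ids(sample_records)
```

With a mapping like this, any identifier that did surface on the S3 side could be resolved back to a query execution ID.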

Never mind, see the comment below. Also, it doesn't need to be the ClientRequestToken; it can be anything.

Describe alternatives you've considered

Haven't really thought of any. Ideas welcome!

Who will this benefit?

Folks trying to use the adapter at any kind of scale.

Are you interested in contributing this feature?

Potentially? There's a decent chance I just try to do this on a local fork.

Anything else?

No response

@antonysouthworth-halter
Contributor Author

🙃 Not sure how I ended up thinking that the User-Agent is passed from Athena through to the S3 GETs; it's clear as day that the User-Agent for the GETs is just athena.amazonaws.com. Must have got some wires crossed somewhere.

Regardless, if we had some way of attributing the requests, that would be good!
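For reference, this is roughly what the S3 side lets you do today: pick out the GETs that came via Athena, but not which query issued them. A minimal sketch, assuming the S3 data events are already loaded as CloudTrail record dicts with a `userAgent` field; the function name and the sample records are hypothetical.

```python
from collections import Counter


def count_s3_events_by_user_agent(records, event_names=("GetObject",)):
    """Count S3 events per userAgent value.

    Assumes CloudTrail-style dicts. Which event names to include is left to
    the caller; only GetObject is counted by default.
    """
    counts = Counter()
    for record in records:
        if record.get("eventSource") != "s3.amazonaws.com":
            continue
        if record.get("eventName") not in event_names:
            continue
        counts[record.get("userAgent", "unknown")] += 1
    return counts


# Hypothetical records, not taken from real logs.
sample_s3_records = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userAgent": "athena.amazonaws.com"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userAgent": "athena.amazonaws.com"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userAgent": "some-other-client"},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
     "userAgent": "athena.amazonaws.com"},
]

counts_by_agent = count_s3_events_by_user_agent(sample_s3_records)
```

Every Athena-originated GET lands in the same `athena.amazonaws.com` bucket, which is exactly the attribution gap this issue is about.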

@nicor88
Contributor

nicor88 commented Jul 12, 2024

Here is where we set the user-agent for the calls made from boto3 to Athena. But given that the caller for the S3 calls is Athena itself, I'm unsure whether the information you need is passed down to S3. It would indeed be amazing to have more information about the caller in the S3 trails, for example the query execution ID, to track down who did the Get/List.

Shall we raise this as feedback for the Athena team itself? I'm unsure that we can do much here.

@mikealfare mikealfare added the pkg:dbt-athena Issue affects dbt-athena label Jan 10, 2025