AWS identity verification #2121

bemoody · 2023-10-24T22:55:43Z

Context

Pull #2086 added the ability to mirror project content on Amazon S3. This is now working and we are in the process of uploading open-access projects from PhysioNet.

The changes here will be needed once we start uploading restricted/credentialed projects, so that we can securely grant access to authorized users. (Identity verification aside, there are also some more significant changes that are needed for handling restricted/credentialed projects; see issue #2094.)

In brief: currently (in the old system Felipe set up), people are asked to self-report their AWS account number, and any person or service within that account would be allowed to access restricted data.

With these changes, in contrast, people will be asked to verify their personal AWS identity; subsequently, we'll be able to grant access only to verified identities (the latter part is yet to be implemented.)

Why

DUAs for MIMIC and other databases require that data is only shared with authorized individuals (each person must register on PhysioNet and be credentialed.) We want to enable cloud access for better performance, but complying with these DUAs requires knowing who is being granted permission to use these cloud services.

Moreover, although each user is ultimately responsible for data security, we want to encourage good practices. People may be using AWS for all sorts of reasons unrelated to PhysioNet. Giving themselves permission to access MIMIC through their personal account should not also grant permission to all of those unrelated and possibly-less-trusted services.

Some people may be using organizational AWS accounts rather than personal ones. Maybe we want to discourage this, or maybe not, but we can't prevent it. One member of an organization having access shouldn't grant access to everyone in the organization.

There is a lot about AWS authentication that is still a bit mysterious to me, but my gut feeling is that the "IAM user" level is the right level of authentication for PhysioNet and MIMIC.

It has been suggested that we could ask people to self-report their AWS username (or ARN?) in addition to their account number. And yes, that would be an improvement; but it has the disadvantage that usernames are variable-length, and may not be long-term stable. Better would be to ask people to self-report their AWS userid, but that's not easy for people to find and more likely to cause mistakes.

Finally, I can imagine that in the future there may be other reasons for wanting to associate a PhysioNet account with an AWS account, and having a strong verification process could enable more interesting forms of integration.

How identity verification works

The concept is that we would have a special-purpose S3 bucket which allows access only if the path matches the requester's AWS account and userid. To prove your identity, you generate a signature for a URL that can only be accessed by you, and paste that signed URL into a form on the site.

The process would be:

You go to your cloud settings page on PhysioNet.
We tell you to run the command aws sts get-caller-identity.
You copy the output into the form.
We then tell you to run a command like aws s3 presign s3://asdfghjk/physionet.org-verification/[email protected]/userid=AIDAABCDEFGHIJKL/account=112233445566/username=barackobama/.
You copy the output into the form.
We verify the format of the URL and submit it to AWS to verify the signature.

Wait a minute, what's this "userid" thing you keep talking about?

https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-unique-ids

Setup and testing

Using this feature requires creating a special-purpose S3 bucket (a bucket which probably should not be used for anything else.)

For the time being, you can test this by setting AWS_VERIFICATION_BUCKET_NAME to bm-uverify-test1. I will delete that bucket once we've set up a permanent replacement under the PhysioNet AWS account.

If you want to see exactly how the verification bucket is created, and test it yourself, see the instructions in deploy/README.md.

Background

Although this implementation is guided by the needs of PhysioNet, my goal has been to design a general-purpose authentication protocol that could be used by any site that needs to verify cross-account AWS identities.

This is inspired in part by the technique used by Hashicorp Vault and discussed here:

and similarly: https://stackoverflow.com/a/76099155

We could use the same method, but it would require the person to download and run a small program (and that program involves some pretty hairy digging into the AWS API.)

The method proposed here, in contrast, only requires the person to install the official AWS CLI and run a couple of commands. I think that this is easier to understand and therefore paradoxically more secure (see if you can spot the security flaw in the StackOverflow answer.)

For information about why this works, see AWS documentation on policy variables:
https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_variables.html

Also see the AWS CLI documentation:
https://docs.aws.amazon.com/cli/latest/reference/sts/get-caller-identity.html
https://docs.aws.amazon.com/cli/latest/reference/s3/presign.html

bemoody · 2023-10-30T19:47:45Z

Just using the GetObject permission doesn't quite work - if the resource doesn't exist then we get a 403 even if we have permission to access it.

https://stackoverflow.com/a/19038017 explains why this is (which makes sense), though it's not 100% clear how the backend is interpreting the policy, and I'd like to find some official AWS docs clarifying the matter.

Adding ListBucket permission seems to make it work (and raises the question of whether the GetObject permission itself is needed.)

bemoody · 2023-10-30T19:57:17Z

Unfortunately, patching in X-Amz-Expected-Bucket-Owner doesn't work. Don't know if there's a way to include that header in the aws s3 presign command. We can live without it, it just feels a little less safe.

bemoody · 2023-10-30T21:28:08Z

Now with UI and should be working!

bemoody · 2023-10-30T22:11:15Z

Nits:

modal before deleting
better error message for unsupported (non-AIDA) identity
save original Arn so we can show it on the edit_cloud page?

bemoody · 2023-11-01T18:32:36Z

Hmm. Some sources say that Amazon presigned URLs have a maximum expiration time of one week. Maybe that's true for the v4 signature? It's definitely not true for the v2 signature, I just generated one that was valid for a year and it was accepted.

physionet-django/user/awsverification.py

bemoody · 2023-11-07T21:46:40Z

github-advanced-security, you're adorable. It's a fair point that I should rework that regexp to reduce backtrackability (though I'm not sure it's possible to completely fix the issue; python re is annoying that way.)

As for the rest of it, get outta here. ;)

physionet-django/user/awsverification.py

bemoody · 2023-12-05T17:09:31Z

I believe this approach should work if we use S3 Access Points to grant access. If we use S3 Access Grants then it is less clear - can we still grant access based on userid or do we have to use ARN?

So we may have to decide which approach to take before deciding how to validate identities. :(

bemoody · 2023-12-12T17:15:07Z

So we may have to decide which approach to take before deciding how to validate identities.

I'm still not clear about the access-grant thing and how it would work.

But just to note: this code could be changed to verify the requester's aws:username in addition to their aws:userid. Arguably it makes sense for us to do that regardless.

So although I think it's questionable, security-wise, to rely on cross-account IAM ARNs, it's not impossible and also wouldn't preclude other approaches.

bemoody · 2024-01-29T23:59:35Z

Note that using presigned URLs seems to be broken with bucket regions other than us-east-1: boto/boto3#3015

This affects older versions of awscli. Newer versions don't use boto3 and are possibly unaffected.

For the time being, PhysioNet should use us-east-1 for user verification, and other sites should do the same.

Unrelated to user verification, this might also cause problems if we want to store data in regions other than us-east-1.

physionet-django/user/awsverification.py

+        # If the signature is correct, and the identity is correct as
+        # determined by the bucket policy, then S3 should return a 404
+        # response (because the resource doesn't, in fact, exist.)
+        response = session.get(signed_url)


physionet-django/user/awsverification.py

+
+        # As a sanity check, verify that S3 returns a 403 response if
+        # the AWS signature is missing.
+        response = session.get(unsigned_url)


This module contains functions for authenticating a person's AWS identity (account number, username, and user ID) by means of signed URLs. Amazon S3 authenticates clients using a per-request "signature" that incorporates the request path and headers together with a secret key held by the client. This means that the client can pre-compute this signature and send it to someone else, allowing the recipient to perform that request on that client's behalf, without revealing the secret key itself. We can arrange to create an S3 URL that can only be accessed by a particular AWS identity, and then ask someone to pre-compute the signature that they would use to access that resource (which they can do using the AWS CLI or other S3-compatible tools and libraries.) If we then submit that signature to S3 and it succeeds, we know that the requester holds the secret key for that identity. In fact, the resource in question doesn't need to actually exist, as long as we can tell the difference between an unauthorized request (HTTP 403) and an authorized request for something that doesn't exist (HTTP 404). AWS supports many types of identities. This module only supports "IAM user" identities (arn:aws:iam::*:user/*) and not any other types, both as a matter of policy (data access permissions are granted to individual people, not to groups, organizations, or computers) and because IAM users have a stable, fixed-length user ID.

We want to verify a person's AWS identity in order to permit them to access restricted resources via direct cloud APIs; and possibly for other purposes in the future. An AWS account ID is not an identity. An account may contain many identities (known as "userids" or "unique IDs"), which might or might not belong to the same person. (Even if they do all belong to the same person, it doesn't mean that person wants or should want to give all of their identities the ability to access sensitive data.) Here, we add fields to store the userid alongside the account ID and ARN (which may be of interest in the future), and the date and time that these credentials were verified.

In order for a person to verify their AWS identity, they need to provide a digital signature, in the form of a signed URL that includes their AWS account information in the path. We further require the URL to include the domain name of the site, and the user's primary email address, to prevent misuse. This signed URL can be generated using the AWS CLI. However, the URL must be exactly correct; if it is wrong, it is difficult to tell why. In order to hopefully avoid confusion, we first ask the person to run 'aws sts get-caller-identity'; based on that, we tell them the exact 'aws s3 presign' command they need to run.

bemoody · 2024-01-31T18:31:02Z

I'm banning this bot. I don't accept contributions from bots unless they are free and open source.

(Seriously. I am not interested in providing free labor to Microsoft to help them develop better tools to lock people into GitHub.)

bemoody · 2024-01-31T18:46:15Z

regarding boto3 being silly in test suite, see #2150 (comment)

In testing the new AWS user verification feature (see #2121), we found that the bucket policy written in the code didn't match the policy I had actually set for the `bm-uverify-test1` demo bucket. (Sorry, I guess I never tested the final version of that code.) We need to allow *any* principal, not only principals matching a pattern. This new version of the policy has been tested on my personal AWS account, so hopefully it should work when we do the same on the PhysioNet account.

bemoody force-pushed the user-aws-verification branch 4 times, most recently from ee7a7b4 to 859822d Compare October 30, 2023 19:45

bemoody force-pushed the user-aws-verification branch 2 times, most recently from 46ad8f2 to 078df56 Compare October 30, 2023 21:12

bemoody force-pushed the user-aws-verification branch 2 times, most recently from 15b6210 to b177957 Compare October 30, 2023 21:56

bemoody force-pushed the user-aws-verification branch 2 times, most recently from 65c007c to cab730e Compare November 7, 2023 19:25

github-advanced-security bot found potential problems Nov 7, 2023

View reviewed changes

physionet-django/user/awsverification.py Fixed Show fixed Hide fixed

physionet-django/user/awsverification.py Fixed Show fixed Hide fixed

physionet-django/user/awsverification.py Fixed Show fixed Hide fixed

bemoody force-pushed the user-aws-verification branch 2 times, most recently from b0b9e44 to ab2ab47 Compare November 10, 2023 23:36

github-advanced-security bot found potential problems Nov 10, 2023

View reviewed changes

physionet-django/user/awsverification.py Fixed Show fixed Hide fixed

bemoody force-pushed the user-aws-verification branch from ab2ab47 to cc9b991 Compare January 30, 2024 22:09

github-advanced-security bot found potential problems Jan 30, 2024

View reviewed changes

Benjamin Moody added 4 commits January 31, 2024 13:14

Add settings and documentation for AWS verification.

b8a7975

Add test cases for AWS identity verification.

e36f7a4

bemoody force-pushed the user-aws-verification branch from cc9b991 to e36f7a4 Compare January 31, 2024 18:39

bemoody marked this pull request as ready for review January 31, 2024 19:17

bemoody changed the title ~~[WIP] AWS identity verification~~ AWS identity verification Jan 31, 2024

Chrystinne self-assigned this Apr 17, 2024

tompollard merged commit c0a644d into dev Jul 9, 2024
8 checks passed

tompollard deleted the user-aws-verification branch July 9, 2024 14:16

tompollard restored the user-aws-verification branch July 9, 2024 14:22

bemoody mentioned this pull request Jul 24, 2024

Fix Principal in AWS verification bucket policy #2266

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AWS identity verification #2121

AWS identity verification #2121

bemoody commented Oct 24, 2023 •

edited

Loading

bemoody commented Oct 30, 2023

bemoody commented Oct 30, 2023

bemoody commented Oct 30, 2023

bemoody commented Oct 30, 2023

bemoody commented Nov 1, 2023

bemoody commented Nov 7, 2023

bemoody commented Dec 5, 2023

bemoody commented Dec 12, 2023

bemoody commented Jan 29, 2024

bemoody commented Jan 31, 2024

bemoody commented Jan 31, 2024

AWS identity verification #2121

AWS identity verification #2121

Conversation

bemoody commented Oct 24, 2023 • edited Loading

Context

Why

How identity verification works

Wait a minute, what's this "userid" thing you keep talking about?

Setup and testing

Background

bemoody commented Oct 30, 2023

bemoody commented Oct 30, 2023

bemoody commented Oct 30, 2023

bemoody commented Oct 30, 2023

bemoody commented Nov 1, 2023

bemoody commented Nov 7, 2023

bemoody commented Dec 5, 2023

bemoody commented Dec 12, 2023

bemoody commented Jan 29, 2024

bemoody commented Jan 31, 2024

bemoody commented Jan 31, 2024

bemoody commented Oct 24, 2023 •

edited

Loading