S3 for CRAB
This is a high-level description of the plan for integrating the CERN S3 storage service into CRAB.
Initially we will use S3 to replace CRABCache, as described in the Wiki page: CRABCache replacement with S3
Later on we can look into extending its use to store and serve all logs, which are now accessed on the schedds via HTTP. Other uses may come up later.
S3 lets us create a storage container via OpenStack. We ask for a storage quota in our OpenStack project; the size has to be negotiated with CERN of course, but several TBs are no issue. This storage is part of CEPH and I/O is limited since it goes through some kind of gateway. It is not like EOS, where the server runs on the same host which has the disks, so they can provide GB/s (Dan VanDerSteer said). Access to this storage is via the same interface as Amazon Web Services S3 (of course).
NOTE: with OpenStack we create storage containers and give them a name. We can later access them with the boto client, where those containers are mapped to "buckets" with the same name.
For our OpenStack project we can get a set of keys that we need to keep secret. Using those we can create buckets inside our storage and manage objects (files) in there. The key holder can decide if a bucket (or an object in a bucket) is public or private. The key holder can also create pre-signed URLs with an expiration time to make it easy for clients to e.g. upload objects to a bucket.
Note: even if we create multiple buckets, we manage them with the same set of keys and they must all fit into the same overall storage quota. We cannot set a size limit on individual buckets unless we put them in different OpenStack projects.
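As a rough illustration of what the key holder can do, here is a minimal sketch that uploads an object as either public or private. It assumes the CERN gateway honours the standard S3 canned ACLs `public-read` and `private`; the helper names `make_client` and `upload`, and any bucket or key names passed to them, are hypothetical and not part of CRAB.

```python
def make_client(endpoint="https://s3.cern.ch"):
    """Build an S3 client; keys are read from AWS_SHARED_CREDENTIALS_FILE."""
    # Imported here so that upload() can also be exercised with a stub client.
    import boto3
    return boto3.client("s3", endpoint_url=endpoint)


def upload(client, bucket, key, data, public=False):
    """Store bytes under bucket/key; private unless public=True.

    Assumption: the gateway accepts the canned ACLs used below.
    Returns the ACL that was applied.
    """
    acl = "public-read" if public else "private"
    client.put_object(Bucket=bucket, Key=key, Body=data, ACL=acl)
    return acl
```

With real keys this could be used as `upload(make_client(), "some-bucket", "user1/sandbox.tar.gz", payload)`, where the bucket has been created beforehand, for example with `create_bucket(Bucket="some-bucket")`.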
Objects can be stored as /dir1/dir2/../filename and those dirs can be used to list and count things, so we can keep track of use.
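To keep track of use per "directory", one option is to list objects by key prefix and add up their sizes. The sketch below uses the boto3 `list_objects_v2` paginator; the function name `usage_by_prefix` and the prefix layout are assumptions for illustration only.

```python
def usage_by_prefix(client, bucket, prefix=""):
    """Return (object_count, total_bytes) for all keys starting with prefix."""
    count, total = 0, 0
    # The paginator follows continuation tokens, so large listings are covered.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            count += 1
            total += obj["Size"]
    return count, total
```

For example, `usage_by_prefix(conn, "some-bucket", "dir1/dir2/")` would report how many objects sit under that prefix and how much space they take, which can be compared against the overall quota.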
More details and how-tos tailored to our case are in this document from Prajesh (originally on Google Docs here).
We decided to use S3 via the boto3 Python client, and to use py3 on SL7. There are already RPMs for this in cmsdist/comp, and it also comes as a dependency of the Rucio client.
$ # prepare a file with the secrets:
$ cat credentials
[default]
aws_access_key_id = <put your access key here, without quotes>
aws_secret_access_key = <put your secret key here, without quotes>
$ export AWS_SHARED_CREDENTIALS_FILE=credentials
Then in Python:
import boto3
endpoint='https://s3.cern.ch'
conn = boto3.client('s3', endpoint_url=endpoint)
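A quick way to check that the keys and the endpoint work is to list the buckets the client can see. This is only a sketch; the helper name `bucket_names` is hypothetical.

```python
def bucket_names(client):
    """Return the names of all buckets visible with the configured keys."""
    return [b["Name"] for b in client.list_buckets()["Buckets"]]
```

Calling `bucket_names(conn)` with the client created above should return the names of the containers created in our OpenStack project.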
AWS has a sophisticated Identity and Access Management tool, but the CERN implementation has nothing like that at the moment. We only have:
- public objects: so public that not even CERN SSO is required to access them
- private objects: so private that the master keys are needed
- and for anything else, there are pre-signed URLs
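Pre-signed URLs are generated locally by the key holder and can then be handed to a client that has no keys. A minimal sketch, using the boto3 `generate_presigned_url` call; the function name `presigned_urls` and the one-hour default expiration are assumptions, not settings taken from CRAB.

```python
def presigned_urls(client, bucket, key, expires=3600):
    """Return (upload_url, download_url), each valid for `expires` seconds."""
    params = {"Bucket": bucket, "Key": key}
    # Signing happens client-side with the secret key; no request is sent.
    put_url = client.generate_presigned_url(
        "put_object", Params=params, ExpiresIn=expires)
    get_url = client.generate_presigned_url(
        "get_object", Params=params, ExpiresIn=expires)
    return put_url, get_url
```

The recipient can then upload with a plain HTTP PUT to `put_url` (for instance with `curl -T file "<url>"`) and download with a GET on `get_url`, until the URL expires.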
The CERN IT Storage group has opened a dedicated Mattermost channel for us.
A few useful links: