Prepare OpenProteinSet #28
Has this been looked into? I could take a look at it if someone could help me sanity-check it.
I have not yet. Wanna collaborate on it, @cmvcordova? I can start pulling the latest version onto the cluster. There is a fairly old one there already, but there is no point in using it if that hurts reproducibility. That said, I'm not 100% clear on what kind of preprocessing is needed.
Let's do it! We can probably ping the rest of the team in the Discord channel as we progress, to make sure we're on the right track.
Quick update: we're currently having trouble downloading the dataset on the ingress node. The zipped files total approximately 3.3 TB, which exceeds any user's storage limit. After contacting the StabilityAI team, we'll change our approach and download directly to S3 from the Spark cluster node instead.
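A minimal sketch of the direct-to-S3 idea, using boto3 server-side copies so no data passes through the ingress node. It is a single-process boto3 sketch, not the Spark job the team mentions, though the key list it builds could be partitioned across Spark workers. Assumptions not stated in the thread: the public source bucket behind registry.opendata.aws/openfold is named `openfold`, the destination prefix `openproteinset/` is hypothetical, and your credentials can write to `s3://openbioml/`.

```python
"""Sketch: server-side copy of OpenProteinSet into s3://openbioml/.

Assumptions: source bucket "openfold" (public, anonymous reads),
hypothetical destination prefix "openproteinset/", and credentials
with write access to the destination bucket.
"""
from typing import Iterable, Iterator, Tuple

SOURCE_BUCKET = "openfold"       # assumed bucket name behind the AWS registry entry
DEST_BUCKET = "openbioml"        # from the thread: s3://openbioml/
DEST_PREFIX = "openproteinset/"  # hypothetical prefix; adjust as needed


def dest_key(src_key: str, dest_prefix: str = DEST_PREFIX) -> str:
    """Map a source object key to its destination key under dest_prefix."""
    return dest_prefix + src_key.lstrip("/")


def plan_copies(
    src_keys: Iterable[str], dest_prefix: str = DEST_PREFIX
) -> Iterator[Tuple[str, str]]:
    """Yield (source_key, destination_key) pairs, skipping "directory" marker keys."""
    for key in src_keys:
        if key.endswith("/"):
            continue
        yield key, dest_key(key, dest_prefix)


def list_source_keys(prefix: str = "") -> Iterator[str]:
    """List keys in the public source bucket using unsigned (anonymous) requests."""
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def copy_all(prefix: str = "") -> None:
    """Copy every source object into the destination bucket, server side.

    boto3's managed copy() switches to multipart copy for large objects,
    so objects above the 5 GB single-request CopyObject limit still work.
    """
    import boto3

    dest = boto3.client("s3")  # uses your normal (signed) credentials
    for src_key, dst_key in plan_copies(list_source_keys(prefix)):
        dest.copy(
            CopySource={"Bucket": SOURCE_BUCKET, "Key": src_key},
            Bucket=DEST_BUCKET,
            Key=dst_key,
        )
```

The boto3 imports are deferred into the functions that talk to S3, so the key-mapping helpers can be checked without AWS access; calling `copy_all()` with a narrow `prefix` first is a cheap way to validate permissions before copying all ~3.3 TB.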
@NZ99 @cmvcordova, I believe this is now complete based on the latest conversation with Niccolo. Could you please confirm?
Confirming OPS (OpenProteinSet) is on the cluster and accessible through s3://openbioml/
Download and prepare OpenProteinSet on the cluster, and delete the old version on S3.
https://registry.opendata.aws/openfold/