Create upload_bytes_to_bucket Cloud Function for LSST #250

Open: 21 commits to be merged into base branch develop

Collaborator

@hernandezc1 hernandezc1 commented Jan 27, 2025

This PR creates the necessary changes to store LSST alerts in Google Cloud Storage buckets using the upload_bytes_to_bucket Cloud Function.

Here is an overview of the changes made in this PR:

  • Rename setup_broker/rubin directory to setup_broker/lsst
  • Rename consumer/rubin directory to consumer/lsst
    • PS_TOPIC_DEFAULT for VM updated: "${survey}-alerts" -> "${survey}-alerts_raw"
    • The testid is now appended to the consumer.group.id in the VM's configuration files to prevent new deployments from picking up existing offsets
  • setup_broker.sh deploys all of the necessary GCP resources
  • Create cloud_functions/lsst/ps_to_gcs directory
    • main.py deduplicates the incoming alert stream, stores each alert (.avsc) in a GCS bucket, and publishes the original alert bytes (schema not included) to the lsst-alerts Pub/Sub topic. The schema version of each alert is attached as a Pub/Sub message attribute, so the BigQuery subscriptions in "Use BigQuery subscriptions to store LSST alert data" (#243) can filter on it to write the alert data to the appropriate BigQuery table
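
As a rough illustration of the store-then-publish flow described for main.py, here is a minimal sketch that runs without any GCP access. It is not the code in this PR: `InMemoryBucket`, `handle_alert`, the object name, and the `schema_version` attribute key are all hypothetical stand-ins, and the real function may detect duplicates differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class InMemoryBucket:
    """Hypothetical stand-in for a GCS bucket; it only records names and payloads."""

    objects: Dict[str, bytes] = field(default_factory=dict)

    def upload_if_absent(self, name: str, payload: bytes) -> bool:
        """Store the payload and return True, or return False if the name is taken."""
        if name in self.objects:
            return False
        self.objects[name] = payload
        return True


def handle_alert(
    alert_bytes: bytes,
    object_name: str,
    schema_version: str,
    bucket: InMemoryBucket,
    publish: Callable[[bytes, Dict[str, str]], None],
) -> bool:
    """Store a new alert and republish it; drop it if it was already stored."""
    if not bucket.upload_if_absent(object_name, alert_bytes):
        # Duplicate delivery: the alert is already in the bucket, so skip publishing.
        return False
    # The attribute key is an assumption; a subscription filter would match on it.
    publish(alert_bytes, {"schema_version": schema_version})
    return True
```

Passing the publisher in as a callable keeps the sketch testable; in a deployment it would wrap a Pub/Sub publisher client, and the bucket stand-in would be replaced by a real upload that fails when the object already exists.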

@hernandezc1 hernandezc1 self-assigned this Jan 27, 2025
bucket_name = f"{PROJECT_ID}-{SURVEY}_alerts_{versiontag}"
if TESTID != "False":
    bucket_name = f"{bucket_name}-{TESTID}"
BUCKETS[versiontag] = client.get_bucket(client.bucket(bucket_name, user_project=PROJECT_ID))
Collaborator Author

@hernandezc1 hernandezc1 Feb 17, 2025

Pre-defining the buckets and making one .get_bucket request per instance, rather than one per alert, is a temporary solution I discussed with @troyraen. It will need to be updated for future schema implementations.
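
One possible direction for that update, sketched under assumptions: resolve each bucket lazily the first time its schema version is seen and cache it for the lifetime of the instance, so new versions need no pre-defined list. `bucket_name` mirrors the naming in the diff above; `BucketCache` and its `fetch` callable are hypothetical, with `fetch` standing in for whatever performs the actual `get_bucket` request.

```python
from typing import Callable, Dict


def bucket_name(project_id: str, survey: str, versiontag: str, testid: str) -> str:
    """Build the bucket name the same way the snippet under review does."""
    name = f"{project_id}-{survey}_alerts_{versiontag}"
    if testid != "False":
        name = f"{name}-{testid}"
    return name


class BucketCache:
    """Fetch each bucket at most once per instance (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], object]) -> None:
        self._fetch = fetch  # e.g. a wrapper around the storage client's get_bucket
        self._buckets: Dict[str, object] = {}

    def get(self, name: str) -> object:
        """Return the cached bucket, fetching it on first use only."""
        if name not in self._buckets:
            self._buckets[name] = self._fetch(name)
        return self._buckets[name]
```

With this shape, the per-alert path only pays for a dictionary lookup once a version's bucket has been fetched, which preserves the once-per-instance behaviour of the current approach.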

@hernandezc1 hernandezc1 requested review from wmwv and troyraen February 17, 2025 21:05
@hernandezc1 hernandezc1 added the Enhancement (New feature or request) and Pipeline: Storage (Components whose primary function is to store data) labels Feb 17, 2025