U/jrbogart/gcr tutorial #166

Merged
merged 4 commits into from
Jan 29, 2025

JoanneBogart
Collaborator

Demonstrate querying and accessing GCRCatalog-type catalogs via dataregistry.

@stuartmcalpine
Collaborator

Overall I think it's fine. It won't keep working as-is once the new changes to the production-schema connection (etc.) are in, so the mechanics will have to change a bit.

We also need to add a link to the notebook somewhere in the docs. Do we want a docs page for this (or for DESC-specific applications of the registry)?

@JoanneBogart
Collaborator Author

Why won't the code here work as-is? The connection to dataregistry happens with
GCRCatalogs.ConfigSource.set_config_source(dr=True)

which ultimately makes a connection with
dr_reg = DataRegistry(schema=dr_schema, root_dir=dr_root, site=dr_site)

where dr_schema is "lsst_desc_production".
It would be better if that value were obtained from schema.DEFAULT_SCHEMA_PRODUCTION rather than hard-coded, as it is currently, but I think it will do the right thing. Or am I missing something?

Yes, there should be a link to the notebook somewhere. It looks like tutorial_python.rst would be the right place. I'll add that.

Collaborator

@stuartmcalpine stuartmcalpine left a comment

This looks like it should work. I need to double-check by testing it with the other PR.

My only comments relate to changes the other PR will make to the schema we connect to.

"source": [
"from dataregistry import DataRegistry\n",
"from dataregistry.schema import DEFAULT_SCHEMA_PRODUCTION\n",
"\n",
Collaborator

There is no DEFAULT_SCHEMA_PRODUCTION anymore.

Options would be

  • Import DEFAULT_NAMESPACE and connect to schema=f"{DEFAULT_NAMESPACE}_production"
  • Optionally specify DEFAULT_NAMESPACE (there is no need to), and set query_mode="production" to limit queries to the production schema
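
A minimal sketch of the two options listed above, written so it runs without dataregistry installed. The local `DEFAULT_NAMESPACE` constant is a stand-in for the one the options refer to, and its value `"lsst_desc"` is an assumption inferred from the schema name "lsst_desc_production" quoted earlier in this thread. The helpers `production_schema` and `connection_kwargs` are hypothetical names used only for illustration. The commented-out `DataRegistry` call is untested here and assumes that `schema` and `query_mode` are accepted as constructor keywords.

```python
# Stand-in for dataregistry's DEFAULT_NAMESPACE. The value "lsst_desc" is an
# assumption inferred from the schema name "lsst_desc_production".
DEFAULT_NAMESPACE = "lsst_desc"


def production_schema(namespace: str = DEFAULT_NAMESPACE) -> str:
    """Option 1: build the production schema name from the namespace."""
    return f"{namespace}_production"


def connection_kwargs(option: int) -> dict:
    """Return the keyword arguments each option would pass to DataRegistry.

    Hypothetical helper: option 1 names the schema explicitly, option 2
    relies on the default namespace and restricts queries to production.
    """
    if option == 1:
        return {"schema": production_schema()}
    if option == 2:
        return {"query_mode": "production"}
    raise ValueError(f"unknown option: {option}")


print(connection_kwargs(1))
print(connection_kwargs(2))

# With dataregistry installed, the connection might then look like this
# (not run here; keyword names are an assumption):
# from dataregistry import DataRegistry
# dr_reg = DataRegistry(**connection_kwargs(2))
```

Option 2 avoids constructing a schema name at all, which may make the notebook less sensitive to future renaming of the production schema.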

"source": [
"### Dataset properties\n",
"\n",
"Recall that a `DataRegistry` instance has a member `Query` which provides all the query services, the principal one being the ability to ask for values of attributes of datasets, subject to one or more filters. If you haven't already, we recommend you take a look at the tutorial \"Getting started: Part 3 - Simple queries\" before proceeding further.\n",
Collaborator

The "Getting started" tutorial on queries is now "Part 2".

@stuartmcalpine stuartmcalpine self-requested a review January 29, 2025 20:11
@JoanneBogart JoanneBogart merged commit 7c0dcdd into main Jan 29, 2025
26 checks passed
@JoanneBogart JoanneBogart deleted the u/jrbogart/gcr-tutorial branch January 29, 2025 20:13