Link schemas #170

stuartmcalpine · 2024-12-13T12:28:26Z

Have the working and production schemas work more seamlessly together.

When connecting to the registry, the user now specifies only the working schema. The associated production schema is looked up automatically through the provenance table.

The fundamental operations are the same for registering/modifying/deleting entries. For production entries, users must pass production_mode=True when instantiating the DataRegistry instance, which sets the production schema as the target (the schema specified at instantiation is always the "working" one).

Queries now search both schemas (working and production). This is done by running an independent query against each schema and concatenating the results. For this reason I've removed the sqlalchemy CursorResult as a return option, as there is no easy way to combine two of them. The only return formats are now the property_dict and the dataframe.
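For illustration, a minimal sketch of the "two independent queries, then concatenate" approach described above. It is not the code in this PR: it uses the standard-library sqlite3 module, and the table names, columns, and the `query_both` helper are hypothetical stand-ins for the working and production schemas.

```python
import sqlite3


def query_both(conn, tables, columns, where="1=1", params=()):
    """Run the same SELECT against each table and concatenate the rows.

    Returns a dict mapping column name -> list of values, loosely
    mimicking a property-dict style return (an assumption of this sketch).
    Table and column names are interpolated directly, so this is only
    suitable for trusted, illustrative input.
    """
    result = {col: [] for col in columns}
    for table in tables:
        stmt = f"SELECT {', '.join(columns)} FROM {table} WHERE {where}"
        for row in conn.execute(stmt, params):
            for col, value in zip(columns, row):
                result[col].append(value)
    return result


# Two tables stand in for the working and production schemas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE working_dataset (name TEXT, version TEXT)")
conn.execute("CREATE TABLE production_dataset (name TEXT, version TEXT)")
conn.execute("INSERT INTO working_dataset VALUES ('cat_a', '0.0.1')")
conn.execute("INSERT INTO production_dataset VALUES ('cat_a', '1.0.0')")
conn.execute("INSERT INTO production_dataset VALUES ('cat_b', '1.0.0')")

combined = query_both(
    conn,
    ["working_dataset", "production_dataset"],
    ["name", "version"],
    where="name = ?",
    params=("cat_a",),
)
print(combined)
```

Because each query returns plain rows, merging is just list concatenation per column, which is why a dict or dataframe is easier to combine than two cursor objects.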

Removed the TableMetadata class; the reflection and table information now live in the DbConnection object.

Thoughts

Change instances of "schema" to "database" (in terms of wording/reference)? More familiar sounding, and now the idea of working with specific schemas is removed and pushed behind the scenes.
Have a flag to ignore searching the production schema during queries?

Todo

JoanneBogart

There are some comments associated with particular lines, and I haven't yet looked in detail at all the files under dataregistry_cli or tests, though I expect they're probably ok, except for the code to test aliases. If the tests (which I originally wrote!) really exercised everything, something should have failed.
Among the alias routines, resolve_alias and find_aliases at least need some revision. The calls to _render_filter are missing the new schema argument. We need to decide how to handle the possibility of two schemas. Either for aliases we can do the same as is now done for datasets (combine the results of two queries) or else add a schema argument. In the latter case, should it have a default? I'm leaning towards "no", because it doesn't seem like it has a natural default, but I could be convinced otherwise.

JoanneBogart · 2024-12-16T19:33:09Z

src/dataregistry/db_basic.py

+            results = conn.execute(stmt)
+            r = results.fetchone()
+        if r is None:
+            warnings.warn(


Instead of issuing a warning, why even make the query when called in the context of database creation? In the old way of doing things, there was the get_db_version flag to indicate whether or not the query should be made. Is it possible to do something similar here? One way might be to add another argument creation_mode defaulting to False when setting up a DbConnection. It would be set to True when create_registry_schema.py creates connections. (I haven't thought this through that carefully but I think it would do.)

I've added a creation_mode flag for the DbConnection which skips querying the provenance table during reflection. Now if nothing is found in the provenance table, an exception is raised rather than a warning.

Looks good. Please add a comment for creation_mode under "Parameters" in the docstring for DbConnection.__init__.
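A rough sketch of the creation_mode behaviour discussed in this thread, assuming a toy class rather than the real DbConnection. The class name, the `provenance_rows` argument, and the `associated_production` key are all hypothetical; only the idea (skip the provenance lookup during creation, raise if the lookup finds nothing) comes from the comments above.

```python
class ToyDbConnection:
    """Toy stand-in for a connection object (not the real DbConnection).

    Parameters
    ----------
    schema : str
        Name of the working schema.
    provenance_rows : list of dict
        Hypothetical in-memory substitute for the provenance table.
    creation_mode : bool, optional
        If True, skip the provenance lookup entirely (the table does not
        exist yet while the schema is being created).
    """

    def __init__(self, schema, provenance_rows, creation_mode=False):
        self.schema = schema
        self.creation_mode = creation_mode
        self.prod_schema = None
        if not creation_mode:
            self.prod_schema = self._lookup_production_schema(provenance_rows)

    def _lookup_production_schema(self, provenance_rows):
        # Outside creation mode an empty provenance table is an error,
        # not just a warning.
        if not provenance_rows:
            raise RuntimeError(
                f"No provenance entries found for schema '{self.schema}'"
            )
        # Use the most recent entry ("associated_production" is a
        # hypothetical key chosen for this sketch).
        return provenance_rows[-1]["associated_production"]


creating = ToyDbConnection("my_working", [], creation_mode=True)
normal = ToyDbConnection(
    "my_working", [{"associated_production": "my_production"}]
)
print(creating.prod_schema, normal.prod_schema)
```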

src/dataregistry/db_basic.py

src/dataregistry/query.py


stuartmcalpine · 2025-01-03T15:23:10Z

I will move the renaming of instances of "schema" to "database" to a new PR, and update the docs to reflect this.

JoanneBogart

I've made some minor suggestions inline. Concerning resolve_alias, I think we need to provide a way to look in the production schema. I suggest for now we add an argument schema or perhaps a boolean, e.g. is_working which defaults to True. Then the returns should include either the schema name or a boolean indicating whether the entry was from the production schema or not. I'm not sure how best to indicate this for the sqlite case and the case when working schema = production schema. Maybe boolean for input argument but schema name for return?
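One possible shape for this proposal, sketched with plain dictionaries instead of database queries. The function name `resolve_alias_sketch`, its arguments, and the sample data are hypothetical; it only illustrates "boolean for the input argument, schema name in the return".

```python
def resolve_alias_sketch(
    alias, aliases_by_schema, working_schema, production_schema, is_working=True
):
    """Look up an alias in either the working or the production schema.

    Returns (dataset_id, schema_name); dataset_id is None if the alias is
    not found. Returning the schema name also behaves sensibly when the
    working and production schemas are the same (e.g. the sqlite case).
    """
    schema = working_schema if is_working else production_schema
    dataset_id = aliases_by_schema.get(schema, {}).get(alias)
    return dataset_id, schema


# Hypothetical alias tables keyed by schema name.
aliases = {
    "lsst_desc_working": {"latest_catalog": 12},
    "lsst_desc_production": {"latest_catalog": 3},
}

working_hit = resolve_alias_sketch(
    "latest_catalog", aliases, "lsst_desc_working", "lsst_desc_production"
)
production_hit = resolve_alias_sketch(
    "latest_catalog",
    aliases,
    "lsst_desc_working",
    "lsst_desc_production",
    is_working=False,
)
print(working_hit, production_hit)
```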

JoanneBogart · 2025-01-06T18:35:33Z

src/dataregistry/db_basic.py

+            provenance table. In the default mode both schemas
+            working/production are avaliable for queries, but new
+            entries/modifications are done to the working schema. To create new
+            entries/modifications to production entries, `production_mode` must


If production_mode is True you can get away with a little less work. You've already done metadata.reflect(self.engine, self.schema) at line 253 so you don't need metadata.reflect(self.engine, self._prod_schema). The code in lines 276 - 283 should be modified to recognize this case.

The schema that is passed to the DbConnection is always the working schema, though we might change this to collection as we discussed.

production_mode does not mean it only connects to the production schema, it just means it defaults to the production schema for entries and modifications during that instance. I've tried to make the doc string a bit clearer.

But in all cases, i.e., production_mode true or not, it always reflects/connects to both schemas, as querying always searches both schemas regardless.

JoanneBogart · 2025-01-06T19:11:34Z

src/dataregistry/db_basic.py

+
+                if column.name in all_columns:
+                    duplicates.add(column.name)
+                all_columns.append(column.name)


Shouldn't this be

all_columns.append(column.name)

No need to append it if it's already there.

Not sure what you mean by your recommendation (that is the line that is in there already). I assume you mean there is no need for all_columns to accumulate multiple instances of the same column name, so I have made it a set rather than a list.

I meant that the code could read

    if att not in column_list.keys():
        column_list[att] = [temp_column_list[att]]
    else:
        column_list[att].extend(tmp_column_list[att])

but it doesn't matter much. It's ok the way it is.
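For reference, a small standalone sketch of the set-based version settled on in this thread: collect every column name once and record the names that appear in more than one table. The helper name and the sample table layout are hypothetical, not taken from the PR.

```python
def find_duplicate_columns(tables):
    """Return (all_columns, duplicates) for a mapping of table -> column names.

    Both results are sets, so a repeated column name is stored only once
    in all_columns and is also recorded in duplicates.
    """
    all_columns = set()
    duplicates = set()
    for columns in tables.values():
        for name in columns:
            if name in all_columns:
                duplicates.add(name)
            all_columns.add(name)
    return all_columns, duplicates


# Hypothetical tables sharing some column names.
tables = {
    "dataset": ["dataset_id", "name", "owner"],
    "dataset_alias": ["dataset_alias_id", "dataset_id", "name"],
}
all_columns, duplicates = find_duplicate_columns(tables)
print(sorted(duplicates))
```

Knowing which names are duplicated tells the query layer which columns need to be qualified with their table name.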

src/dataregistry/query.py

JoanneBogart

I made a couple of new, very minor, comments, but basically it looks ok except for one thing: I believe resolve_alias does not cover a case I would like to see covered (the working schema is the active one but the user wishes to resolve a production alias). As far as I can tell one would have to make a separate DbConnection with production mode set in order to do it. I propose we leave addressing that issue for a separate PR.

stuartmcalpine added 4 commits December 13, 2024 13:19

Link schemas

45207d0

Remove redundant TableMetadata class

2fa218a

Fix CLI query

ab36caf

Fix sqlite

000c36d

stuartmcalpine requested a review from JoanneBogart December 13, 2024 15:54

JoanneBogart requested changes Dec 17, 2024


stuartmcalpine added 7 commits December 19, 2024 12:44

Add flag for that skips querying the provenance table during schema creation

433f140

Remove db version function

f7a1c19

Add docstring to _render_filter function

6710663

Tidy reflect function

e37a038

Add duplicate_column_names list to db_connection to help with querying

615a157

Fix query test

444f477

Fix find_aliases function

4e1a4f5

stuartmcalpine requested a review from JoanneBogart January 3, 2025 15:22

Fix sqlite tests

2cd2ee0

JoanneBogart requested changes Jan 6, 2025


address reviewer comments

f459196

stuartmcalpine requested a review from JoanneBogart January 14, 2025 14:31

Update changelog

e6f662b

JoanneBogart approved these changes Jan 15, 2025


stuartmcalpine added 2 commits January 16, 2025 15:16

Add doc string

284afbd

Apply code reformatting

463ee9e

stuartmcalpine merged commit 84272e4 into main Jan 16, 2025
26 checks passed

stuartmcalpine deleted the u/stuart/link_schemas branch January 16, 2025 14:21
