-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Central schema yaml #89
Central schema yaml #89
Conversation
This contains all the tables and row entries in a central location. Now the schema creation script, the CLI and the docs can use this, so we don't have to worry about missing somewhere when adding a new entry to a table.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change all those occurrences of "row" (the ones I've marked and anything similar) to "column". That's the main thing. There are also one or two very minor things.
scripts/create_registry_db.py
Outdated
schema_yaml = load_schema() | ||
|
||
|
||
def _get_rows(schema, table): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name should be _get_columns, or _get_column_definitions rather than _get_rows. There are no rows here, only column names and definitions. "Rows" normally refers to rows of data within a table. It's true that the properties of a column can be thought of as a row in a table, but, unless there is some extra explanatory text, that is not the first thing people will think of.
All of the remaining review comments for create_registry_db.py
make essentially the same point. I see now that the code being replaced had the same confusing (to me) nomenclature; I apologize for not catching this earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've captured all instances of row -> column now
foreign_key_table: "dataset" | ||
foreign_key_row: "dataset_id" | ||
description: "Dataset this alias is linked to" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment here that, for any given row, exactly one of input_id
and input_production_id
is non-null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case something like this comes up again: I was actually thinking of a yaml comment preceding the table definition, e.g.
# For each row in this table, exactly one of input_id and input_production_id is non-null
but the way you've done it is perfectly ok, perhaps preferable.
I've changed all the instances of row->column where needed. I've tweaked the register CLI a bit to make sure it's now able to accept all the properties (it was missing creation date before for example). Also the only other change (other than your review comments) is in the last commit, where I reduce the register doc strings (which were pretty large). Basically any input args that are column names have a ** to refer to the docs for their description. The desciption of what the column represents doesn't help the code. Plus in the spirit of this PR its to reduce places in the code where these descriptions are duplicated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
foreign_key_table: "dataset" | ||
foreign_key_row: "dataset_id" | ||
description: "Dataset this alias is linked to" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case something like this comes up again: I was actually thinking of a yaml comment preceding the table definition, e.g.
# For each row in this table, exactly one of input_id and input_production_id is non-null
but the way you've done it is perfectly ok, perhaps preferable.
The schema information is replicated in three places
So we don't have to replicate all the entries each time in these three cases, I've put the schema in a yaml file, which all three sections above can read.
This tidies the code a lot, but also when we add a new row in a table we only have to do it in one place.