New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Add support for tables with primary keys not named `id` #11

Open

WisdomWolf wants to merge 2 commits into benchling:master from WisdomWolf:master

WisdomWolf commented Dec 16, 2020

No description provided.

WisdomWolf added 2 commits

December 15, 2020 21:44


          adds support for primary keys with names other than id

30a4baf


          bumps version

9663dab

somakrdas self-requested a review

January 17, 2021 22:28

somakrdas requested changes

View reviewed changes

Contributor

somakrdas left a comment

Hi @WisdomWolf, thanks for your contribution! I enjoyed reviewing it and left my comments below. Sorry for my delay.

Would you be up for sharing the column name that motivated this change?
I'd love to learn about other conventions and use cases.

BTW, I wanted to offer - I'm happy to help with implementing some of these comments if you'd like. Let me know if interested.

I've also added @vineetgopal as a secondary PR reviewer. @vineetgopal, could you take a look after I approve on my end?

sqlalchemy_batch_inserts/__init__.py

@@ @@ -4,7 +4,7 @@ @@
               import sqlalchemy
-              __version__ = "0.0.4"
+              __version__ = "0.0.5"

Contributor

somakrdas Jan 21, 2021

This brought up a good discussion on if we should separate functional and version changes. Currently our answer is yes, let's separate functional from version changes. That way, we can separately decide how to bump the version.

Accordingly, could you revert this line?

Suggested change

__version__ = "0.0.5"

After this merges, I will (1) bump this and setup.py to a new version and (2) release.

sqlalchemy_batch_inserts/__init__.py

@@ @@ -19,7 +19,7 @@ def _get_column_python_type(column): @@
               def _has_normal_id_primary_key(base_mapper):
-                  """Check if the primary key for base_mapper is an auto-incrementing integer `id` column"""
+                  """Check if the primary key for base_mapper is an auto-incrementing integer column"""

Contributor

somakrdas Jan 21, 2021

👍 - could you generalize the remaining instances of id in this module, like symbol names and comments?

For example, _has_normal_id_primary_key → _has_normal_primary_key.

Hoping to generalize all instances of id except any examples that should remain as-is.

sqlalchemy_batch_inserts/__init__.py

Comment on lines +35 to +37

+                          python_column_type == int
+                          and primary_key_col.autoincrement in ("auto", True)
+                          and primary_key_col.table == base_mapper.local_table

Contributor

somakrdas Jan 21, 2021

Could you follow the single-indent style from before?

Suggested change

      
                        python_column_type == int
          
                        and primary_key_col.autoincrement in ("auto", True)
          
                        and primary_key_col.table == base_mapper.local_table
          
                    python_column_type == int
          
                    and primary_key_col.autoincrement in ("auto", True)
          
                    and primary_key_col.table == base_mapper.local_table

We should eventually set up a code formatter for this codebase to automate.

sqlalchemy_batch_inserts/__init__.py

-                      and primary_key_col.autoincrement in ("auto", True)
-                      and primary_key_col.table == base_mapper.local_table
+                          python_column_type == int
+                          and primary_key_col.autoincrement in ("auto", True)

Contributor

somakrdas Jan 22, 2021 •

edited

Loading

I noticed an existing issue that _has_normal_id_primary_key returns true for primary columns that are not auto-incremented.

For example, in my company's codebase, we have Source table with a destination_id pkey which is a fkey to Destination.id. _has_normal_id_primary_key for this table returns true, even though it is not auto-incremented.

Notably, primary_key_col.autoincrement is auto so it passes this check, even though ultimately SQLAlchemy doesn't auto-increment this pkey. https://docs.sqlalchemy.org/en/13/core/metadata.html?highlight=autoincrement#sqlalchemy.schema.Column.params.autoincrement explains that auto only results in auto-increment if

Column is integer derived - checked on L35
Column is pkey - checked on L23
Column is not fkey - not checked - could check by adding not primary_key_col.foreign_keys to this check

Even though this is an existing issue, for that codebase, it started appearing with non-id pkeys, so I think it's worth fixing in this PR.
I'm curious what is the best way to check that primary_key_col is auto-incremented with SQLA. Some ideas:

[My preference] Add not primary_key_col.foreign_keys to this check
Replace L36 with primary_key_col is primary_key_col.table._autoincrement_column from https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/sql/schema.py#L3775 , though it'd be an undocumented, underscore-prefixed attribute

Open to others! CC: @vineetgopal

sqlalchemy_batch_inserts/__init__.py

                   )
+              def _get_primary_key_name(base_mapper):
+                  primary_key_cols = base_mapper.primary_key

Contributor

somakrdas Jan 22, 2021

I noticed an existing issue that the module assumes the attribute name and column name are the same, but there could be exceptions to that if a caller defines attribute_name = db.Column("column_name", db.Integer, primary_key=True). This could've affected an id column name and could come up with non-id pkey column names.
My company's codebase has some columns like these, though none are pkeys, so I'm not sure how often it comes up in practice. So, to make the assumption clear, could we assert that the attribute name and column name are the same?

Here's a start, but open to simpler implementations:

[primary_key_col_attr] = [c for c in base_mapper.column_attrs if primary_key_col in c.columns]
assert primary_key_col.name == primary_key_col_attr.key, \
    "_get_primary_key_name only supports columns whose name equals their attribute name"

sqlalchemy_batch_inserts/__init__.py

+                  assert _has_normal_id_primary_key(base_mapper), \
+                      "_get_id_sequence_name only supports single auto increment primary keys"
+                  primary_key = _get_primary_key_name(base_mapper)
+                  return f"{base_mapper.entity.__tablename__}_{primary_key}_seq"

Contributor

somakrdas Jan 22, 2021

This brought up a good discussion on how backward compatible we should be with Python 2. For this release, our answer is yes, let's remain backward compatible.

Accordingly, could you change to the old-style string formatting from before?

Suggested change

      
                return f"{base_mapper.entity.__tablename__}_{primary_key}_seq"
          
            "%s_%s_seq" % (base_mapper.entity.__tablename__, primary_key)

sqlalchemy_batch_inserts/__init__.py

-                              raise AssertionError("Expected models to have auto-incrementing `id` primary key")
+                              raise AssertionError("Expected models to have auto-incrementing primary key")
+                      primary_key = _get_primary_key_name(base_mapper)

Contributor

somakrdas Jan 22, 2021 •

edited

Loading

Could you rename to primary_key_name to distinguish this pkey name (like the _get_primary_key_name function does) from other possible variables like pkey column (primary_key_col) or value?

Suggested change

      
                    primary_key = _get_primary_key_name(base_mapper)
          
                    primary_key_name = _get_primary_key_name(base_mapper)

somakrdas requested a review from vineetgopal

January 22, 2021 04:35

Author

WisdomWolf commented Feb 4, 2021

Sorry it's taken me a while to get back to this, been pretty busy recently. I should be able to address most of these requests, hopefully in the next few days.

vineetgopal requested changes

View reviewed changes

Contributor

vineetgopal left a comment

Thanks WisdomWolf for putting this up! I'm requesting changes for now as you're changing things re: @somakrdas's comments, but happy to review again afterwards.

Contributor

somakrdas commented Apr 24, 2021

Hi @WisdomWolf, wanted to share an update with you on the SQLAlchemy side - SQLAlchemy now supports "batch inserts" (with retrieval of newly generated primary key values) natively in version 1.4: https://groups.google.com/g/sqlalchemy/c/GyAZTThJi2I/m/k14nv8HzEgAJ , https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#psycopg2-dialect-features-execute-values-with-returning-for-insert-statements-by-default

@vineetgopal and I will discuss this more on our end for sqlalchemy_batch_inserts generally, but I wanted to share this with you (and anyone following along) just in case it affects your plans for this PR, since upgrading SQLAlchemy to 1.4 may handle your use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet