Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1877449:Exception should be thrown when create df with null value and nullable set to False #2849

Merged
merged 9 commits into from
Jan 15, 2025

Conversation

sfc-gh-yuwang
Copy link
Collaborator

@sfc-gh-yuwang sfc-gh-yuwang commented Jan 10, 2025

…lable column

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1877449

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

    This bug caused by ARRAY_BIND_THRESHOLD parameter in snowpark, we use it to decide whether we insert the data into temporary table when creating a dataframe. Currently the threshold is 512. So if row number * colum number <512, no insert query is executed, which means no error will be thrown when NULL value is in non-nullable column. This mean we had a false behavior before and some of the test needs to be fixed

when create a dataframe where number of columns * number of rows < 512, instead of inserting data into snowflake table, such query is executed:

SELECT "KEY" FROM ( SELECT $1 AS "KEY" FROM VALUES (NULL :: INT)) LIMIT 10
this query would ignore all nullable settings on all columns, making snowpark have different behaviors when creating small and big dataframes.

This query cannot be modified so that it have the same behvior of inserting null value into a non-nullable table. So I add some assertions in our code to throw the same error, giving user consistent experiences.

@sfc-gh-yuwang sfc-gh-yuwang marked this pull request as ready for review January 14, 2025 17:55
@sfc-gh-yuwang sfc-gh-yuwang requested review from a team as code owners January 14, 2025 17:55
Comment on lines 168 to 171
for i in range(rows_to_compare):
for j in range(len(self.output)):
if self.data[i][j] is None and not self.output[j].nullable:
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially a lot fewer comparisons if you restructured like this.

Suggested change
for i in range(rows_to_compare):
for j in range(len(self.output)):
if self.data[i][j] is None and not self.output[j].nullable:
return True
for j in range(len(self.output)):
if not self.output[j].nullable:
for i in range(rows_to_compare):
if self.data[i][j] is None:
return True

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, will change

CHANGELOG.md Outdated Show resolved Hide resolved
tests/integ/test_dataframe.py Outdated Show resolved Hide resolved
@sfc-gh-yuwang sfc-gh-yuwang merged commit 1ed17cc into main Jan 15, 2025
53 checks passed
@sfc-gh-yuwang sfc-gh-yuwang deleted the SNOW-1877449 branch January 15, 2025 21:19
@github-actions github-actions bot locked and limited conversation to collaborators Jan 15, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants