-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove parsing table name in row_filter
#1689
Merged
Merged
Changes from 2 commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
3573cfb
Deprecate for 0.9.0 release
geruh 2fdd981
Deprecate for 0.9.0 release
geruh a7f6bb0
Joining column ref's on nested columns
geruh 7bb4e7d
Add another test with nested and dot name
geruh ab3fc86
Add test for parser with quoted dot column
geruh File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the comment, I think this is what we want:
We should not raise a
ValueError
, and allow for nested fields:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looping in the expert @sungwy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello!
There's two missing features in PyIceberg right now with expression parsing:
I think @Fokko 's suggested change will make it work for both those cases, and I think there will be value in testing the Reference binding for both cases in
tests/expressions/test_expressions.py
.I can think of three test cases:
Where the last one should throw an error that's propagated from the Schema having ambiguous names.
I had originally thought that this work would be a lot more complicated because we would want to distinguish the nested fields from simple field names with "." delimiters. But I think we can just rely on PyIceberg not supporting ambiguous field names for now as we introduce nested parsing support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool I did see @sungwy had a PR open for this #965, and there were some concerns with this approach. Particularity this comment #965 (comment) unless I'm missing something. Let me catch up here and see if I can add this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I made the changes to include the joining logic for the column references. As long as there's parity between a bound schema and the expression, I don’t see any issues.
One thing I noticed is that if we have a nested field with a dot, like
foo.bar,
and a field nested inside it asbaz
, it's treated asfoo.bar.baz
, and the expression respects that.Additionally, it's worth noting that the parser currently doesn’t allow a quoted identifier to represent a single field containing a dot, such as
"foo.bar".baz
. Due to the expectations set here: https://github.com/apache/iceberg-python/blob/main/pyiceberg/expressions/parser.py#L82There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for making the change!
I think its worth adding a test for this and show that its currently not supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The biggest one I want to unlock here are nested fields.
I think the probability of having a collation between a nested field, and a field with a dot is low. If this becomes a problem, we could always extend (in a separate PR) the
Reference
class to allow passing in a tuple to make it explicit.Before doing this, we would also need to establish a way to express this in the SQL-like syntax:
As an example from Databricks: https://docs.databricks.com/aws/en/sql/language-manual/functions/dotsign