-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement catalog filter for KedroDataCatalog
#4449
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
…talog-filter Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already discussed on the call that this looks good! 👍 Left some minor comments.
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
manually tested, looks good! 👍🏾
After the conversation with @idanov, we concluded that this method doesn't bring much new functionality, so it makes sense to keep the old naming. However, we're still unsure about its necessity. Our initial thinking was that the original Based on the above we still have two options:
Please share your feedback on these two points (naming and necessity) 🙏 |
The 3 things I don't like as a user today:
|
I don't follow, could you elaborate? How would the code would look like with and without this PR? |
pattern = re.compile(regex, flags=regex_flags)
filtered_ds = [ds_name for ds_name in catalog.keys() if pattern.search(ds_name)] # or iterate via values/items The point is that since dataset names can be easily accessed, it's also easy to apply custom filtering on top, and we don't need a dedicated method for it. However, it's a bit more complicated with dataset types, so filtering by types could add some value. |
You're technically correct, but I think there is something to be said for the loss of DX / convenience for a function designed to use in interactive mode. |
self, | ||
name_regex: str | None = None, | ||
name_regex_flags: int | re.RegexFlag = re.IGNORECASE, | ||
type_regex: str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I've read the PR description about filter by type I though that will be basically instanceof
filtering, allowing to pass Type
parameter, not string based class name, moreover a fully qualified one, which is more laborious to obtain for a plugin developer than just doing data_catalog.filter(by_type=PluginSpecificDataset)
. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I hadn't even considered that, I thought it would be the string representation people do in the YAML config! To be honest, both would be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that as well but in case we allow both Type and string type. So extending the suggested implementation sounds good to me.
Thanks @ElenaKhaustova , that's what I imagined. In my view, given that our users aren't necessarily software engineers, our APIs shouldn't just be perfect partitions of the set of possible use cases, but allow for some convenience methods that, as @datajoely says, alleviate the burden a bit. |
I still feel like So my vote would be: introduce |
Could you please clarify whether you mean keeping the existing |
Description
Implemented
KedroDataCatalog.filter()
to filter datasets by name and type.Solves #3917
Reasoning: #3917 (comment)
Development notes
Implemented on top of #4448 <-- please review it first.
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file