Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set-returning UDFs #1604

Open
lightjacket opened this issue Jan 18, 2022 · 4 comments
Open

Set-returning UDFs #1604

lightjacket opened this issue Jan 18, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@lightjacket
Copy link

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I'm looking at add an inverted index on my data and use it as the first step in more complicated data transformations. There's particular optimization that I think might be better done outside of the DataFusion engine.

Describe the solution you'd like
I'm imagining something like the following, where inverted_index_search returns an arbitrary number of rows.

SELECT inverted_index_search('some query');
+------+
| id   |
+------+
|  1   |
|  2   |
+------+

I'd like to be able to write a udf (or udaf, or something new) that can return as many rows as it needs.

Describe alternatives you've considered
I looked at having my UDF return an array, but once I had the array I could not figure out how to unnest it. Issue #212 looks to cover that, but I think it might be better to just have the function return multiple rows directly.

Additional context
Apologies if I missed something that is already available for this.

@lightjacket lightjacket added the enhancement New feature or request label Jan 18, 2022
@jimexist
Copy link
Member

@lightjacket in my opinion a better idea is to return a list in one row?

@lightjacket
Copy link
Author

@jimexist thanks for the suggestion! I did consider that as an alternative but would need to unnest the result in order to be able to join against additional tables like so:
Set-returning

SELECT * FROM 
(SELECT inverted_index_search('some query') as item_id) t
LEFT JOIN additional_data ON t.item_id = additional_data.item_id;

Single row w/ a list

SELECT * FROM 
(SELECT unnest(inverted_index_search('some query')) as item_id) t
LEFT JOIN additional_data ON t.item_id = additional_data.item_id;

So if the unnest function existed, I would be unblocked. I think if I could create a set-returning UDF I could also implement unnest myself as a user and wouldn't need to wait on a built-in unnest.

If there is an alternative to unnest to get that join I am looking for though, that would absolutely work. I just haven't found it yet if it exists.

@jimexist
Copy link
Member

let's add unnest as it is more general:

@lightjacket
Copy link
Author

I'd argue that unnest is less general (I'd think unnest would be a specific implementation of a set-returning UDF), but the unnest function certainly solves my use-case so if that's the preference, it certainly works for me. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants