feat: allow spread operators in to-many relationships #3640
My approach right now is to generate this query for a to-many request:
curl 'localhost:3000/clients?select=name,...projects(name,id)'
SELECT "test"."clients"."name",
"clients_projects_1"."name",
"clients_projects_1"."id"
FROM "test"."clients"
LEFT JOIN LATERAL (
SELECT json_agg("projects_1"."name") AS "name",
json_agg("projects_1"."id") AS "id"
FROM "test"."projects" AS "projects_1"
WHERE "projects_1"."client_id" = "test"."clients"."id"
) AS "clients_projects_1" ON TRUE
Right now this gives the expected result. But aggregates are not working correctly, because they are designed to be selected in the top query with a GROUP BY. The alternative is to aggregate in the outer query:
SELECT "test"."clients"."name",
json_agg("clients_projects_1"."name") AS "name",
json_agg("clients_projects_1"."id") AS "id"
FROM "test"."clients"
LEFT JOIN LATERAL (
SELECT "projects_1"."name",
"projects_1"."id"
FROM "test"."projects" AS "projects_1"
WHERE "projects_1"."client_id" = "test"."clients"."id"
) AS "clients_projects_1" ON TRUE
GROUP BY "test"."clients"."name"
Not sure which one is better/easier right now... I'm thinking the latter.
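To compare the two query shapes without a database, here is a minimal Python sketch that mimics both on in-memory rows. The sample data and the helper names (`agg_in_lateral`, `agg_in_outer`) are made up for illustration and are not PostgREST code. Every client has at least one project here, so the empty-embedding edge case does not come up.

```python
from itertools import groupby

# Hypothetical sample rows standing in for test.clients and test.projects.
clients = [{"id": 1, "name": "Microsoft"}, {"id": 2, "name": "Apple"}]
projects = [
    {"id": 1, "name": "Windows 7", "client_id": 1},
    {"id": 2, "name": "Windows 10", "client_id": 1},
    {"id": 3, "name": "IOS", "client_id": 2},
]


def agg_in_lateral(clients, projects):
    """First shape: aggregate inside the lateral subquery (one row per client)."""
    out = []
    for c in clients:
        matches = [p for p in projects if p["client_id"] == c["id"]]
        out.append({
            "name": c["name"],
            "project_names": [p["name"] for p in matches],
            "project_ids": [p["id"] for p in matches],
        })
    return out


def agg_in_outer(clients, projects):
    """Second shape: join first (one row per client/project pair), then GROUP BY."""
    joined = [
        {"client": c["name"], "p_name": p["name"], "p_id": p["id"]}
        for c in clients
        for p in projects
        if p["client_id"] == c["id"]
    ]
    joined.sort(key=lambda r: r["client"])  # groupby needs sorted input
    out = []
    for client, rows in groupby(joined, key=lambda r: r["client"]):
        rows = list(rows)
        out.append({
            "name": client,
            "project_names": [r["p_name"] for r in rows],
            "project_ids": [r["p_id"] for r in rows],
        })
    return out


def by_name(row):
    return row["name"]


lateral = sorted(agg_in_lateral(clients, projects), key=by_name)
outer = sorted(agg_in_outer(clients, projects), key=by_name)
print(lateral == outer)
```

In this sketch both shapes give the same arrays. The difference is only where the aggregation happens, which is what matters for reusing the existing top-level aggregate handling.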
Having the json_agg in the outer query would make the query cleaner, imho.
Some caveats I encountered:
Repeated values and order
Do we want to keep repeated values in the results? For example (not the best use case, just to illustrate):
curl 'localhost:3000/project?select=name,...tasks(tasks:name,due_dates:due_date)'
[
{
"name": "project 1",
"tasks": ["task 1", "task 2", "task 3", "task 4"],
"due_dates": [null, "2024-08-08", "2024-08-08", null]
}
]
Here we're repeating values in due_dates.
Nested To-Many Spreads
I'm not sure what to expect with nested to-many spreads. For example, on a non-nested to-many spread like this one:
curl 'localhost:3000/entities?select=name,...child_entities(children:name)'
We would expect:
[
{"name": "entity 1", "children": ["child entity 1", "child entity 2"]},
{"name": "entity 2", "children": ["child entity 3"]},
"..."
]
But what if we nest another to-many spread embedding with a new column to aggregate:
curl 'localhost:3000/entities?select=name,...child_entities(children:name,...grandchild_entities(grandchildren:name))'
I understand that we're hoisting all the aggregates to the top level, and not grouping by the intermediate columns:
[
{"name": "entity 1", "children": ["child entity 1", "child entity 2"], "grandchildren": ["grandchild entity 1", "grandchild entity 2", "..."]},
{"name": "entity 2", "children": ["child entity 3"], "grandchildren": []},
"..."
]
This cannot be achieved by a simple
SELECT "api"."entities"."name",
json_agg(DISTINCT "entities_child_entities_1"."children") AS "children",
json_agg(DISTINCT "entities_child_entities_1"."grandchildren") AS "grandchildren"
FROM "api"."entities"
LEFT JOIN LATERAL (
SELECT "child_entities_1"."name" AS "children",
"child_entities_grandchild_entities_2"."grandchildren" AS "grandchildren"
FROM "api"."child_entities" AS "child_entities_1"
LEFT JOIN LATERAL (
SELECT "grandchild_entities_2"."name" AS "grandchildren"
FROM "api"."grandchild_entities" AS "grandchild_entities_2"
WHERE "grandchild_entities_2"."parent_id" = "child_entities_1"."id"
) AS "child_entities_grandchild_entities_2" ON TRUE
WHERE "child_entities_1"."parent_id" = "api"."entities"."id"
) AS "entities_child_entities_1" ON TRUE
GROUP BY "api"."entities"."name";
If there is no sensible interpretation of the query, another option is to prohibit these intermediate columns altogether (aggregates like sum, avg, etc. should still be possible).
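The nested case is easier to reason about with the joined rows written out. Below is a small Python sketch of what the nested LEFT JOIN LATERAL produces for one entity and what a DISTINCT aggregate then does to each column. The rows and the helper names (`joined_rows`, `distinct`) are hypothetical, and `distinct` keeps first-seen order, which PostgreSQL does not promise.

```python
# Hypothetical rows standing in for api.child_entities and api.grandchild_entities.
children = [
    {"id": 1, "name": "child entity 1", "parent_id": 1},
    {"id": 2, "name": "child entity 2", "parent_id": 1},
]
grandchildren = [
    {"name": "grandchild entity 1", "parent_id": 1},
    {"name": "grandchild entity 2", "parent_id": 1},
    {"name": "grandchild entity 3", "parent_id": 2},
]


def joined_rows(entity_id):
    """Rows the nested LEFT JOIN LATERAL yields for one entity."""
    rows = []
    for c in children:
        if c["parent_id"] != entity_id:
            continue
        names = [g["name"] for g in grandchildren if g["parent_id"] == c["id"]]
        # LEFT JOIN keeps the child even when it has no grandchildren.
        for g in names or [None]:
            rows.append((c["name"], g))
    return rows


def distinct(values):
    """Rough stand-in for json_agg(DISTINCT ...): drop duplicates."""
    return list(dict.fromkeys(values))


rows = joined_rows(1)
plain_children = [c for c, _ in rows]           # each child repeats once per grandchild
distinct_children = distinct(c for c, _ in rows)
distinct_grandchildren = distinct(g for _, g in rows)
print(plain_children)
print(distinct_children)
print(distinct_grandchildren)
```

The child column repeats once per grandchild because of the join, and DISTINCT removes that. It would also collapse two different children that happen to share a name, which is the repeated-values question raised above.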
OK, this is what I got implemented so far. For example, using the tables in our spec test:
curl 'localhost:3000/factories?select=name,...processes(processes:name,...supervisors(supervisors:name))'
[
{
"name": "Factory C",
"processes": ["Process C1", "Process C2", "Process XX"],
"supervisors": ["Peter", "Peter", null]
},
{
"name": "Factory B",
"processes": ["Process B1", "Process B1", "Process B2", "Process B2"],
"supervisors": ["Peter", "Sarah", "Mary", "John"]
},
{
"name": "Factory A",
"processes": ["Process A1", "Process A2"],
"supervisors": ["Mary", "John"]
},
{
"name": "Factory D",
"processes": [null],
"supervisors": [null]
}
]
[
{
"name":"Factory C",
"processes":["Process C1", "Process C2", "Process XX"],
"supervisors":[{"name": "Peter"}, {"name": "Peter"}, null]},
{
"name":"Factory B",
"processes":["Process B1", "Process B1", "Process B2", "Process B2"],
"supervisors":[{"name": "Peter"}, {"name": "Sarah"}, {"name": "Mary"}, {"name": "John"}]},
{
"name":"Factory A",
"processes":["Process A1", "Process A2"],
"supervisors":[{"name": "Mary"}, {"name": "John"}]},
{
"name":"Factory D",
"processes":[null],
"supervisors":[null]
}
]
As I mentioned in previous comments, some values will repeat, since we're grouping by the factory.
There's a problem when the embeddings have no values, as seen in the Factory D entry.
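The `[null]` for Factory D follows from how a LEFT JOIN behaves: a factory without processes still produces one joined row, with NULL in the process column, and the aggregate collects that NULL. A minimal Python sketch, with made-up rows and hypothetical helper names (`left_join`, `aggregate`):

```python
# Hypothetical rows; Factory D has no processes.
factories = [{"id": 1, "name": "Factory A"}, {"id": 4, "name": "Factory D"}]
processes = [
    {"name": "Process A1", "factory_id": 1},
    {"name": "Process A2", "factory_id": 1},
]


def left_join(factories, processes):
    """Yield (factory, process) pairs; unmatched factories get a None process."""
    for f in factories:
        matches = [p["name"] for p in processes if p["factory_id"] == f["id"]]
        for name in matches or [None]:
            yield f["name"], name


def aggregate(rows, skip_nulls=False):
    """Group by factory and collect processes, optionally dropping None values."""
    result = {}
    for factory, process in rows:
        bucket = result.setdefault(factory, [])
        if process is None and skip_nulls:
            continue
        bucket.append(process)
    return result


rows = list(left_join(factories, processes))
with_nulls = aggregate(rows)
without_nulls = aggregate(rows, skip_nulls=True)
print(with_nulls)
print(without_nulls)
```

`skip_nulls` is roughly what an aggregate FILTER clause plus a COALESCE to an empty array would do in SQL. It is only an assumption that the fix could look like that, and note that filtering on the value would also drop genuine NULL column values.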
This feature should be ready for review now.
I'm leaving DISTINCT and NOT NULL for another PR to keep it cleaner.
Edit: Nvm. I figured that it should be OK to include that feature here too, although in different commits...
Here are some comments on the changes done:
Terrific work with the test cases, very extensive. My head explodes, though.
Because we just discussed commit message / prefixes in another PR - what's your opinion on docs/feat commits? Should they be split like in this PR or do they belong together, i.e. was the idea to squash this?
I think they should go into the same feat: commit. A feature without docs is not a feature.
Makes sense, yes. I'll squash them to avoid problems when merging.
Some progress that is ready for review:
- Non-flattened arrays as specified in #3640 (comment)
- ORDER BY inside the aggregate (with some caveats mentioned below)
The aggregates on the whole relationship are not yet implemented, e.g. ...to_many(count()).sum().
src/PostgREST/Query/QueryBuilder.hs (outdated)
Spread{rsSpreadSel, rsAggAlias} ->
  if relSpread == Just ToManySpread then
    let
      selection = selectJsonArray <> (if null rsSpreadSel then mempty else ", ") <> intercalateSnippet ", " (pgFmtSpreadSelectItem True rsAggAlias order <$> rsSpreadSel)
We still need to SELECT json_agg(<subquery_alias>) AS "<subquery_alias>" to use it for not.is.null or !inner conditions. You can see how it's added in the previous comment's example.
Does it make sense to leave this out for this PR? This seems already complex enough :)
I wanted to include it since it would solve what's mentioned in the original issue #3041 (under Spread on Count). But yes, I would consider it a separate feature; we could leave it for another PR and not let this one close the issue completely.
I looked at the issue again and I think we need to take the following into account:
and
We mostly discussed the second case, in which I argued that I expect an array of counts as a return, matching the array of supervisors. But we didn't really discuss the first case, which seems to be the case in the issue. I think the first case should not return a single item array, but indeed the overall count.
The basic idea would be: We use array aggregation for x2m embeddings. But once we aggregate inside this embedding without any GROUP BY columns, then we don't have an x2m embedding anymore, but an x2o. The "relation" we are embedding is guaranteed to return only one row.
Taking this into account I wonder whether we actually need the more complex syntax
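The rule proposed here can be written down as a small predicate. This is only a sketch of the idea as stated in the comment, not the planner code: `SelectItem` and `wraps_in_array` are hypothetical names, and it assumes that every non-aggregate item in the spread acts as a GROUP BY column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectItem:
    """One item of a spread's select list (hypothetical model)."""
    name: str
    is_aggregate: bool = False


def wraps_in_array(spread_items):
    """Return True if a to-many spread should produce arrays.

    With at least one plain column there is a GROUP BY, so the embedding can
    return several rows and stays x2m. With aggregates only, it returns a
    single row and behaves like x2o, so no array is needed.
    """
    return any(not item.is_aggregate for item in spread_items)


# e.g. a spread selecting only count(): a single overall count
only_count = wraps_in_array([SelectItem("count", is_aggregate=True)])
# e.g. a spread selecting name and count(): one count per name, as arrays
name_and_count = wraps_in_array(
    [SelectItem("name"), SelectItem("count", is_aggregate=True)]
)
print(only_count, name_and_count)
```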
Yes, that's a nice approach, I agree. There wouldn't be a need for the more complex syntax anymore. I haven't checked yet but there may be some caveats with nested spreads and this implementation. I'll let you know if I find something along the way.
Thinking about the ordering: the motivation comes from the comment on #3041 (comment). But I think the main use case is just forming an array of one column and running aggregates on them; for this we wouldn't need to worry about ORDER. Perhaps we could leave multiple columns for later?
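For context on why ordering matters more with several columns: the arrays coming out of one spread are read positionally, so each column has to be aggregated in the same row order. A minimal Python sketch under that assumption, with made-up rows and a hypothetical helper `spread_columns`:

```python
# Hypothetical rows for one project's tasks.
tasks = [
    {"name": "task 2", "due_date": "2024-08-08"},
    {"name": "task 1", "due_date": None},
    {"name": "task 3", "due_date": "2024-08-08"},
]


def spread_columns(rows, columns, order_by):
    """Order the rows once, then build one array per column from that order."""
    ordered = sorted(rows, key=lambda r: r[order_by])
    return {col: [r[col] for r in ordered] for col in columns}


result = spread_columns(tasks, ["name", "due_date"], order_by="name")
# Position i in every array refers to the same task.
pairs = list(zip(result["name"], result["due_date"]))
print(pairs)
```

With a single column there is nothing to keep aligned, which matches the point that the one-column use case does not need to worry about order.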
The ordering is now complete and ready for review. If everything's OK there would be no need to leave it for later.
I only looked at the docs and the AggregateFunctionsSpec.hs file. Still have a few questions, let's discuss those first, before I continue with the other test file.
Maybe not everything we discussed is implemented or maybe I missed something else, not sure.
Ah, I guess I missed the fact that you only changed the ordering stuff, not anything else :D
Yup 😄, the "do not wrap into arrays if there's no GROUP BY columns" is still a WIP.
NVM, it should be complete now.
Codecov keeps complaining but I don't think I can appease it any further. Almost all the % is due to the new types.
I went through all the test-cases again and they look great to me. Output as expected. That's terrific!
One thing that could maybe be tested is a combination of aggregate and "spread returns array of objects". I think there is one case already where something similar is tested, but not with the aggregate inside what will later be the object in the array. Can this happen, does it work?
Now, I am looking at the generated SQL query. Some notes:
- I see lines like json_agg("factories_processes_1")::jsonb AS "factories_processes_1". Why do we aggregate as json and then cast to jsonb? We could aggregate via jsonb_agg directly. Although, maybe we don't need to cast to jsonb at all? Not sure where that comes from.
- Otherwise the random queries I looked at look good, I guess.
I did not look at the code.
Looks like this was added here: 1c60b50#diff-07da031ac18ee14d9fbb3f65e58ffad3565001cb0249ce92ff55c36ffd50dcf0R106
There is another
{"code":"42883","details":null,"hint":null,"message":"could not identify an equality operator for type json"}
I remember getting those errors when testing for this PR, I think it needed to have a
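Some background on that error: PostgreSQL's json type keeps the input text as written and has no equality operator, while jsonb stores a parsed form and does. Anything that needs to compare values (DISTINCT, for instance) therefore fails on json. A rough Python analogy, where raw strings play the role of json and parsed values play the role of jsonb; `distinct_json` is a hypothetical helper, not something from the PR:

```python
import json

# Same document, different key order and whitespace.
a = '{"name": "Peter", "id": 1}'
b = '{"id":1,"name":"Peter"}'

text_equal = a == b                            # comparing the raw text
parsed_equal = json.loads(a) == json.loads(b)  # comparing the parsed values


def distinct_json(texts):
    """Deduplicate JSON documents by a canonical form of their parsed value."""
    seen = set()
    out = []
    for text in texts:
        value = json.loads(text)
        # Canonical key: sorted object keys, no insignificant whitespace.
        key = json.dumps(value, sort_keys=True, separators=(",", ":"))
        if key not in seen:
            seen.add(key)
            out.append(value)
    return out


unique = distinct_json([a, b, '{"name": "Mary"}'])
print(text_equal, parsed_equal, len(unique))
```

This is presumably why a jsonb cast shows up in the generated query, though the thread does not spell that out.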
I don't think there's a practical use for aggregates inside an array of objects, since that's done to a to-one spread, although it works:
curl 'localhost:3000/supervisors?select=...processes(name,factories(name,count()))'
[
{"name":["Process A1", "Process B2"],"factories":[{"name": "Factory A", "count": 1}, {"name": "Factory B", "count": 1}]},
{"name":["Process A2", "Process B2"],"factories":[{"name": "Factory A", "count": 1}, {"name": "Factory B", "count": 1}]},
{"name":["Process B1", "Process C1", "Process C2"],"factories":[{"name": "Factory B", "count": 1}, {"name": "Factory C", "count": 1}, {"name": "Factory C", "count": 1}]},
{"name":["Process B1"],"factories":[{"name": "Factory B", "count": 1}]},
{"name":[],"factories":[]}
]
Or maybe do you mean something a bit more complex? Like:
curl 'localhost:3000/supervisors?select=...processes(name,factories(name,...factory_buildings(buildings:count())))'
[
{"name":["Process A1", "Process B2"],"factories":[{"name": "Factory A", "buildings": 2}, {"name": "Factory B", "buildings": 2}]},
{"name":["Process A2", "Process B2"],"factories":[{"name": "Factory A", "buildings": 2}, {"name": "Factory B", "buildings": 2}]},
{"name":["Process B1", "Process C1", "Process C2"],"factories":[{"name": "Factory B", "buildings": 2}, {"name": "Factory C", "buildings": 1}, {"name": "Factory C", "buildings": 1}]},
{"name":["Process B1"],"factories":[{"name": "Factory B", "buildings": 2}]},
{"name":[],"factories":[]}
]
For an array of arrays (of objects) it also works. I added these to the tests. LMK if that's not it.
👍
Very true :)
Cool!
I'm not deep enough into the Planner and Query Builder to be able to properly review the code side of things. @steve-chavez could you look at it again? There have been quite some changes since your last review, I think.
Closes #3041