feat: allow spread operators in to-many relationships #3640

laurenceisla · 2024-07-05T00:56:21Z

laurenceisla · 2024-07-06T00:13:57Z

My approach right now is to generate this query for a to-many request:

curl 'localhost:3000/clients?select=name,...projects(name,id)'

SELECT "test"."clients"."name",
       "clients_projects_1"."name",
       "clients_projects_1"."id"
FROM "test"."clients"
LEFT JOIN LATERAL (
  SELECT json_agg("projects_1"."name") AS "name",
         json_agg("projects_1"."id") AS "id"
  FROM "test"."projects" AS "projects_1"
  WHERE "projects_1"."client_id" = "test"."clients"."id"
) AS "clients_projects_1" ON TRUE

Right now this gives the expected result. But aggregates are not working correctly, because they are designed to be selected in the top query with a GROUP BY. A solution would be to not do the json_agg() inside the sub-query and do it in the top one and treat it as another aggregate (with GROUP BY). Like this:

SELECT "test"."clients"."name",
       json_agg("clients_projects_1"."name") AS "name",
       json_agg("clients_projects_1"."id") AS "id"
FROM "test"."clients"
LEFT JOIN LATERAL (
  SELECT "projects_1"."name",
         "projects_1"."id"
  FROM "test"."projects" AS "projects_1"
  WHERE "projects_1"."client_id" = "test"."clients"."id"
) AS "clients_projects_1" ON TRUE
GROUP BY "test"."clients"."name"

Not sure which one is better/easier right now... I'm thinking the latter.

src/PostgREST/Error.hs

wolfgangwalther · 2024-07-06T09:29:21Z

> Right now this gives the expected result. But aggregates are not working correctly, because they are designed to be selected in the top query with a GROUP BY. A solution would be to not do the json_agg() inside the sub-query and do it in the top one and treat it as another aggregate (with GROUP BY).

Having the json_agg in the outer query would make the query cleaner, imho.

laurenceisla · 2024-07-17T01:19:59Z

Some caveats I encountered:

Repeated values and order

Do we want to keep repeated values in the results? For example (not the best use case, just to illustrate):

curl 'localhost:3000/project?select=name,...tasks(tasks:name,due_dates:due_date)'

[
  {
    "name": "project 1",
    "tasks": ["task 1", "task 2", "task 3", "task 4"],
    "due_dates": [null, "2024-08-08", "2024-08-08", null]
  }
]

Here we're repeating null and "2024-08-08", so maybe we don't want to do this and just return [null, "2024-08-08"] (perhaps also remove null values?). Doing this will not guarantee the same dimensions for the same aggregated relationship, and definitely not the order of the results (which wasn't guaranteed before either). Doing a DISTINCT inside the json_agg() is a possible solution (the next caveat has an example query).
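To make the trade-off concrete, here is a minimal Python sketch (not PostgREST code) that emulates one json_agg() per column over the rows from the example above. The `aggregate` helper and its `distinct` flag are hypothetical names used only for illustration, and the first-occurrence ordering is an assumption of the sketch, since SQL gives no such guarantee.

```python
# Hypothetical rows (task name, due_date) for "project 1", taken from the example above.
tasks = [
    ("task 1", None),
    ("task 2", "2024-08-08"),
    ("task 3", "2024-08-08"),
    ("task 4", None),
]

def aggregate(rows, distinct=False):
    """Aggregate each column into its own array, like one json_agg() per column."""
    columns = [list(col) for col in zip(*rows)]
    if distinct:
        # Deduplicate every column independently (first occurrence wins here;
        # a real DISTINCT aggregate does not promise any particular order).
        columns = [list(dict.fromkeys(col)) for col in columns]
    return columns

names, due_dates = aggregate(tasks)
# Without DISTINCT, position i in both arrays refers to the same task.
print(names, due_dates)

names_d, due_dates_d = aggregate(tasks, distinct=True)
# With DISTINCT the arrays no longer line up: 4 names but only 2 due dates.
print(names_d, due_dates_d)
```

The second call shows why DISTINCT breaks the "same dimensions" property: each array is deduplicated on its own, so an index in one array no longer corresponds to the same row in the other.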

Nested To-Many Spreads

I'm unsure what to expect with nested to-many spreads. For example, on a non-nested to-many spread like this one:

curl 'localhost:3000/entities?select=name,...child_entities(children:name)'

We would expect:

[
  {"name": "entity 1", "children": ["child entity 1", "child entity 2"]},
  {"name": "entity 2", "children": ["child entity 3"]},
  "..."
]

But what if we nest another to-many spread embedding with a new column to aggregate:

curl 'localhost:3000/entities?select=name,...child_entities(children:name,...grandchild_entities(grandchildren:name))'

I understand that we're hoisting all the aggregates to the top level, and not grouping by the intermediate columns (child_entities.name), because they should also be aggregated. I'm assuming that the result should be the same as above but also with the aggregated grandchild_entities.name.

[
  {"name": "entity 1", "children": ["child entity 1", "child entity 2"], "grandchildren": ["grandchild entity 1", "grandchild entity 2", "..."]},
  {"name": "entity 2", "children": ["child entity 3"], "grandchildren": []},
  "..."
]

This cannot be achieved by a simple GROUP BY on entities.name, because the join returns duplicated values (each child is repeated once per grandchild), which is probably not what we want. One solution is, again, to use DISTINCT. The query would look like this:

SELECT "api"."entities"."name",
       json_agg(DISTINCT "entities_child_entities_1"."children") AS "children",
       json_agg(DISTINCT "entities_child_entities_1"."grandchildren") AS "grandchildren"
FROM "api"."entities"
LEFT JOIN LATERAL (
  SELECT "child_entities_1"."name" AS "children",
         "child_entities_grandchild_entities_2"."grandchildren" AS "grandchildren"
  FROM "api"."child_entities" AS "child_entities_1"
  LEFT JOIN LATERAL (
    SELECT "grandchild_entities_2"."name" AS "grandchildren"
    FROM "api"."grandchild_entities" AS "grandchild_entities_2"
    WHERE "grandchild_entities_2"."parent_id" = "child_entities_1"."id"
  ) AS "child_entities_grandchild_entities_2" ON TRUE
  WHERE "child_entities_1"."parent_id" = "api"."entities"."id"
) AS "entities_child_entities_1" ON TRUE
GROUP BY "api"."entities"."name";

If there is no sensible interpretation of the query, another option is to prohibit these intermediate columns altogether (aggregates like sum, avg, etc. should still be possible).
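The duplication can be reproduced outside the database. The following Python sketch emulates the nested LEFT JOIN LATERAL and the top-level aggregation; the data and the `joined_rows` and `aggregate` helpers are hypothetical, and None stands in for the NULL produced when a child has no grandchildren.

```python
from collections import defaultdict

# Hypothetical data: entity -> children -> grandchildren.
children = {
    "entity 1": ["child entity 1", "child entity 2"],
    "entity 2": ["child entity 3"],
}
grandchildren = {
    "child entity 1": ["grandchild entity 1", "grandchild entity 2"],
    "child entity 2": [],
    "child entity 3": [],
}

def joined_rows():
    """One row per (child, grandchild) pair, as the nested lateral join would return."""
    for entity, kids in children.items():
        for child in kids:
            # A LEFT JOIN keeps the child and yields NULL when nothing matches.
            for grandchild in grandchildren[child] or [None]:
                yield entity, child, grandchild

def aggregate(distinct):
    """GROUP BY entity, aggregating both columns with or without DISTINCT."""
    out = defaultdict(lambda: {"children": [], "grandchildren": []})
    for entity, child, grandchild in joined_rows():
        for key, value in (("children", child), ("grandchildren", grandchild)):
            if not distinct or value not in out[entity][key]:
                out[entity][key].append(value)
    return dict(out)

plain = aggregate(distinct=False)
dedup = aggregate(distinct=True)
print(plain["entity 1"]["children"])  # "child entity 1" appears once per grandchild
print(dedup["entity 1"]["children"])  # duplicates removed
```

Note that even with DISTINCT the sketch still carries a None for children without grandchildren, so the empty-array result shown above would need separate handling.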

src/PostgREST/Plan.hs

laurenceisla · 2024-09-07T01:54:20Z

OK, this is what I got implemented so far. For example, using the tables in our spec test:

factories <-O2M-> processes <-M2M-> supervisors

curl 'localhost:3000/factories?select=name,...processes(processes:name,...supervisors(supervisors:name))'

[
 {
  "name": "Factory C",
  "processes": ["Process C1", "Process C2", "Process XX"],
  "supervisors": ["Peter", "Peter", null]
 },
 {
  "name": "Factory B",
  "processes": ["Process B1", "Process B1", "Process B2", "Process B2"],
  "supervisors": ["Peter", "Sarah", "Mary", "John"]
 },
 {
  "name": "Factory A",
  "processes": ["Process A1", "Process A2"],
  "supervisors": ["Mary", "John"]
 },
 {
  "name": "Factory D",
  "processes": [null],
  "supervisors": [null]
 }
]

[
 {
  "name": "Factory C",
  "processes": ["Process C1", "Process C2", "Process XX"],
  "supervisors": [{"name": "Peter"}, {"name": "Peter"}, null]
 },
 {
  "name": "Factory B",
  "processes": ["Process B1", "Process B1", "Process B2", "Process B2"],
  "supervisors": [{"name": "Peter"}, {"name": "Sarah"}, {"name": "Mary"}, {"name": "John"}]
 },
 {
  "name": "Factory A",
  "processes": ["Process A1", "Process A2"],
  "supervisors": [{"name": "Mary"}, {"name": "John"}]
 },
 {
  "name": "Factory D",
  "processes": [null],
  "supervisors": [null]
 }
]

As I mentioned in previous comments, some values will repeat, since we're grouping by the factory "name" without doing a DISTINCT or NOT NULL. The next step would be to implement the .. operator as mentioned here: #3640 (comment), it shouldn't be too complicated.

There's a problem when the embeddings have no values, as seen in the "Factory D" example, which has no processes and no supervisors. This is the same issue as this SO question. One solution is to do a COALESCE(NULLIF(..., '[null]'), '[]'), but this does not take into consideration valid null values (the row exists but the column value is null). The best solution is in one of the answers (filtering by the PK of the relationship), but it doesn't seem like a trivial task.
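The difference between the two fixes can be sketched in Python. This is only an emulation: the rows are hypothetical (pk, value) pairs as a LEFT JOIN would return them, and the SQL shape named in the second docstring, an aggregate FILTER on the primary key, is my assumption of what "filtering by the PK of the relationship" would look like, not something taken from the linked answer.

```python
def json_agg(values):
    """Emulates json_agg(): NULLs (None) are kept in the array."""
    return list(values)

def coalesce_nullif(agg):
    """Emulates COALESCE(NULLIF(agg, '[null]'), '[]')."""
    return [] if agg == [None] else agg

def agg_filtered_by_pk(rows):
    """Emulates COALESCE(json_agg(value) FILTER (WHERE pk IS NOT NULL), '[]').

    A row whose pk is None did not match anything in the LEFT JOIN."""
    return [value for pk, value in rows if pk is not None]

# Case 1: the factory has no processes; the LEFT JOIN yields one all-NULL row.
no_rows = [(None, None)]
# Case 2: one process exists (hypothetical pk 7) but its name is NULL.
null_value = [(7, None)]

for rows in (no_rows, null_value):
    by_value = coalesce_nullif(json_agg(value for _, value in rows))
    by_pk = agg_filtered_by_pk(rows)
    print(by_value, by_pk)
```

The value-based check returns an empty array in both cases and so loses the valid null, while the PK-based filter returns [] for the first case and [None] for the second.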

laurenceisla

This feature should be ready for review now.

~~I'm leaving the .. for DISTINCT and NOT NULL for another PR to keep it cleaner.~~ Edit: Nvm. I figured that it should be OK to include that feature here too, although in different commits.

Here are some comments on the changes done:

src/PostgREST/Query/SqlFragment.hs

src/PostgREST/ApiRequest/Types.hs

src/PostgREST/Plan/ReadPlan.hs

docs/references/api/resource_embedding.rst

wolfgangwalther

Terrific work with the test cases, very extensive. My head explodes, though.

Because we just discussed commit message / prefixes in another PR - what's your opinion on docs/feat commits? Should they be split like in this PR or do they belong together, i.e. was the idea to squash this?

I think they should go into the same feat: commit. A feature without docs is not a feature.

docs/references/api/resource_embedding.rst

test/spec/Feature/Query/SpreadQueriesSpec.hs

laurenceisla · 2024-11-05T22:52:23Z

> what's your opinion on docs/feat commits? Should they be split like in this PR or do they belong together, i.e. was the idea to squash this?
>
> I think they should go into the same feat: commit. A feature without docs is not a feature.

Makes sense, yes. I'll squash them to avoid problems when merging.

docs/references/api/resource_embedding.rst

laurenceisla

Some advances I'm making that are ready for reviewing:

- Non-flattened arrays as specified here: feat: allow spread operators in to-many relationships #3640 (comment)
- ORDER BY inside the aggregate (with some caveats mentioned below)

The aggregates on the whole relationship are not yet implemented, e.g. ...to_many(count()).sum().

docs/references/api/resource_embedding.rst

laurenceisla · 2024-11-23T19:03:07Z

src/PostgREST/Query/QueryBuilder.hs

+      Spread{rsSpreadSel, rsAggAlias} ->
+        if relSpread == Just ToManySpread then
+          let
+            selection = selectJsonArray <> (if null rsSpreadSel then mempty else ", ") <> intercalateSnippet ", " (pgFmtSpreadSelectItem True rsAggAlias order <$> rsSpreadSel)


We still need to select json_agg(<subquery_alias>) AS "<subquery_alias>" so that it can be used for not.is.null or !inner conditions. The example in the previous comment shows how it's added.

wolfgangwalther · 2024-11-23T19:30:57Z

> The aggregates on the whole relationship are not yet implemented, e.g. ...to_many(count()).sum().

Does it make sense to leave this out for this PR? This seems already complex enough :)

laurenceisla · 2024-11-23T20:29:54Z

> Does it make sense to leave this out for this PR? This seems already complex enough :)

I wanted to include it since it would solve what's mentioned in the original issue #3041 (under Spread on Count). But yes, I would consider it a separate feature; we could leave it for another PR and not let this one close the issue completely.

wolfgangwalther · 2024-12-04T18:21:30Z

> Does it make sense to leave this out for this PR? This seems already complex enough :)
>
> I wanted to include it since it would solve what's mentioned in the original issue #3041 (under Spread on Count). But yes, I would consider it a separate feature; we could leave it for another PR and not let this one close the issue completely.

I looked at the issue again and I think we need to take the following into account:

get "/processes?select=process:name,...supervisors(count())"

and

get "/processes?select=process:name,...supervisors(supervisor:name,count())"

We mostly discussed the second case, in which I argued that I expect an array of counts as a return, matching the array of supervisors.

But we didn't really discuss the first case, which seems to be the case in the issue. I think the first case should not return a single item array, but indeed the overall count.

The basic idea would be: We use array aggregation for x2m embeddings. But once we aggregate inside this embedding without any GROUP BY columns, then we don't have an x2m embedding anymore, but an x2o. The "relation" we are embedding is guaranteed to return only one row.

Taking this into account, I wonder whether we actually need the more complex ...to_many(x, count()).sum() syntax or anything like it. I don't think so.
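As a rough illustration of the rule proposed above, this Python sketch returns a scalar when the spread contains only an aggregate, and parallel arrays when it also selects a column to group by. The `spread_supervisors` function, its flags, and the sample names are hypothetical; it models the expected response shape, not the planner.

```python
from collections import Counter

def spread_supervisors(supervisors, select_name, select_count):
    """Shape of ...supervisors(...) for one process, given its supervisor names."""
    if select_count and not select_name:
        # Aggregate with no GROUP BY columns: exactly one row, so the
        # embedding behaves like a to-one spread and yields a scalar.
        return {"count": len(supervisors)}
    if select_count and select_name:
        # GROUP BY name: one count per group, matching the array of names.
        groups = Counter(supervisors)
        return {"supervisor": list(groups), "count": list(groups.values())}
    # No aggregate: plain array aggregation of the to-many relationship.
    return {"supervisor": list(supervisors)}

first_case = spread_supervisors(["Mary", "John"], select_name=False, select_count=True)
second_case = spread_supervisors(["Mary", "John"], select_name=True, select_count=True)
print(first_case)
print(second_case)
```

Under this reading the first request gets the overall count as a single value, and the second gets an array of counts aligned with the array of supervisors.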

laurenceisla · 2024-12-05T15:18:09Z

> The basic idea would be: We use array aggregation for x2m embeddings. But once we aggregate inside this embedding without any GROUP BY columns, then we don't have an x2m embedding anymore, but an x2o. The "relation" we are embedding is guaranteed to return only one row.

Yes, that's a nice approach, I agree. There wouldn't be a need for the more complex syntax anymore. I haven't checked yet but there may be some caveats with nested spreads and this implementation. I'll let you know if I find something along the way.

steve-chavez · 2024-12-09T21:35:18Z

> The order of the values inside the resulting array is unspecified.

> ORDER BY inside the aggregate (with some caveats mentioned below)

Thinking about the ORDER issue, isn't the whole problem that multiple columns inside the spread are supported? /clients?select=name,...projects(name,id)

The motivation comes from the comment on #3041 (comment). But I think the main use case is just forming an array of one column and running aggregates on them, for this we wouldn't need to worry about ORDER.

Perhaps we could leave multiple columns for later?

laurenceisla · 2024-12-31T22:12:11Z

> Perhaps we could leave multiple columns for later?

The ordering is now complete and ready for review. If everything's OK, there would be no need to leave it for later.

wolfgangwalther

I only looked at the docs and the AggregateFunctionsSpec.hs file. Still have a few questions, let's discuss those first, before I continue with the other test file.

Maybe not everything we discussed is implemented or maybe I missed something else, not sure.

test/spec/Feature/Query/AggregateFunctionsSpec.hs

wolfgangwalther · 2025-01-01T15:02:07Z

Ah, I guess I missed the fact that you only changed the ordering stuff, not anything else :D

laurenceisla · 2025-01-02T15:07:57Z

> Ah, I guess I missed the fact that you only changed the ordering stuff, not anything else :D

Yup 😄, the "do not wrap into arrays if there's no GROUP BY columns" is still a WIP.

test/spec/Feature/Query/AggregateFunctionsSpec.hs

laurenceisla · 2025-01-02T19:22:23Z

> Ah, I guess I missed the fact that you only changed the ordering stuff, not anything else :D
>
> Yup 😄, the "do not wrap into arrays if there's no GROUP BY columns" is still a WIP.

NVM, it should be complete now.

laurenceisla · 2025-01-07T23:54:32Z

Codecov keeps complaining but I don't think I can appease it any further. Almost all the % is due to the new types.

wolfgangwalther

I went through all the test-cases again and they look great to me. Output as expected. That's terrific!

One thing that could maybe be tested is a combination of aggregate and "spread returns array of objects". I think there is one case already where something similar is tested, but not with the aggregate inside what will later be the object in the array. Can this happen, does it work?

Now, I am looking at the generated SQL query. Some notes:

- I see lines like json_agg("factories_processes_1")::jsonb AS "factories_processes_1". Why do we aggregate as json and then cast to jsonb? We could aggregate via jsonb_agg directly. Although, maybe we don't need to cast to jsonb at all? Not sure where that comes from.
- Otherwise the random queries I looked at look good, I guess.

I did not look at the code.

laurenceisla · 2025-01-22T01:17:26Z

> I see lines like json_agg("factories_processes_1")::jsonb AS "factories_processes_1". Why do we aggregate as json and then cast to jsonb? We could aggregate via jsonb_agg directly. Although, maybe we don't need to cast to jsonb at all? Not sure where that comes from.

Looks like this was added here: 1c60b50#diff-07da031ac18ee14d9fbb3f65e58ffad3565001cb0249ce92ff55c36ffd50dcf0R106

There is another ::jsonb cast done there, to the row_to_json() function. If I remove it, it throws this error:

{"code":"42883","details":null,"hint":null,"message":"could not identify an equality operator for type json"}

I remember getting those errors when testing for this PR, I think it needed to have a jsonb column in the GROUP BY or it didn't work (in case we're grouping by json objects). Can't find an example right now. So I decided to leave it that way just in case there were other cases that weren't covered by tests yet.

> One thing that could maybe be tested is a combination of aggregate and "spread returns array of objects". I think there is one case already where something similar is tested, but not with the aggregate inside what will later be the object in the array. Can this happen, does it work?

I don't think there's a practical use for aggregates inside an array of objects, since that's done to a to-one spread, although it works:

curl 'localhost:3000/supervisors?select=...processes(name,factories(name,count()))'

[
 {"name":["Process A1", "Process B2"],"factories":[{"name": "Factory A", "count": 1}, {"name": "Factory B", "count": 1}]},
 {"name":["Process A2", "Process B2"],"factories":[{"name": "Factory A", "count": 1}, {"name": "Factory B", "count": 1}]},
 {"name":["Process B1", "Process C1", "Process C2"],"factories":[{"name": "Factory B", "count": 1}, {"name": "Factory C", "count": 1}, {"name": "Factory C", "count": 1}]},
 {"name":["Process B1"],"factories":[{"name": "Factory B", "count": 1}]},
 {"name":[],"factories":[]}
]

Or maybe do you mean something a bit more complex? like:

curl 'localhost:3000/supervisors?select=...processes(name,factories(name,...factory_buildings(buildings:count())))'

[
 {"name":["Process A1", "Process B2"],"factories":[{"name": "Factory A", "buildings": 2}, {"name": "Factory B", "buildings": 2}]},
 {"name":["Process A2", "Process B2"],"factories":[{"name": "Factory A", "buildings": 2}, {"name": "Factory B", "buildings": 2}]},
 {"name":["Process B1", "Process C1", "Process C2"],"factories":[{"name": "Factory B", "buildings": 2}, {"name": "Factory C", "buildings": 1}, {"name": "Factory C", "buildings": 1}]},
 {"name":["Process B1"],"factories":[{"name": "Factory B", "buildings": 2}]},
 {"name":[],"factories":[]}
]

For an array of arrays (of objects) it also works. I added these to the tests. LMK if that's not it.

wolfgangwalther · 2025-01-25T10:38:44Z

> So I decided to leave it that way just in case there were other cases that weren't covered by tests yet.

👍

> I don't think there's a practical use for aggregates inside an array of objects, since that's done to a to-one spread

Very true :)

> For an array of arrays (of objects) it also works. I added these to the tests.

Cool!

wolfgangwalther · 2025-01-25T10:49:48Z

> I did not look at the code.

I'm not deep enough into the Planner and Query Builder to be able to properly review the code side of things. @steve-chavez could you look at it again? There have been quite a few changes since your last review, I think.

laurenceisla mentioned this pull request Jul 5, 2024: Spread operator for many-to-many relationships, aliases, and aggregations #3041 (Open)

laurenceisla force-pushed the feat-spread-m2m branch from 1a02416 to dd89c12, July 5, 2024 23:58

laurenceisla commented Jul 6, 2024: src/PostgREST/Error.hs (outdated)

laurenceisla force-pushed the feat-spread-m2m branch from dd89c12 to dce8597, July 10, 2024 01:07

laurenceisla force-pushed the feat-spread-m2m branch 2 times, most recently from 6507878 to bd93514, July 26, 2024 01:50

laurenceisla commented Jul 26, 2024: src/PostgREST/Plan.hs (outdated)

laurenceisla mentioned this pull request Aug 20, 2024: Fixes when using aggregates inside Spread Embedded Resources #3693 (Merged, 3 tasks)

laurenceisla force-pushed the feat-spread-m2m branch 2 times, most recently from 38abc0e to 6e64707, September 7, 2024 00:49

laurenceisla force-pushed the feat-spread-m2m branch 7 times, most recently from 9002110 to b3e5483, September 18, 2024 23:37

laurenceisla commented Sep 18, 2024: src/PostgREST/Query/SqlFragment.hs, src/PostgREST/Query/SqlFragment.hs (outdated), src/PostgREST/ApiRequest/Types.hs (outdated), src/PostgREST/Plan/ReadPlan.hs (outdated)

laurenceisla changed the title ~~feat: WIP allow spread operators in to-many relationships~~ feat: allow spread operators in to-many relationships, Sep 18, 2024

laurenceisla marked this pull request as ready for review, September 18, 2024 23:40

laurenceisla force-pushed the feat-spread-m2m branch 4 times, most recently from 67e6419 to 87a13ef, September 25, 2024 22:14

laurenceisla force-pushed the feat-spread-m2m branch from 87a13ef to 19466a8, October 3, 2024 17:33

steve-chavez reviewed Oct 3, 2024: docs/references/api/resource_embedding.rst

wolfgangwalther self-requested a review, November 5, 2024 07:05

wolfgangwalther reviewed Nov 5, 2024: docs/references/api/resource_embedding.rst (outdated), test/spec/Feature/Query/SpreadQueriesSpec.hs (outdated), test/spec/Feature/Query/SpreadQueriesSpec.hs (outdated)

laurenceisla force-pushed the feat-spread-m2m branch from 02d8308 to e969f91, November 6, 2024 00:19

wolfgangwalther reviewed Nov 6, 2024: docs/references/api/resource_embedding.rst (outdated)

laurenceisla force-pushed the feat-spread-m2m branch 3 times, most recently from dca7c2d to daf47a5, November 23, 2024 18:38

laurenceisla commented Nov 23, 2024

laurenceisla force-pushed the feat-spread-m2m branch 2 times, most recently from c3ee0e2 to 4600a2b, December 31, 2024 21:58

wolfgangwalther reviewed Jan 1, 2025

laurenceisla commented Jan 2, 2025: test/spec/Feature/Query/AggregateFunctionsSpec.hs

wolfgangwalther reviewed Jan 19, 2025

Commit 7d3fd72: feat: allow spreading one-to-many and many-to-many embedded resources

laurenceisla force-pushed the feat-spread-m2m branch from a8ecc05 to 7d3fd72, January 22, 2025 01:16

wolfgangwalther requested a review from steve-chavez, January 25, 2025 10:48
