[pipeline] Use conj! as the default pipeline rf #191

Open
wants to merge 1 commit into
base: master

alexander-yakushev
Contributor

Another small but non-invasive improvement. This works best when many rows are processed, but it shouldn't add much overhead when only one or a few rows are selected.

@camsaul
Owner

camsaul commented Oct 14, 2024

Not sure this is really "non-invasive" if everything using query-reducible and select-reducible and the like has to be updated to use transduce instead of reduce (as evidenced by the tests you had to update)... I'm fairly certain this is going to require us to make changes in upstream Metabase code which makes it a breaking change. I'm honestly not sure the minor (?) performance benefits we get here are worth making the code harder to use correctly. (What are the performance benefits, btw? I would still love to see some benchmarks for these PRs)
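
For illustration only (a sketch, not code from this PR): assuming the default rf becomes a transient-based function like the hypothetical `conj!-rf` below, the difference between `transduce` and a plain `reduce` shows up in what the caller gets back. The function name and the sample rows are made up for the example.

```clojure
;; Hypothetical stand-in for a conj!-based default rf; not Toucan's actual code.
(defn conj!-rf
  ([] (transient []))            ; init: start with an empty transient vector
  ([acc] (persistent! acc))      ; completion: freeze into a persistent vector
  ([acc row] (conj! acc row)))   ; step: append the row in place

(def rows [{:id 1} {:id 2} {:id 3}])

;; transduce calls the 1-arity at the end, so the caller gets a plain vector.
(def via-transduce (transduce identity conj!-rf rows))

;; reduce never calls the 1-arity, so the caller is left holding a transient.
(def via-reduce (reduce conj!-rf (conj!-rf) rows))

(def via-reduce-is-vector? (vector? via-reduce))   ; false

;; A reduce-based caller has to run the completion step itself.
(def completed (conj!-rf via-reduce))
```

This is the kind of caller-side change the comment above is worried about: code that relied on `reduce` would need either `transduce` or an explicit completion call.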

@alexander-yakushev
Contributor Author

alexander-yakushev commented Oct 14, 2024

I understand what you are saying. However, the tests that I've changed, for example, extend the method pipeline/transduce-execute-with-connection. I'm not sure there is a hard rule regarding this, but I would expect anything with the word "transduce" in the name to call the 1-arity of the reducing function. Otherwise, it's violating the "transduce contract," so to speak.

EDIT: this may be relevant https://clojure.org/reference/transducers#_creating_transducible_processes.

"A completing process must call the completion operation on the final accumulated value exactly once."

It is possible to rewrite this PR without relying on the 1-arity if compatibility here is crucial.
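
As a rough sketch of what honoring that contract looks like (the helper `reduce-rows` and the counting rf are hypothetical, not part of Toucan or this PR), a transducible process applies the transducer to the rf, reduces, and then calls the 1-arity on the final value exactly once:

```clojure
;; Hypothetical helper: a minimal transducible process over a collection of rows.
(defn reduce-rows
  [xform rf rows]
  (let [rf'  (xform rf)                  ; wrap the rf with the transducer
        acc  (reduce rf' (rf) rows)]     ; (rf) supplies the initial value
    (rf' acc)))                          ; completion, called exactly once

;; Counts how many times the completion arity runs.
(def completions (atom 0))

(defn counting-rf
  ([] (transient []))
  ([acc] (swap! completions inc) (persistent! acc))
  ([acc x] (conj! acc x)))

(def result (reduce-rows (map :id) counting-rf [{:id 1} {:id 2}]))
;; result is [1 2] and @completions is 1
```

The built-in `transduce` already behaves this way, so the sketch only matters for custom processes that drive the rf by hand.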

Regarding the benchmarks: I did benchmarks for multiple changes at once. Each change is not very impressive number-wise on its own, but together they chip away quite a bit. I'll do separate benchmarks for this PR and post them.

codecov bot commented Oct 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.65%. Comparing base (b412026) to head (08af6fc).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #191      +/-   ##
==========================================
+ Coverage   83.55%   83.65%   +0.09%     
==========================================
  Files          37       37              
  Lines        2506     2515       +9     
  Branches      212      212              
==========================================
+ Hits         2094     2104      +10     
+ Misses        200      199       -1     
  Partials      212      212              

@alexander-yakushev
Contributor Author

I benchmarked this against the current master: allocations are 20% lower, but the timing results are inconclusive. Let's hold off on this PR in favor of the others and revisit it once the others are merged.

@alexander-yakushev
Contributor Author

alexander-yakushev commented Jan 5, 2025

Update

Here are the results for this PR on the usual select 10k rows benchmark:

- master
Time per call: 3.91 ms   Alloc per call: 8,448,707b   Iterations: 2577
Time per call: 3.84 ms   Alloc per call: 8,447,663b   Iterations: 2647
Time per call: 3.93 ms   Alloc per call: 8,447,364b   Iterations: 2703

- rf-conj!
Time per call: 3.56 ms   Alloc per call: 7,157,359b   Iterations: 2881
Time per call: 3.62 ms   Alloc per call: 7,156,383b   Iterations: 2814
Time per call: 3.53 ms   Alloc per call: 7,155,533b   Iterations: 2945

So, it nets ~8% time improvement and ~15% allocation reduction. The reduced allocations are quite tasty; this almost achieves the level of overhead that the raw next.jdbc has (it did ~5MB per select the last time I checked).
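
For anyone who wants to check the percentages, here is a small sketch that recomputes them from the six runs listed above. The helper names (`mean`, `pct-reduction`) and the map layout are made up for this example; only the numbers come from the benchmark output.

```clojure
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn pct-reduction
  "Percentage by which `after` is lower than `before`."
  [before after]
  (* 100.0 (/ (- before after) before)))

;; Numbers copied from the benchmark runs above.
(def master  {:ms [3.91 3.84 3.93] :bytes [8448707 8447663 8447364]})
(def rf-conj {:ms [3.56 3.62 3.53] :bytes [7157359 7156383 7155533]})

(def time-pct
  (pct-reduction (mean (:ms master)) (mean (:ms rf-conj))))       ; roughly 8.3

(def alloc-pct
  (pct-reduction (mean (:bytes master)) (mean (:bytes rf-conj)))) ; roughly 15.3
```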

Given that you are concerned about how much change will be necessary in Metabase, I'm going to try this out in Metabase in advance.

UPD: It turns out that Metabase tests don't fail after this change (see metabase/metabase#51772). I suppose Metabase already uses the rf "properly", invoking the one-arg arity as the completion step.
