perf: Drop RowConverter from GroupOrderingPartial #14566

ctsk · 2025-02-09T21:32:34Z

Which issue does this PR close?

Closes Improve GroupOrderingPartial performance #14565.

Rationale for this change

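Faster is better: dropping the RowConverter speeds up GroupOrderingPartial (see the micro-benchmark results in the next comment).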

What changes are included in this PR?

Are these changes tested?

Yes, added a new test.

Are there any user-facing changes?

No.

(The input_schema argument of GroupOrderingPartial::try_new is no longer used and can be removed; that removal is not included in this PR.)

ctsk · 2025-02-10T08:37:12Z

Micro-benchmark results

group_ordering_partial/order_indices_$n benchmarks the partial group ordering where the ordering contains $n columns. Each column consists of 8192 Int32 values.

group_ordering_partial/order_indices_1
                        time:   [16.814 µs 17.340 µs 17.934 µs]
                        change: [-71.063% -70.318% -69.568%] (p = 0.00 < 0.05)
                        Performance has improved.
group_ordering_partial/order_indices_2
                        time:   [23.585 µs 23.907 µs 24.176 µs]
                        change: [-65.486% -64.711% -63.930%] (p = 0.00 < 0.05)
                        Performance has improved.
group_ordering_partial/order_indices_4
                        time:   [28.452 µs 28.603 µs 28.794 µs]
                        change: [-52.826% -52.404% -51.942%] (p = 0.00 < 0.05)
                        Performance has improved.
group_ordering_partial/order_indices_8
                        time:   [34.737 µs 34.828 µs 34.935 µs]
                        change: [-64.151% -63.876% -63.589%] (p = 0.00 < 0.05)
                        Performance has improved.
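To make the measured work a bit more concrete, here is a minimal sketch of the kind of check a partial group ordering has to do on every batch: find where the sort key last changes, comparing the ordered columns value by value instead of first encoding each row into a row format. This is an illustration only and not the code in this PR. The function name last_run_start and the use of plain Vec<i32> columns in place of Arrow arrays are assumptions.

```rust
/// Returns the index of the first row in the final run of equal sort keys.
///
/// `columns` holds the ordered (sort key) columns, all of the same length.
/// Rows before the returned index belong to earlier sort keys, so groups
/// seen only in those rows can no longer receive new input.
/// Returns 0 when there are no columns, no rows, or only one sort key.
fn last_run_start(columns: &[Vec<i32>]) -> usize {
    // Number of rows, taken from the first column (0 if there are no columns).
    let num_rows = columns.first().map_or(0, |col| col.len());

    // Walk backwards and stop at the first row whose key differs from the
    // row before it; that row starts the final run.
    (1..num_rows)
        .rev()
        .find(|&row| columns.iter().any(|col| col[row] != col[row - 1]))
        .unwrap_or(0)
}
```

With two ordered columns a = [1, 1, 2, 2, 2] and b = [7, 7, 7, 8, 8], the last key change is at row 3, so last_run_start returns 3. The cost grows with the number of ordered columns, which matches the shape of the order_indices_1 to order_indices_8 timings.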

2010YOUY01 · 2025-02-13T06:46:47Z

Thank you, this looks good to me. Let's get the CI fixed.

https://github.com/apache/datafusion/blob/main/datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs should have good coverage for edge cases in group by with partial ordering, so I think this change is safe.

ctsk · 2025-02-13T15:14:00Z

Thank you for taking the time to review!

lewiszlw · 2025-02-14T03:36:11Z

datafusion/physical-plan/src/aggregates/order/partial.rs

 impl GroupOrderingPartial {
    pub fn try_new(
-        input_schema: &Schema,
+        _input_schema: &Schema,


Why not remove this arg?

2010YOUY01 · 2025-02-14

This way we can avoid an API change, though I don't have a strong preference.

alamb · 2025-02-15

Maybe we can add a comment or something as a follow-on PR.

ctsk · 2025-02-15

Additionally, the input_schema argument in order::GroupOrdering::try_new will also be unnecessary:

datafusion/datafusion/physical-plan/src/aggregates/order/mod.rs

Lines 43 to 58 in 1626c00

impl GroupOrdering {
    /// Create a `GroupOrdering` for the specified ordering
    pub fn try_new(
        input_schema: &Schema,
        mode: &InputOrderMode,
        ordering: &LexOrdering,
    ) -> Result<Self> {
        match mode {
            InputOrderMode::Linear => Ok(GroupOrdering::None),
            InputOrderMode::PartiallySorted(order_indices) => {
                GroupOrderingPartial::try_new(input_schema, order_indices, ordering)
                    .map(GroupOrdering::Partial)
            }
            InputOrderMode::Sorted => Ok(GroupOrdering::Full(GroupOrderingFull::new())),
        }
    }

ctsk · 2025-02-15

I've added a TODO comment to the code.
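For readers following the thread, the pattern being discussed looks roughly like the sketch below: the parameter stays in the signature so existing callers keep compiling, the leading underscore silences the unused-variable warning, and a TODO records the planned cleanup. The types here (Schema, PartialOrdering) are simplified, hypothetical stand-ins and not the real DataFusion definitions.

```rust
/// Hypothetical stand-in for the real schema type.
pub struct Schema;

/// Hypothetical, simplified stand-in for GroupOrderingPartial.
pub struct PartialOrdering {
    pub order_indices: Vec<usize>,
}

impl PartialOrdering {
    // TODO: remove the unused `_input_schema` parameter in a follow-on PR.
    // It is kept for now so the public signature does not change.
    pub fn try_new(
        _input_schema: &Schema,
        order_indices: &[usize],
    ) -> Result<Self, String> {
        // A partial ordering needs at least one ordered column.
        if order_indices.is_empty() {
            return Err("order_indices must not be empty".to_string());
        }
        Ok(Self {
            order_indices: order_indices.to_vec(),
        })
    }
}
```

Callers still pass a schema, for example PartialOrdering::try_new(&Schema, &[0, 2]), but the value is ignored. Dropping the parameter later is then a single, clearly scoped API change.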

github-actions bot added the physical-expr Physical Expressions label Feb 9, 2025

ctsk force-pushed the faster-partial-groupordering branch from d7e7376 to a7324c7 Compare February 10, 2025 08:18

ctsk force-pushed the faster-partial-groupordering branch from cccc8ea to 0d778f5 Compare February 10, 2025 08:40

ctsk changed the title ~~Drop RowConverter from GroupOrderingPartial~~ perf: Drop RowConverter from GroupOrderingPartial Feb 10, 2025

ctsk force-pushed the faster-partial-groupordering branch from 0d778f5 to ec21860 Compare February 12, 2025 16:04

ctsk added 6 commits February 13, 2025 12:09

Benchmark partial ordering

b22dab3

Drop RowConverter from GroupOrderingPartial

04528f8

cargo fmt

50df2b4

Add license header

1ae5d74

Mark unused function argument

5a88a02

cargo fmt

2961710

ctsk force-pushed the faster-partial-groupordering branch from ec21860 to 2961710 Compare February 13, 2025 11:11

2010YOUY01 approved these changes Feb 13, 2025

lewiszlw reviewed Feb 14, 2025

zhuqi-lucas reviewed Feb 15, 2025

ctsk added 2 commits February 15, 2025 18:36

fix typo

b12ac6b

Add TODO to remove _input_schema parameter

8c882d9

lewiszlw Feb 14, 2025

2010YOUY01 Feb 14, 2025

alamb Feb 15, 2025

ctsk Feb 15, 2025

ctsk Feb 15, 2025
