High CPU usage due to repetitive batch build/compression in re-insertion #1457

Open
4 of 9 tasks
byte-sourcerer opened this issue Dec 20, 2024 · 1 comment

byte-sourcerer commented Dec 20, 2024

Observed

When retrying inserts into ClickHouse, we noticed high CPU usage attributed to (*batch).Append and (*batch).Send. The current design couples building and compressing a batch with sending it. As a result, every attempt to resend a batch requires building and compressing it again, which puts significant load on the CPU.

Even though batch.Send(..) can be called multiple times, the context.Context is shared across those calls, which prevents us from controlling the timeout of each attempt.

Solution

Decouple the two linked steps: building a batch and sending it.

Expected behaviour

The goal is to allow a batch to be re-sent without building and compressing it again.
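
To make the intent concrete, here is a minimal, standard-library-only sketch of "build once, send many times". It does not use clickhouse-go: `buildPayload`, `flakySender`, and `sendWithRetry` are hypothetical names, and gzip is only a stand-in for whatever compression the driver applies. The point is that the serialization and compression cost is paid once, while the retry loop only repeats the send.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
)

// buildPayload serializes and compresses the rows exactly once.
// gzip is an assumption used for illustration, not the driver's codec.
func buildPayload(rows []string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, r := range rows {
		if _, err := zw.Write([]byte(r + "\n")); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// flakySender is a hypothetical transport that fails a fixed number of
// times before accepting the payload.
type flakySender struct {
	failuresLeft int
	sends        int
}

func (s *flakySender) Send(payload []byte) error {
	s.sends++
	if s.failuresLeft > 0 {
		s.failuresLeft--
		return errors.New("transient send failure")
	}
	return nil
}

// sendWithRetry re-sends the same prebuilt payload; nothing is rebuilt
// or recompressed between attempts.
func sendWithRetry(s *flakySender, payload []byte, maxAttempts int) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = s.Send(payload); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	payload, err := buildPayload([]string{"a,1", "b,2", "c,3"})
	if err != nil {
		panic(err)
	}
	s := &flakySender{failuresLeft: 2}
	err = sendWithRetry(s, payload, 5)
	fmt.Println("attempts:", s.sends, "err:", err)
}
```

In this sketch the payload is an immutable byte slice, so re-sending it is safe as long as the transport does not modify it.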

Code example

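// Current pattern: every retry rebuilds and recompresses the whole batch.
retry(func() error {
    batch, err := conn.PrepareBatch(ctx, ...)
    if err != nil {
        return err
    }
    for ... {
        if err := batch.Append(...); err != nil {
            return err
        }
    }
    return batch.Send()
})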

Details

Profiling flamegraph:

[Image: profiling flamegraph screenshot, "CleanShot 2024-12-20 at 18 57 35@2x-2"]

Environment

  • clickhouse-go version: v2.13.0
  • Interface: ClickHouse API / database/sql compatible driver
  • Go version: 1.22.4
  • Operating system:
  • ClickHouse version:
  • Is it a ClickHouse Cloud? No
  • ClickHouse Server non-default settings, if any: No
  • CREATE TABLE statements for tables involved: No
  • Sample data for all these tables, use clickhouse-obfuscator if necessary

byte-sourcerer commented Dec 30, 2024

I expect new interfaces like:

type Conn interface {
	PrepareBatchBuilderAndSender(ctx context.Context, query string, opts ...PrepareBatchOption) (BatchBuilder, Sender, error)
}

type BatchBuilder interface {
	Append(v ...any) error
	Build(destination *Buffer) (*Buffer, error)
}

type Sender interface {
	Send(ctx context.Context, block *Buffer) error
	Abort() error
}

and use them like:

batchBuilder, sender, err := conn.PrepareBatchBuilderAndSender(ctx, /**/)
for /**/ {
    batchBuilder.Append(/**/)
}

buffer := bufferPool.Get()
defer bufferPool.Put(buffer)

// Build and compress once; the result is reused for every attempt.
buffer, err = batchBuilder.Build(buffer)

retry(func() {
    sender.Send(ctx, buffer)
})
