Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SinkToSource: avoid heap allocation #12255

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

NaN-git
Copy link

@NaN-git NaN-git commented Jan 14, 2025

Motivation

This PR refactors SinkToSource, so that its instances don't allocate a std::string and unnecessary memory copies are avoided, too.

Context

The current implementation does not release the allocated std::string before the SinkToSource object is destroyed. This can cause unlimited usage, when the lifetime of SinkToSource objects is longer than expected.

Would have prevented #7359


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@NaN-git NaN-git requested a review from edolstra as a code owner January 14, 2025 22:14
@NaN-git
Copy link
Author

NaN-git commented Jan 16, 2025

@Mic92 Can you review and merge this PR, please?

@Mic92
Copy link
Member

Mic92 commented Jan 16, 2025

@edolstra you reviewed the commit yesterday in the meeting but didn't write a comment afterwards.

@edolstra
Copy link
Member

Notes from the team meeting:

  • Does Boost have a ring buffer type? If so it would be nice to use that.
  • Alternatively or additionally, we can change addMultipleToStore() to take a PathsSource && pathsToCopy and then have it destroy the Source objects as soon as it's done with them (i.e. it would pop elements from pathsToCopy, process them and drop them). That way, the coroutine and its associated buffer should be released quickly.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-01-15-nix-team-meeting-minutes-208/58993/1

@NaN-git NaN-git requested a review from Ericson2314 as a code owner January 19, 2025 17:17
@github-actions github-actions bot added the store Issues and pull requests concerning the Nix store label Jan 19, 2025
@NaN-git
Copy link
Author

NaN-git commented Jan 19, 2025

  • I considered to use boost::circular_buffer. We have a very limited use case and this use case does not map directly to the interface of the templated class boost::circular_buffer, i.e. the correct amount of data to be inserted/read has to be calculated, too. After reading the data another call is necessary to erase it.
    The advantage is that iterators can hide the fact that the data might be saved in two spans. I don't know whether reading the data using the iterators would be efficient.
  • I found and fixed the code, which did not release the pointer to the Source objects. At the beginning I analyzed the wrong code paths because I didn't know that the issue probably only occurs when using the remote-store implementation. Store::addMultipleToStore seems to be OK.

Copy link
Member

@Mic92 Mic92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @NaN-git I rather have data structure intree that is easy to understand than a complex generic abstraction that doesn't quite fit our use cases.
@edolstra any objections beyond merging this?

@edolstra
Copy link
Member

I did the move semantics approach in #12296.

@roberth
Copy link
Member

roberth commented Jan 21, 2025

I'm not entirely convinced that we need a ring buffer here.

Note that these are coroutines and not threads, so we have control over whether the producer or consumer is running for any given state.

This means that we could instead implement a regime where

  • the producer decides the chunk size and is responsible for capping that size
  • sinkToSource ensures that the chunk is consumed entirely before returning control to the producer
  • no copying or extra buffers are needed

If a producer were to allocate large or arbitrary buffers, that could indeed be a problem, but that problem would need to be resolved in the producer anyway.

I believe this is a more efficient solution within the current constraints. We'd have to reconsider if we were to use threads or an async I/O runtime that allows us to yield to arbitrary threads and resume the producer while the consumer is still processing.

@NaN-git NaN-git changed the title sinkToSource: transfer data in chunks and release memory early sinkToSource: avoid heap allocation Jan 22, 2025
@NaN-git NaN-git changed the title sinkToSource: avoid heap allocation SinkToSource: avoid heap allocation Jan 22, 2025
@NaN-git
Copy link
Author

NaN-git commented Jan 22, 2025

I changed the PR, so that additional heap allocations are avoided. Furthermore no unnecessary memory copy is done anymore.

@Mic92
Copy link
Member

Mic92 commented Jan 22, 2025

I dropped unrelated parentheses changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
store Issues and pull requests concerning the Nix store
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants