Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: zero relocation string containers #159

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

muggenhor
Copy link
Contributor

This makes the underlying storage type, used by the containers and previously hardcoded to bits::carray, configurable.

It then adds a specialized container (pic_array) for storing variable-length arrays of Ts. The most prominent example of such arrays being strings (arrays of char). This container stores these arrays as sub-arrays inside one large storage array.

This simultaneously achieves:

  1. higher density than over-allocating to fit the largest array (string)
  2. better memory locality than storing (pointer,size) tuples (aka span/string_view) or over-allocating
  3. unlike storing pointers (span/string_view) this avoids R_<arch>_RELATIVE-style load-time relocations (for -fPIC) because it stores relative addresses to data (indices into the storage array) instead of absolute addresses
  4. consumes less (rodata) memory than storing pointers by selecting the smallest possible integer capable of addressing the entire storage array for storing relative addresses

I have two different use cases where this benefits me significantly:

  1. huge lookup tables in shared libraries that cause noticeable delays at process startup (20ms average for the largest) when the dynamic linker (ld.so) is relocating large amounts of string pointers in .data.rel.ro: these get moved to .rodata which doesn't need to be relocated at all.
  2. different project on an ESP8266 where the increased memory density allows me to shave off ~15 kB of precious flash storage.

Depends-on: #158

template <typename T, typename U, typename Compare, std::size_t... KNs,
std::enable_if_t<!bits::is_pair<Compare>::value>* = nullptr>
constexpr auto make_map(
Compare const& compare,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this has the comparison functor as it's first parameter instead of the last because otherwise this overload is sometimes ambiguous with the overload without comparison functor. (Probably due to the parameter pack).

return at_impl(*this, key);
}
constexpr Value& at(Key const &key) {
constexpr decltype(auto) at(Key const &key) {
Copy link
Contributor Author

@muggenhor muggenhor Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decltype(auto), unlike the explicit reference, allows our use of a proxy iterator by allowing this to return on-demand constructed by-value types to be returned here (mostly string_view/frozen::string). Otherwise we'd return a reference to a temporary, which is UB.


} // namespace bits

// Helpers to preserve arrays in type information, instead of letting them decay to pointers
Copy link
Contributor Author

@muggenhor muggenhor Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially std::make_pair with the added twist that it avoids decaying of arrays to pointers. This function is the primary thing I'm not super happy with. But I couldn't find any other way to preserve the array size (string size) in the type information otherwise.

It cannot be put in an argument to a constexpr function either, because the values of parameters to constexpr functions must also be usable at runtime, and thus cannot become part of the type info later on (i.e. storage_size template parameter of return type of factory function make_XXX).

@muggenhor
Copy link
Contributor Author

Well, I was not expecting those build failures on MSVC. I'll try to resolve those tomorrow.

@muggenhor
Copy link
Contributor Author

It appears I've managed to discover some bugs in MSVC. Hopefully my last commit successfully works around them (I may still have to add or subtract one level of std::enable_if SFINAE usage). I need to wait for a subsequent AppVeyor run to be able to confirm as I don't have MSVC myself. (And Compiler Explorer only goes so far).

I've had some discussions in the CppLang slack about this https://cpplang.slack.com/archives/C21PKDHSL/p1692113981628139 (invite, if required, here: https://cppalliance.org/slack/). It appears that MSVC is definitely buggy with some combination of parameter packs, NTTPs (non-type template parameters) and reference-to-bounded arrays. There appear to be multiple ways to work around this. I've gone for the one that seemed most palatable to me (adding and using wrapper type array_ref<T, N> as replacement for const T(&)[N]).

@muggenhor
Copy link
Contributor Author

I finally managed to get a Windows VM with VS 2017 (because AppVeyor runs with 2017). After 5352523 I don't have any build errors locally anymore. So hopefully AppVeyor concurs...

@muggenhor
Copy link
Contributor Author

@serge-sans-paille I think you need to trigger the build on AppVeyor? Could you do that (or, if I'm wrong, tell my how to do so myself).

@serge-sans-paille
Copy link
Owner

@muggenhor I've removed appveyor and travis CI in favor of github actions, could you rebase on master branch first?

@muggenhor muggenhor force-pushed the feat/zero-reloc-string-containers branch from 5352523 to cb357a8 Compare September 5, 2023 12:58
Unfortunately the 'push' event doesn't trigger when a PR originates from
a fork. So trigger on the 'pull_request' event too.
@muggenhor muggenhor force-pushed the feat/zero-reloc-string-containers branch from 1479ccb to aae5197 Compare September 5, 2023 13:24
1. Instead of separate mkdir build && do stuff in build: let CMake handle this
   itself with its -B <build-dir-mkdir-on-demand> and -S <srcdir> options.
2. Tell GCC/Clang to color their diagnostics even though their output is
   captured in a pipe (!isatty(STDOUT_FILENO)).
…struction

Specifically don't pin it to 'carray'. This makes it possible to change
the item storage later on.
This allows using more efficient underlying storage containers for
specific cases where it's possible to be more efficient than the current
naive array of `value_type[N]`.
When our underlying iterator is a proxy iterator it's dereferencing
operator (`*`) will return a temporary. The .at() method should *not*
return a reference to it or one of its members (->second). Instead it's
`second` member should be returned as is. A value when it's a value
type, a reference when it's a reference type.
Specifically this version stores strings in a way that doesn't require
dynamic relocations when part of position-independent-code.
These specializations will cause the string storage to be contained
inside the 'set' and 'unordered_set' instances by using the new
PIC-friendly container. Sets constructed this way should not add any
extra dynamic relocations and their associated costs and restrictions.
This is necessary for storing strings in maps in a way that doesn't
require dynamic relocations when part of position-independent-code.

Using the tuple accessor (meta-) functions has the side benefit of this
being generic enough to be useful (without modifications) for something
like a multi_index type in the future.
These specializations will cause the string storage to be contained
inside the 'set' and 'unordered_set' instances by using the new
PIC-friendly container. Sets constructed this way should not add any
extra dynamic relocations and their associated costs and restrictions.
For large string tables (large number of elements) with relatively short
strings this has a huge effect. Easily cutting such a string table in
half.

For unordered maps/sets this is exclusively a size benefit. For ordered
maps/sets this significantly increases the probability of L1 cache hits
and thus also provides a potential performance benefit.
…rectly

Specifically it fails to consider a dependent type for overload
resolution even when the template parameter it's dependent on is
specified explicitly.

So we're using enable_if-based SFINAE instead now to check for the
presence of a `value_type` member type in `T`.
…am packs

It's unclear what exactly is going on here. But use of a reference to
bounded-array as a template parameter inside a parameter pack cause MSVC
to emite useless compiler errors. On MSVC 19.16 it appears to eliminate
these overloads from consideration without any diagnostic (like SFINAE).
On more recent versions it provides the supremely useless "failed to
specialize function template" without informing about *why* it is
failing to do so.

And some variations of these, applying decltype to NTTPs, even ICEs the
compiler.

Converting `const T (&)[N]` to `array_ref<T, N>` allows us to *not* have
a reference-to-array expression inside the parameter pack and this
appears to stop triggering all compiler bugs discovered thus far.
For some unknown reason MSVC fails to evaluate at
constant-evalutation-time (constexpr) pic_array::appender::operator()
when it uses a range-based-for expression. Manually expanding it, in the
exact same way that the C++ spec says "the implementation" (compiler)
should do it, allows it to proceed.
@muggenhor muggenhor force-pushed the feat/zero-reloc-string-containers branch from aae5197 to a7d4008 Compare September 5, 2023 13:30
@muggenhor
Copy link
Contributor Author

I had to do some changes to get CI to work:

  1. 9c3d62a: necessary to trigger at all
  2. a9576b3: to get usable workflow names in the GitHub Actions tab
  3. 145cb94: get colored error messages

(2) and (3) are not strictly necessary. So I can remove them easily enough if you dislike those changes.

Also I used interactive rebase to make them the commits closest to master. This allows you to merge them straight to master right now if you would want to (git merge --ff-only --ff 145cb94f71d2b5c43be5dad47633532b229d46a8).

@serge-sans-paille
Copy link
Owner

I started reviewing the changes... but there are too many of them. Could you split this in individual PRs I can review independently? The time I have to spend on frozen is unfortunately limited :'(

@muggenhor
Copy link
Contributor Author

I can split it, but there's dependencies between the changes, so there are some limits to how far I can split it.

Either way, I'm also somewhat restricted in time. I was hoping to do some splitting this week but didn't end up having any time. Hopefully somewhere in the next two weeks.

@agraham-roku
Copy link

I came here looking for exactly this, thank you.

Another good thing to have for unordered_map/unordered_set would be the option to simply not store the key values in addition to the tables. This means it won't be able to detect a collision and a failed lookup will return an arbitrary result, but in practice, sometimes that's ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants