-
Notifications
You must be signed in to change notification settings - Fork 702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Embed key and TTL in robj #992
Comments
We would ideally align TTL, since most arithmetic operations require 8 byte alignment on x86 and ARM to be most performant, so it would be before the key when possible. |
I'm looking into it |
@madolson so shall we place TTL before key? We'd have to memmove key in some cases if TTL is added or removed, if we want to save memory. |
We probably want a reasonably small size on the embedded key (and embedded value). Since jemalloc size buckets get progressively larger, the wasted space (internal fragmentation) will get (on average) increasingly larger as the key/value size increases. Re: TTL and alignment, I'm with Maddy that I like to ensure alignment if reasonably possible. However from everything I've read, there's not much alignment penalty on modern processors. (But it really bugs me to have things unaligned by design.) Regarding TTL being added/removed, I suggest that after TTL is added, it is never removed. Any key which had a TTL at one time, is likely to have a TTL again. Also, removing the TTL most likely won't result in memory savings anyway as the jemalloc bucket size is less likely to change once we start embedding keys and values (using larger buckets). |
Hello Jim! I agree with most of this. Regarding key, I would like to always Regarding TTL, we can put it before embedded key (but then we need to memmove the key I think the value should be embedded if it fits with the key and ttl within 64 bytes (one cache line). Otherwise, I think value should only be embedded if it fits within the usable We may want to avoid reallocating small objects to move them between the 48B and
Anyway, we should provide an abstraction for this and provide functions to get expire, get embedded key, etc. Later, we'll be able to change the logic for when to embed what. |
This is to prepare for storing
robj
in a new hash table implementation.This includes replace dict with hashset in kvstore, db, expire, ...
The new hash table implementation doesn't have a "dict entry", nor does it have keys and values. It just has "elements". An element can effectively be anything that contains the key. The user provides a "get key" callback function in the hashsetType (same idea as the callbacks in dictType).
To use an
robj
as a container for both key and value, we can embed the key into therobj
. Therobj
(typedef'ed tostruct serverObject
) is accessed in very many places, so we can't easily make it opaque and change all the accesses to it. That would be very much work. Instead, we can add some extra fields (for optional embedded key and embedded TTL) and keep the existing fields. Wherever the robj is passed around in legacy code, an embedded key will just be ignored.There's is already something called embedded string (
obj->encoding = OBJ_ENCODING_EMBSTR
, created bycreateEmbeddedStringObject
) which allocates extra space after the struct, copies the string contents (an sds string) into the same allocation and pointsptr
to it. We can call this "embedded value" and it's not the same as an embedded key, which doesn't exist yet.The sds strings are somewhat weird since they are pointers to the string contents, and they also have a header which is before the content (so internally, the sds library does
s[-1]
to look at the bytes before the content). That's whyptr
points to the sds content, rather than to the sds header.The
ptr
pointer takes up some space (8 bytes) in the struct and it would be possible to avoid it (by computing it instead), but it's accessed in thousands of places just likeobj->ptr
so let's leave it like that for now. We can still embed the key. We can avoid adding another pointer to the embedded key though, if we use the same method that was used in #541 to embed a key in dictEntry.To embed also a key (another sds string) into the same robj, let's put the key before the value, because the key never changes. It can just be deleted, and then the whole robj is deleted. The value can change though (shrink or grows, possibly be moved out to its own allocation).
Let's use a bit from
refcount
to flag that there's an embedded key. We can also embed TTL and add a "has ttl" flag.The
data[]
field isn't strictly needed, since it's zero size by default and it's possible to allocate more space and use it anyway, but with a data field, we can access the end of the struct likeobj->data
and&obj->data[offset]
.Note the added "size of sds header" in the drawing above. There are different variants of the sds header, with different header sizes. (See
src/sds.h
.) Currently,createEmbeddedStringObject
always embeds the headerstruct sdshdr8
(which is a header of 3 bytes and the max length of strings of this sds variant 256 bytes, since it uses one byte to stores the length of the string). Embedded strings are (currently) only used for short strings anyway.When we embed keys, though, we want to embed them regardless of their size. Thus we want to able to use different sds headers. We add one byte to store the size of the sds header, so we can compute where the sds content is, and from that we can call
sdslen
and other sds functions to figure out the length of the sds string and where it ends.Now, what we need is a function to combine two
robj
into one, with an embedded key and a (embedded or not) value.If we are smart, we can convert an object with an embedded value into an object with an embedded key and no value (ptr = NULL), without reallocating or moving the actual value. That's why I'm suggesting (not sure if it's a good idea) that we add a "sds header size" byte also for embedded value. When we convert an robj with embedded string into an robj with embedded key and no value, we just set theobj->embkey = 1; obj->ptr = NULL;
.Update: The strings from a client are parsed and stored as
robj
objects inc->argv
and this argument vector needs to be valid even after the command has been executed, for replication and other purposes, so when we're handling a command likeSET k v
,k
andv
are both robj with a valid string value "k" and "v" respectively. We need to embed "k" as embedded key in thev
object, without destroyingk
or turning thek
into something else. Currently (as of valkey 8.0-rc1) the keys are embedded in a dictEntry and they're copied there (which seems fine, performance wise). Similarly we need to copy the string content ofk
into the 'embedded-key' part ofv
and then addv
to the hash table.The text was updated successfully, but these errors were encountered: