
Would hset/hget wrapper be beneficial? #42

Closed
jrots opened this issue Aug 13, 2017 · 8 comments
jrots commented Aug 13, 2017

Nice module! I might use it for a project that involves a lot of "skiplists" for a lot of users, where looking up/getting these lists needs to be as fast as possible.
With 250M users that have skipped 18B IDs in total, storing a roaring list as a key/value pair for each of these users would require +/- 60GB in total (if my initial tests with some test data were right).
But taking https://engineering.instagram.com/storing-hundreds-of-millions-of-simple-key-value-pairs-in-redis-1091ae80f74c into account, it would make sense to spread these 250M users into buckets and apply some HSET compression on them, to push that 60GB even lower 😇.
It could be that compression on 10K roaring lists is not that beneficial, but nonetheless it might be worth something to explore?
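As a sketch of the Instagram-style bucketing idea from the article above (the key scheme, bucket size, and helper name here are hypothetical, not part of this module):

```python
# Instead of one Redis key per user, group users into fixed-size buckets
# so many small per-user values live inside a single small HASH, which
# Redis can keep in its compact ziplist/zipmap encoding.

BUCKET_SIZE = 1000  # hypothetical; tune against hash-max-ziplist-entries

def bucket_for(user_id: int) -> tuple[str, str]:
    """Return the (HSET key, field) pair under which user_id's list is stored."""
    return f"skiplist:bucket:{user_id // BUCKET_SIZE}", str(user_id)

# user 1234567 lands in bucket 1234:
print(bucket_for(1234567))  # ('skiplist:bucket:1234', '1234567')
```

A write then becomes a single HSET on the bucket key and a read a single HGET, at the cost of losing per-field TTLs and per-user key-level commands.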

@aviggiano
Owner

Hello @jrots, thank you for checking this module out!

First of all, I'd like to say that the number of users you are talking about seems a good fit for memory-optimized data structures such as Roaring Bitmaps.

However, I didn't quite understand what you are trying to index.

If you need to store arbitrary key/value pairs, I am not sure how you can do that with bitmaps, since they are naturally suited to storing integer/bool pairs (e.g. user 1024 is cached, user 3333 is not cached).

And I am also not sure how to use Redis' HASH data type with Roaring, since the secret sauce behind Instagram's optimization is in the zipmap implementation itself (see [1] and [2]), which stores small hash tables as flat key-value strings, not so much in the bucket-partitioning strategy.

@jrots
Author

jrots commented Aug 14, 2017

Hi,
What I want to index are sorts of "skiplists": users that have been "seen / no-voted" by other user IDs.
They can be small or large depending on the user's activity; given the above numbers, it would be on average a list of 72 user IDs per user (some users have very large lists, however).

key: skiplist: value: 1 223 123123 445555 ... etc.

To index I could use R.SETINTARRAY, to fetch R.GETINTARRAY, and to maintain new skips, R.SETBIT.
But I would still need to store +/- 200M keys, each with a large list, which can be an overhead too.
I thought there was more compression going on (gzip or something) in the zipmap, but apparently it just reorders the hash to avoid having too many pointers, so it won't be beneficial in this case. This can be closed if no optimization is possible :p
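For what it's worth, the 72-IDs-per-user average quoted above follows directly from the totals in the opening comment:

```python
total_skips = 18_000_000_000   # 18B skipped IDs overall
total_users = 250_000_000      # 250M users

avg_ids_per_user = total_skips // total_users
print(avg_ids_per_user)  # 72
```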

@aviggiano
Owner

Hey,

Now I understand what you need to index.

Roaring Bitmaps seem to be a good fit for your use case, since any one user would generally be "seen/no-voted" by a small number of other users, and their IDs would be very sparse, as you exemplified (1, 123123, etc.).

With regular bitmaps you'd need to store a great number of empty bits representing the IDs not present in each user's set, while compressed bitmaps typically need much less memory.
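A rough back-of-the-envelope comparison for one sparse set like the example above (container overheads ignored; roaring stores sparse 2^16 ranges as sorted 16-bit arrays, roughly 2 bytes per element):

```python
ids = [1, 223, 123123, 445555]  # one user's sparse "seen/no-voted" set

# A plain bitmap must allocate a bit for every ID up to the largest one:
plain_bitmap_bytes = max(ids) // 8 + 1   # ~54 KB just for 4 members

# A roaring array container stores only the members, ~2 bytes each:
roaring_payload_bytes = 2 * len(ids)     # 8 bytes of payload

print(plain_bitmap_bytes, roaring_payload_bytes)
```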

If you try this module please give me your feedback.
I'd like to know more about the usability of the API, missing commands, and perhaps performance benchmarks.

@jrots
Author

jrots commented Aug 29, 2017

Hi, sorry for the late reply.
I tried bulk-loading some data into it using redis-roaring and observed some issues:

  • used_memory_human seems to be incorrect, but there is probably already an open issue for this
  • the RSS memory usage of the process increases fast
  • when issuing a SAVE command and trying to restart Redis afterwards, I get an error: The RDB file contains module data for the module ' ' that is not terminated by the proper module value EOF marker

The file I want to load is +/- 239 GB and contains +/- 200M unique list keys, each with a list of users.

I stopped the loading when +/- 260K list keys were loaded in memory; Redis was using +/- 4GB at that point. If I extrapolate this to the whole dataset, it would probably require

200000000 / 260000 => 770 x 4GB => ~3TB

to load everything, so I suspect that there must be a memory leak somewhere.
I tried the dump command and wanted to restart Redis afterwards to see what the memory usage would then be, but that failed; if that issue gets fixed I can retest and see what memory usage I get.
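The extrapolation above is a straight linear projection from the observed 4GB at 260K keys:

```python
loaded_keys = 260_000
observed_gb = 4
total_keys = 200_000_000

scale = total_keys / loaded_keys            # ~769x more keys than loaded
projected_tb = scale * observed_gb / 1024   # same GB-per-key rate, in TB
print(round(scale), round(projected_tb, 1))  # 769 3.0
```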


# Memory
used_memory:23925760
used_memory_human:22.82M
used_memory_rss:4057612288
used_memory_rss_human:3.78G
used_memory_peak:27584048
used_memory_peak_human:26.31M
used_memory_peak_perc:86.74%
used_memory_overhead:15497798
used_memory_startup:765664
used_memory_dataset:8427962
used_memory_dataset_perc:36.39%
total_system_memory:50640334848
total_system_memory_human:47.16G
...
db0:keys=262205,expires=0,avg_ttl=0

             total       used       free     shared    buffers     cached
Mem:         48294      48060        233          1        136      14336

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       522 41.3  8.0 4009436 3962512 pts/3 Sl+  08:17   1:02 redis-server *:6379

root@elastic30:~# sudo pmap -x 522
522:   redis-server *:6379
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000    1280     720       0 r-x-- redis-server
000000000073f000       4       4       4 r---- redis-server
0000000000740000      20      20      20 rw--- redis-server
0000000000745000      96      96      96 rw---   [ anon ]
000000000159b000 3930744 3929848 3929848 rw---   [ anon ]
00007fd621c00000   34816   24536   24536 rw---   [ anon ]
00007fd623fc7000     200      80       0 r-x-- libredis-roaring.so
00007fd623ff9000    2048       0       0 ----- libredis-roaring.so
00007fd6241f9000       4       4       4 r---- libredis-roaring.so
00007fd6241fa000      12       8       8 rw--- libredis-roaring.so
00007fd6241fd000       4       0       0 -----   [ anon ]
00007fd6241fe000    8192       8       8 rw---   [ anon ]
00007fd6249fe000       4       0       0 -----   [ anon ]
00007fd6249ff000    8192       8       8 rw---   [ anon ]
00007fd6251ff000       4       0       0 -----   [ anon ]
00007fd625200000   12288    6144    6144 rw---   [ anon ]
00007fd625ebe000    1784     632       0 r-x-- libc-2.19.so
00007fd62607c000    2048       0       0 ----- libc-2.19.so
00007fd62627c000      16      16      16 r---- libc-2.19.so
00007fd626280000       8       8       8 rw--- libc-2.19.so
00007fd626282000      20      16      16 rw---   [ anon ]
00007fd626287000     100      76       0 r-x-- libpthread-2.19.so
00007fd6262a0000    2044       0       0 ----- libpthread-2.19.so
00007fd62649f000       4       4       4 r---- libpthread-2.19.so
00007fd6264a0000       4       4       4 rw--- libpthread-2.19.so
00007fd6264a1000      16       4       4 rw---   [ anon ]
00007fd6264a5000      12       8       0 r-x-- libdl-2.19.so
00007fd6264a8000    2044       0       0 ----- libdl-2.19.so
00007fd6266a7000       4       4       4 r---- libdl-2.19.so
00007fd6266a8000       4       4       4 rw--- libdl-2.19.so
00007fd6266a9000    1044      68       0 r-x-- libm-2.19.so
00007fd6267ae000    2044       0       0 ----- libm-2.19.so
00007fd6269ad000       4       4       4 r---- libm-2.19.so
00007fd6269ae000       4       4       4 rw--- libm-2.19.so
00007fd6269af000     140     124       0 r-x-- ld-2.19.so
00007fd626bc5000      16      16      16 rw---   [ anon ]
00007fd626bce000      12      12      12 rw---   [ anon ]
00007fd626bd1000       4       4       4 r---- ld-2.19.so
00007fd626bd2000       4       4       4 rw--- ld-2.19.so
00007fd626bd3000       4       4       4 rw---   [ anon ]
00007ffcc8df9000     132      20      20 rw---   [ stack ]
00007ffcc8f32000       8       4       0 r-x--   [ anon ]
ffffffffff600000       4       0       0 r-x--   [ anon ]
---------------- ------- ------- -------
total kB         4009436 3962516 3960804


After stopping Redis:

root@elastic30:~/redis-4.0.1# free -m
             total       used       free     shared    buffers     cached
Mem:         48294      44076       4218          1        125      14363
-/+ buffers/cache:      29586      18707
Swap:         1999        144       1855


After running SAVE:

2791:M 29 Aug 08:26:30.981 # The RDB file contains module data for the module ' ' that is not terminated by the proper module value EOF marker

@jrots
Author

jrots commented Aug 29, 2017

Ah, by the way, the size of the dump.rdb is:
620M -rw-r--r-- 1 root root 620M Aug 29 08:25 dump.rdb

@aviggiano
Owner

Hey @jrots thank you for getting back!

Indeed the used_memory_human statistic will be wrong due to known issue #2.
I didn't have time to implement the fix, but I will work on it soon.

Regarding the two other problems you noticed, I'll take a look at them.

The current unit and integration tests don't report any memory leaks, so I believe there must be something wrong elsewhere in my implementation of the module. Even though CRoaring's memory usage is not taken into account by Redis, valgrind correctly identifies its potential leaks, from what I tested.

There are no integration tests for the SAVE command, thank you for noticing. I suspect it should be easy to find out what's wrong.

I'll keep you posted.

@aviggiano added bug and removed question labels Sep 2, 2017
@aviggiano
Owner

Hey @jrots, can you please try loading those keys again?

I have just updated the master branch with a fix for the SAVE command, together with a memory-leak fix in the RDB load function (sorry, this scenario wasn't being tested before -- I took the opportunity to add AOF tests as well).

I am still looking at the used_memory_human issue, but I think it will take more time.

@aviggiano
Owner

Closing this issue due to inactivity. Also, the bug with the SAVE command has been fixed.
