Would an hset/hget wrapper be beneficial? #42
Comments
Hello @jrots, thank you for checking this module out! First of all I would like to say that the number of users you are talking about seems a good fit for memory-optimized data structures such as Roaring Bitmaps. However, I didn't quite understand what you are trying to index. If you need to store arbitrary key/value pairs, I am not sure how you can do that with bitmaps, since they are naturally suited to storing integer/bool pairs (e.g. user 1024 is cached, user 3333 is not cached). And I am also not sure how to combine Redis' HASH data type with Roaring, since the secret sauce behind Instagram's optimization is in the compact encoding Redis uses for small hashes.
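For illustration only (not from the thread): a minimal sketch of that distinction, assuming redis-py and R.SETBIT / R.GETBIT-style commands for the Roaring type; the actual command names in this module may differ.

```python
# Hypothetical illustration: a Roaring bitmap stores integer -> bool membership,
# while a HASH stores arbitrary field/value pairs.
# Assumes redis-py and R.SETBIT / R.GETBIT commands; names may differ.
import redis

r = redis.Redis()

# Roaring bitmap key: only answers "is integer ID N in the set?"
r.execute_command("R.SETBIT", "cached_users", 1024, 1)      # user 1024 is cached
print(r.execute_command("R.GETBIT", "cached_users", 3333))  # 0: user 3333 is not

# HASH key: arbitrary key/value pairs, nothing bitmap-like to compress.
r.hset("user:1024", mapping={"name": "alice", "cached": "yes"})
print(r.hgetall("user:1024"))
```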
Hi,
Hey, now I understand what you need to index. Roaring Bitmaps seem to be a good fit for your use case, since one user would generally be "seen/not voted" by a small number of other users, and their IDs would be very sparse, such as you exemplified (1, 123123, etc.). With regular bitmaps you'd need to store a great number of empty bits representing the IDs not present in each user's set, while with compressed bitmaps you typically need much less memory. If you try this module please give me your feedback.
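To make the sparsity argument concrete, a rough back-of-the-envelope sketch (using the third-party pyroaring package purely as a stand-in; it is not part of this module, and exact sizes depend on the data):

```python
# Illustrative only: memory for a sparse "seen/not voted" set such as {1, 123123}
# as a plain bitmap versus a compressed (Roaring) bitmap.
from pyroaring import BitMap

seen = [1, 123123, 2_000_000]

# A plain bitmap must reserve one bit per possible ID up to the largest one set.
plain_bitmap_bytes = max(seen) // 8 + 1            # ~250 KB for IDs up to 2M

# A Roaring bitmap only pays (roughly) for the IDs that are actually present.
roaring_bytes = len(BitMap(seen).serialize())      # tens of bytes for this set

print(plain_bitmap_bytes, roaring_bytes)
```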
Hi, sorry for the late reply.
The file I want to load is +/- 239 GB and contains +/- 200M unique list keys, each with a list of users. I stopped the loading when +/- 260K list keys were loaded in memory, and Redis was using +/- 4 GB at that point. If I extrapolate this to the whole dataset it would probably require on the order of terabytes (rough arithmetic below).
Ah, btw, the size of the dump.rdb is:
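The original comment breaks off before the projected figure; as a rough, assumed reconstruction of the extrapolation arithmetic (my estimate, not the commenter's):

```python
# Back-of-the-envelope projection from the figures above (estimate only).
loaded_keys = 260_000          # keys loaded before stopping
memory_bytes = 4 * 1024**3     # ~4 GB of RAM at that point
total_keys = 200_000_000       # keys in the full dataset

per_key = memory_bytes / loaded_keys           # ~16.5 KB per key
projected_tb = per_key * total_keys / 1024**4  # ~3 TB for the whole file
print(round(per_key), round(projected_tb, 1))
```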
Hey @jrots, thank you for getting back! Indeed, the …
Regarding the two other problems you noticed, I'll take a look at them. The current unit and integration tests don't report any memory leaks, so I believe there must be something wrong with my implementation of the module. Even though CRoaring's memory usage is not taken into account by Redis, valgrind correctly identifies its potential leaks, from what I tested. There are no integration tests for the … yet.
I'll keep you posted.
Hey @jrots, can you please try loading those keys again? I have just updated the master branch correcting the … I am still looking at the …
Closing this issue due to inactivity. Also, the bug with the …
Nice module! I might look to use it for a project that involves a lot of "skiplists" for a lot of users, where looking up and fetching these lists needs to be as fast as possible.
With 250M users that have skipped 18B IDs in total, storing a Roaring list as a key/value pair for each of these users would require +/- 60 GB in total (if my initial tests with some test data were right).
But taking https://engineering.instagram.com/storing-hundreds-of-millions-of-simple-key-value-pairs-in-redis-1091ae80f74c into account, it would make sense to spread these 250M keys into buckets and get some hset compression on them, to push that 60 GB even lower 😇 (rough sketch of the idea below).
It could be that compression on 10K Roaring lists is not that beneficial, but nonetheless it might be worth something to explore?
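A hypothetical sketch of the bucketing idea (the key names, bucket size, and use of redis-py are illustrative assumptions, not anything proposed in the thread): group users into HASH buckets small enough for Redis' compact small-hash encoding, and store each user's serialized Roaring list as a field value.

```python
# Hypothetical sketch, not an API of this module: Instagram-style bucketing of
# per-user Roaring "skiplists" into Redis HASHes. For the compact small-hash
# encoding to kick in, hash-max-listpack-entries (hash-max-ziplist-entries on
# older Redis) must be raised to at least BUCKET_SIZE, trading CPU for memory.
from typing import Optional
import redis

r = redis.Redis()
BUCKET_SIZE = 10_000  # arbitrary; tune together with the hash encoding limits

def bucket_key(user_id: int) -> str:
    # Users 0..9999 land in skiplist:0, 10000..19999 in skiplist:1, and so on.
    return f"skiplist:{user_id // BUCKET_SIZE}"

def store_skiplist(user_id: int, serialized_bitmap: bytes) -> None:
    # HSET skiplist:<bucket> <user_id> <serialized Roaring bitmap>
    r.hset(bucket_key(user_id), str(user_id), serialized_bitmap)

def load_skiplist(user_id: int) -> Optional[bytes]:
    return r.hget(bucket_key(user_id), str(user_id))
```

With 250M users and ~18B skipped IDs overall, that averages ~72 IDs per user, so most field values would be small serialized bitmaps; the savings would come from amortizing Redis' per-key overhead across each bucket.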