Delete Stale Keys from Disk AND RBM #13
Conversation
to delete stale keys from diskcache
Signed-off-by: Kiran Prakash <[email protected]>
server/src/main/java/org/opensearch/common/settings/ClusterSettings.java (resolved)
server/src/main/java/org/opensearch/common/settings/ClusterSettings.java (resolved)
diskCleanupKeyToCountMap
    .computeIfAbsent(shardId, k -> ConcurrentCollections.newConcurrentMap())
    .merge(cleanupKey.readerCacheKeyId, 1, Integer::sum);
Maybe add a comment explaining this since it's a bit hard to read? My understanding is add a new concurrent map as the value for shardId if it doesn't yet exist. Then take this inner map and set its value for readerCacheKeyId to 1 if it's empty, or increment it by 1 if it already exists.
Your understanding is right.
I already have code comments on top of this method that I thought would be self-explanatory.
Let me know if you didn't see them, or if you think they're not clear enough.
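For readers less familiar with the idiom, the pattern under discussion can be sketched standalone. This is a simplified illustration only: `StaleKeyCounter`, the `String` key types, and the method names are hypothetical stand-ins for the PR's `diskCleanupKeyToCountMap` logic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class StaleKeyCounter {
    // Outer map: shard -> (readerCacheKeyId -> stale key count),
    // mirroring the shape of diskCleanupKeyToCountMap in the PR.
    private final ConcurrentMap<String, ConcurrentMap<String, Integer>> diskCleanupKeyToCountMap =
        new ConcurrentHashMap<>();

    void record(String shardId, String readerCacheKeyId) {
        diskCleanupKeyToCountMap
            // create the inner map for this shard if it does not exist yet
            .computeIfAbsent(shardId, k -> new ConcurrentHashMap<>())
            // set the count for this readerCacheKeyId to 1 if absent,
            // otherwise add 1 to the existing count
            .merge(readerCacheKeyId, 1, Integer::sum);
    }

    int countFor(String shardId, String readerCacheKeyId) {
        Map<String, Integer> inner = diskCleanupKeyToCountMap.get(shardId);
        return inner == null ? 0 : inner.getOrDefault(readerCacheKeyId, 0);
    }
}
```

This matches the reading in the comment above: `computeIfAbsent` materializes the per-shard inner map, and `merge` either initializes the reader's count to 1 or increments it.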
    return;
}

if (cleanupKey.readerCacheKeyId == null) {
When will the reader cache key ID be null? Is this when the shard is totally closed?
It will be null when the cache clear API is called, which calls this clear() method.
https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/indices/IndicesRequestCache.java#L132
When the shard is closed, we would call the onClose() method of the generated cleanupKey, which will have the readerCacheKeyId.
So essentially: closing a shard, or moving to a new reader (on refresh), will always know what the current readerCacheKeyId is.
But for the clear API, it's more like: I don't care what the current readerCacheKeyId is, clean up everything.
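The two paths can be sketched side by side. This is a simplified, hypothetical stand-in (class and field names invented for illustration; the real logic lives in IndicesRequestCache): a null readerCacheKeyId means "clear everything for this shard" (the clear API path), while a non-null id targets one reader's entries (the shard close / refresh path).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CleanupSketch {
    // shard -> (readerCacheKeyId -> stale key count); illustrative
    // stand-in for diskCleanupKeyToCountMap in the PR.
    final ConcurrentMap<String, ConcurrentMap<String, Integer>> countMap = new ConcurrentHashMap<>();

    void cleanup(String shardId, String readerCacheKeyId) {
        if (readerCacheKeyId == null) {
            // clear API path: no specific reader, drop everything for the shard
            countMap.remove(shardId);
        } else {
            // shard close / refresh path: the caller knows the current reader id
            ConcurrentMap<String, Integer> inner = countMap.get(shardId);
            if (inner != null) {
                inner.remove(readerCacheKeyId);
            }
        }
    }
}
```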
server/src/main/java/org/opensearch/indices/IndicesRequestCache.java
(outdated, resolved)
public static final Setting<Boolean> INDICES_ID_FIELD_DATA_ENABLED_SETTING = Setting.boolSetting(
    "indices.id_field_data.enabled",
    true,
    Property.Dynamic,
    Property.NodeScope
);

public static final Setting<String> INDICES_REQUEST_CACHE_DISK_CLEAN_THRESHOLD_SETTING =
Could this instead use Setting.doubleSetting() from 0 to 100 (or 0.0 to 1.0), which has built-in min and max value arguments?
But that wouldn't accept "50%"; it only works for 0.5.
Having a string is more flexible. This follows the same pattern as CLUSTER_ROUTING_ALLOCATION_LOW_DISK_WATERMARK_SETTING.
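To make the flexibility argument concrete, a string setting can normalize both spellings itself. The helper below is a hypothetical illustration (class and method names invented; OpenSearch's watermark settings use their own ratio-parsing logic), showing how "50%" and "0.5" can be accepted interchangeably:

```java
final class ThresholdParser {
    private ThresholdParser() {}

    // Accepts either a percentage string ("50%") or a ratio ("0.5")
    // and normalizes both to a fraction in [0, 1].
    static double parseAsFraction(String value) {
        String s = value.trim();
        if (s.endsWith("%")) {
            double pct = Double.parseDouble(s.substring(0, s.length() - 1));
            if (pct < 0 || pct > 100) {
                throw new IllegalArgumentException("percentage out of range: " + value);
            }
            return pct / 100.0;
        }
        double ratio = Double.parseDouble(s);
        if (ratio < 0 || ratio > 1) {
            throw new IllegalArgumentException("ratio out of range: " + value);
        }
        return ratio;
    }
}
```

A plain doubleSetting would reject the "%" form outright, which is the trade-off being discussed.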
server/src/internalClusterTest/java/org/opensearch/indices/IndicesRequestCacheDiskTierIT.java
(outdated, resolved)
server/src/internalClusterTest/java/org/opensearch/indices/IndicesRequestCacheDiskTierIT.java
(outdated, resolved)
// When entire disk tier is stale, test whether cache cleaner cleans up everything from disk
public void testDiskTierInvalidationByCacheCleanerEntireDiskTier() throws Exception {
    int thresholdInMillis = 4_000;
Set it lower so the test doesn't take so long to run? Or does it flake if you do that?
It flakes. I'm setting the threshold low enough that by the time the cleaner runs, we have inserted 2 entries,
and then putting the test to sleep until the cache cleaner runs its job. When the test thread wakes up, the entries will have been deleted.
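One common way to soften the flakiness-vs-runtime trade-off is to poll for the expected state rather than sleeping a fixed amount, so the test finishes as soon as the cleaner has actually run. The generic helper below is a hypothetical sketch (not from this PR; OpenSearch's test base class offers assertBusy for a similar purpose):

```java
import java.util.function.BooleanSupplier;

final class WaitUtil {
    private WaitUtil() {}

    // Polls the condition at a short interval until it holds or the
    // timeout elapses. Returns true if the condition became true.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(50); // short poll interval
        }
        return condition.getAsBoolean(); // final check at the deadline
    }
}
```

With this shape, the timeout can stay generous (to avoid flakes) while the common-case runtime stays short.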
client().prepareIndex("index").setId("1").setSource("k", "good bye");
ensureSearchable("index");

for (int i = 0; i < 6; i++) { // index 5 items with the new readerCacheKeyId
Isn't this 6 items?
Yep. I'm considering going back to kindergarten with my son.
double result = indicesRequestCache.diskCleanupKeysPercentage();

assertEquals(0, result, 0);
Add a small delta?
Why? We want this to be exactly zero, since we're checking the case where the disk cache is empty. That's why I didn't add any delta.
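The zero-delta assertion is only safe if the empty-cache case returns exactly 0.0 with no floating-point arithmetic involved. A minimal sketch of how such a percentage might be computed (hypothetical names and implementation, not the actual method from the PR):

```java
final class CleanupMath {
    private CleanupMath() {}

    // Percentage (0-100) of disk-cache keys that are stale.
    // Returning the literal 0.0 for an empty cache means no rounding
    // occurs, so assertEquals(0, result, 0) holds without a delta.
    static double diskCleanupKeysPercentage(int staleKeys, int totalKeys) {
        if (totalKeys == 0) {
            return 0.0; // empty disk cache: exactly zero by construction
        }
        return 100.0 * staleKeys / totalKeys;
    }
}
```

A delta would only be needed for results produced by division, where rounding can make exact comparison brittle.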
@@ -215,18 +215,56 @@ public class IndicesService extends AbstractLifecycleComponent
private static final Logger logger = LogManager.getLogger(IndicesService.class);

public static final String INDICES_SHARDS_CLOSED_TIMEOUT = "indices.shards_closed_timeout";
public static final String INDICES_REQUEST_CACHE_DISK_CLEAN_THRESHOLD_SETTING_KEY = "cluster.request_cache.disk.cleanup_threshold_percentage";
Why does this one start with "cluster" instead of "indices" like the rest?
We want this to be a cluster-level setting.
    }
}

public static class CleanDiskCacheTests {
These don't run as part of the test suite in IntelliJ for me - it says it's because it's part of the nested class, but it might also be because of the JUnit setup. Should probably refactor this, but I'm going to go ahead and benchmark without those changes.
I see that too, let me address this. Thanks!
private final ConcurrentMap<ShardId, ConcurrentMap<String, Integer>> diskCleanupKeyToCountMap = ConcurrentCollections.newConcurrentMap();
private final AtomicInteger staleKeysInDiskCount = new AtomicInteger(0);
I think these variables should be extracted out and moved to some other class, or made generic so they can be extended to other tiers if needed. Otherwise this tightly couples the cache with disk properties, irrespective of whether tiered caching (disk) is enabled.
Here diskCleanupKeyToCountMap serves as a mechanism to calculate stale keys, which is itself like a "stats" concept associated with the disk tier. So it should be attached to the disk tier itself rather than hardcoded here. If the disk tier is enabled, these stats would be automatically calculated and available; otherwise not.
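The decoupling being suggested could take roughly this shape: hang the stale-key accounting off a per-tier stats abstraction so the request cache only sees an interface, and a disabled tier simply contributes no stats. The interface and class names below are hypothetical, invented for illustration:

```java
import java.util.concurrent.atomic.AtomicInteger;

interface TierStats {
    void onStaleKey();   // called when a key in this tier becomes stale
    int staleKeyCount(); // current stale-key count for this tier
}

// Disk-specific accounting lives with the disk tier; the request cache
// holds only a TierStats reference (or none, if the tier is disabled).
class DiskTierStats implements TierStats {
    private final AtomicInteger staleKeysInDiskCount = new AtomicInteger(0);

    @Override
    public void onStaleKey() {
        staleKeysInDiskCount.incrementAndGet();
    }

    @Override
    public int staleKeyCount() {
        return staleKeysInDiskCount.get();
    }
}
```

With this arrangement, IndicesRequestCache would never touch disk-specific fields directly, addressing the coupling concern above.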
keysToClean.add(new CleanupKey(entity, null));
CleanupKey cleanupKey = new CleanupKey(entity, null);
keysToClean.put(cleanupKey, new CleanupStatus());
updateStaleKeysInDiskCount(cleanupKey);
This shouldn't be called if the disk tier is turned off.
I'll wait to address this until your framework changes are in; that will give me a better picture of how to move all of this into each cache layer. Ideally, yes, I also want the cleanup logic of each cache to be in its own class instead of IndicesRequestCache.java, which will also address this.
we need to make sure we clean the disk cache as well,
hence passing threshold as 0
*/
cleanDiskCache(0);
Again, calling it explicitly doesn't make sense, as the disk tier might be turned off.
I'll wait to address this until your framework changes are in; that will give me a better picture of how to move all of this into each cache layer. Ideally, yes, I also want the cleanup logic of each cache to be in its own class instead of IndicesRequestCache.java, which will also address this.
…ath (#13…" (opensearch-project#13244) This reverts commit 3d1d5e7.