[NEW] keyspace_expiration_misses stats #1507

Open
proost opened this issue Jan 5, 2025 · 5 comments · May be fixed by #1547

Comments

@proost

proost commented Jan 5, 2025

The problem/use-case that the feature addresses

In large-scale caching systems, passive expiration of keys can cause unexpected cache misses, resulting in unnecessary performance degradation. The current keyspace_misses metric in Valkey tracks all key misses, but it does not distinguish between misses due to expired keys and misses due to non-existent keys.

This lack of distinction makes it difficult for users to identify whether their cache misses are primarily caused by TTL settings or application-level key access patterns. Without visibility into passive expiration misses, users cannot optimize their TTL values effectively or adjust eviction policies to reduce unnecessary misses.

The keyspace_expiration_misses metric will provide better observability by tracking how many cache misses are specifically due to passive expiration. This will help users:

  • Identify whether TTL values are too short or too long.
  • Understand how frequently keys expire before they are accessed again.
  • Optimize TTL configurations to reduce cache misses and improve cache hit ratios.

Description of the feature

The feature introduces a new metric called keyspace_expiration_misses, which counts the number of key misses caused specifically by passive expiration.

The new metric will be tracked in the lookupKey() function in the Valkey codebase. When a key is found to be expired during a lookup, the keyspace_expiration_misses counter will be incremented. This will allow users to distinguish between expiration misses and other types of misses when inspecting Valkey's INFO stats output.

Example output from the INFO stats command after the feature is implemented:

# Stats
keyspace_hits: 1000000
keyspace_misses: 200000
keyspace_expiration_misses: 150000
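
As a rough illustration of how a client might consume these fields, the sketch below parses the sample output above and reports what share of all misses were expiration misses. It assumes that keyspace_expiration_misses is counted as a subset of keyspace_misses (as the sample numbers suggest); the helper names `parse_info_stats` and `expiration_miss_share` are made up for this example and are not part of any Valkey client.

```python
def parse_info_stats(text):
    """Parse 'name:value' lines from an INFO-style section into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Stats"
        name, _, value = line.partition(":")
        value = value.strip()  # tolerate "name: value" as well as "name:value"
        if value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def expiration_miss_share(stats):
    """Fraction of all misses attributed to passive expiration (0.0 if no misses)."""
    misses = stats.get("keyspace_misses", 0)
    if misses == 0:
        return 0.0
    # Assumption: expiration misses are included in keyspace_misses.
    return stats.get("keyspace_expiration_misses", 0) / misses


SAMPLE = """\
# Stats
keyspace_hits: 1000000
keyspace_misses: 200000
keyspace_expiration_misses: 150000
"""

stats = parse_info_stats(SAMPLE)
share = expiration_miss_share(stats)
print(f"{share:.0%} of misses were expiration misses")  # prints "75% of misses ..."
```

A high share would point at TTL settings, while a low share would point at keys that were never written in the first place.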

Key changes in Valkey codebase:

  • Modify lookupKey() to increment keyspace_expiration_misses.
  • Update INFO stats output to include the new metric.
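
To make the intended counting rule concrete, here is a toy Python model of a lookup path with the three counters. It is only a sketch of the behaviour described above, not Valkey's C implementation: the `ToyKeyspace` class, its `lookup_key` method, and the injectable clock are all hypothetical, and it assumes an expiration miss also counts as a regular miss.

```python
import time


class ToyKeyspace:
    """Toy stand-in for a keyspace with per-key expiry; not Valkey's code."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expire_at or None)
        self.keyspace_hits = 0
        self.keyspace_misses = 0
        self.keyspace_expiration_misses = 0

    def set(self, key, value, ttl=None):
        expire_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expire_at)

    def lookup_key(self, key):
        entry = self._data.get(key)
        if entry is None:
            # The key does not exist: an ordinary miss.
            self.keyspace_misses += 1
            return None
        value, expire_at = entry
        if expire_at is not None and self._clock() >= expire_at:
            # The key exists but is logically expired: delete it lazily and
            # count the miss in both counters (assumed semantics).
            del self._data[key]
            self.keyspace_misses += 1
            self.keyspace_expiration_misses += 1
            return None
        self.keyspace_hits += 1
        return value


# Usage with a fake clock so the example is deterministic.
now = [0.0]
ks = ToyKeyspace(clock=lambda: now[0])
ks.set("a", 1, ttl=10)
ks.lookup_key("a")   # hit
now[0] = 11.0
ks.lookup_key("a")   # expiration miss (key is deleted here)
ks.lookup_key("a")   # ordinary miss, the key is already gone
ks.lookup_key("b")   # ordinary miss, the key never existed
print(ks.keyspace_hits, ks.keyspace_misses, ks.keyspace_expiration_misses)  # 1 3 1
```

Note that only the first lookup after expiry is counted as an expiration miss in this model; later lookups of the same key are indistinguishable from lookups of a key that never existed.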

Additional information

Potential Next Steps:
Once keyspace_expiration_misses is implemented, a future enhancement could track the distribution of the expiration gap (the time difference between now and the expiration time when a key is passively expired). This would provide deeper insight into whether keys are being accessed just after they expire or long after they are no longer relevant.
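
One possible shape for such a distribution is a fixed-bucket histogram of the gap. The sketch below is purely illustrative: the bucket bounds and the `ExpirationGapHistogram` class are assumptions made for this example, not a proposed Valkey design.

```python
import bisect

# Hypothetical bucket upper bounds, in seconds.
BUCKET_BOUNDS = [1, 10, 60, 600, 3600]


class ExpirationGapHistogram:
    """Counts passive expirations by how long ago the key had expired."""

    def __init__(self, bounds=BUCKET_BOUNDS):
        self.bounds = list(bounds)
        # One bucket per bound, plus an overflow bucket for larger gaps.
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, now, expire_at):
        gap = now - expire_at
        if gap < 0:
            raise ValueError("key has not expired yet")
        # Bucket i holds gaps <= bounds[i] (and greater than bounds[i - 1]).
        idx = bisect.bisect_left(self.bounds, gap)
        self.counts[idx] += 1


hist = ExpirationGapHistogram()
hist.record(now=100.0, expire_at=99.5)    # gap 0.5 s  -> "<= 1 s" bucket
hist.record(now=100.0, expire_at=55.0)    # gap 45 s   -> "<= 60 s" bucket
hist.record(now=10000.0, expire_at=100.0) # gap 9900 s -> overflow bucket
print(hist.counts)  # [1, 0, 1, 0, 0, 1]
```

Many counts in the first buckets would suggest TTLs that are slightly too short, while counts in the overflow bucket would suggest keys that are read long after they stopped being relevant.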

@ranshid
Member

ranshid commented Jan 12, 2025

One small issue I have with this proposal is that it does not play well when active expiry is enabled. When it is enabled, the efficiency with which the engine handles active expiration can change, so a user tracking this metric might get the wrong insight about their application's access patterns.

That said, I do not have a strong objection to tracking this metric, but would it be so much different from tracking the lazy-expired key events?

@proost
Author

proost commented Jan 12, 2025

@ranshid

> One small issue I have with this proposal is that it does not play well when active expiry is enabled. When it is enabled, the efficiency with which the engine handles active expiration can change, so a user tracking this metric might get the wrong insight about their application's access patterns.

Yes, you are right. So users should interpret the stat carefully, in context with:

  • Active expiry frequency (hz)
  • expired_keys, keyspace_misses
  • The application’s read patterns
  • The overall eviction policy

Users can then form a comprehensive view of how many keys are truly missed by the application due to unexpected TTL expiry. Based on the stat, users can tune their TTL settings or hz.

> That said, I do not have a strong objection to tracking this metric, but would it be so much different from tracking the lazy-expired key events?

The keyspace expired event can be generated by both active expiry and passive expiry. We can get deeper insight into expired items by using the stat together with the keyspace expired event.

@ranshid
Member

ranshid commented Jan 12, 2025

> Users can then form a comprehensive view of how many keys are truly missed by the application due to unexpected TTL expiry. Based on the stat, users can tune their TTL settings or hz.

I am not sure how well a user can achieve that from what you just mentioned. Users are mostly better off not touching the hz config, and I cannot understand how this can be deduced from expired_keys, keyspace_misses, the application’s read patterns, or the overall eviction policy.

> The keyspace expired event can be generated by both active expiry and passive expiry. We can get deeper insight into expired items by using the stat together with the keyspace expired event.

I know. What I meant was whether adding only a statistic for lazy-expired keys would be good enough (although it would also capture keys expired on writes). IMO this might be a more helpful statistic.
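
To illustrate the difference between the two candidate statistics, here is a toy Python model in which a lazy-expired counter is bumped on any access that finds an expired key, while the expiration-miss counter is bumped only on reads. Everything here is hypothetical: the `ToyDb` class, the `lazy_expired_keys` name, and the manual `now` field are assumptions for the sake of the example, not Valkey code.

```python
class ToyDb:
    """Toy model comparing a lazy-expired counter with an expiration-miss counter."""

    def __init__(self):
        self.now = 0
        self.expires = {}                    # key -> expire_at
        self.lazy_expired_keys = 0           # hypothetical: reads and writes
        self.keyspace_expiration_misses = 0  # proposed: reads only

    def _expire_if_needed(self, key):
        expire_at = self.expires.get(key)
        if expire_at is not None and self.now >= expire_at:
            del self.expires[key]
            self.lazy_expired_keys += 1
            return True
        return False

    def read(self, key):
        if self._expire_if_needed(key):
            self.keyspace_expiration_misses += 1

    def write(self, key, ttl):
        # A write to an expired key removes it lazily, but it is not a cache miss.
        self._expire_if_needed(key)
        self.expires[key] = self.now + ttl


db = ToyDb()
db.write("a", ttl=5)
db.write("b", ttl=5)
db.now = 10
db.read("a")          # counted by both statistics
db.write("b", ttl=5)  # counted only by lazy_expired_keys
print(db.lazy_expired_keys, db.keyspace_expiration_misses)  # 2 1
```

In this model the two counters diverge exactly by the number of expired keys that were first touched by a write.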

@proost
Author

proost commented Jan 13, 2025

> I know. What I meant was whether adding only a statistic for lazy-expired keys would be good enough (although it would also capture keys expired on writes). IMO this might be a more helpful statistic.

Ah, now I see. Yes, you are right. The stat alone is not enough. A lazy-expired keyspace event should come with the stat.

@proost
Author

proost commented Jan 17, 2025

@ranshid
I agree with you, so I will also add a keyspace notification to the PR.


2 participants