feat: LRU Caching for PSF, ability to set cache size with envvar #98
base: main
this looks great to me. Only thing i'm thinking about is the presence of the |
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #98      +/-   ##
==========================================
- Coverage   84.79%   84.77%   -0.02%
==========================================
  Files          47       47
  Lines        3156     3160       +4
==========================================
+ Hits         2676     2679       +3
- Misses        480      481       +1
☔ View full report in Codecov by Sentry.
Ah yeah I didn't think of this, I will add |
src/microsim/psf.py
IN_MEM_PSF_CACHE_SIZE = int(
    os.getenv("MICROSIM_IN_MEM_PSF_CACHE_SIZE", IN_MEM_PSF_CACHE_SIZE_DEFAULT)
)
logging.info(f"In-memory PSF cache size has been set to: {IN_MEM_PSF_CACHE_SIZE}.")
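For reference, a minimal standard-library sketch of reading an integer cache size from the environment with a fallback. The helper name `read_cache_size`, the constant `DEFAULT_CACHE_SIZE`, and the demo variable name are hypothetical and not part of microsim; the default of 64 is borrowed from the value suggested later in this thread.

```python
import os

# Hypothetical default; 64 is the value proposed later in the review thread.
DEFAULT_CACHE_SIZE = 64


def read_cache_size(var_name: str, default: int = DEFAULT_CACHE_SIZE) -> int:
    """Return a non-negative cache size from the environment, or the default."""
    raw = os.environ.get(var_name)
    if raw is None or raw.strip() == "":
        # Variable is unset or empty: use the default.
        return default
    try:
        size = int(raw)
    except ValueError:
        # Not an integer: fall back rather than crash at import time.
        return default
    # Negative sizes are treated as invalid in this sketch.
    return size if size >= 0 else default
```

Falling back silently is only one possible design choice; raising an error on an invalid value may be preferable when a misconfiguration should be noticed early.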
oh! and there's one more thing. Not sure if you've used pydantic-settings before. One nice feature is that it automatically loads environment variables, including nested variables. So we don't really need to re-check this value (and for the sake of consistency with all the other things that can be loaded via env var... maybe we shouldn't?)
so, one thing we could do here is:
- Move Settings.in_mem_psf_cache_size to CacheSettings.num_psfs
- Just put the default inline:
  class CacheSettings(SimBaseModel):
      read: bool = True
      write: bool = True
      num_psfs: int = Field(
          default=64,
          ...
- use Settings().cache.num_psfs in psf.py rather than re-reading from env var
- You can then use MICROSIM_CACHE__PSF_SIZE=10 to set the variable
yeah I looked up that Pydantic Settings loads the environment variables, and read the microsim docs 😄. I was originally going to do Settings.model_fields["in_mem_psf_cache_size"].default but I didn't, mostly because I thought it looked ugly; I didn't consider simply instantiating the Settings class.
As I wrote in the notes of my original PR description, I didn't put the parameter in the CacheSettings since that is related to the on disk cache, and the number of PSFs saved in there will be different? I can make the change though.
"As I wrote in the notes of my original PR description, I didn't put the parameter in the CacheSettings since that is related to the on disk cache"
Ah sorry I missed that! That does make sense... but I also don't mind generalizing semantics of CacheSettings to include all caching-related settings. Happy to go with whatever you felt was more intuitive here
maybe I will change it to num_psfs_in_memory in the CacheSettings
I noticed that there are a few other cached functions about, that maybe you also want to be able to control the size of? Do you think the following is unnecessary?
class InMemoryCacheSizes(SimBaseModel):
    """
    Parameters that control the cache size of various cached functions.
    """

    psf: int = Field(
        default=64,
        description=(
            "The maximum number of PSFs that will be stored in the in-memory cache, "
            "which follows the LRU caching strategy. Note, this setting will only take "
            "effect by modifying the equivalent environment variable prior to "
            "importing microsim."
        ),
        frozen=True,
    )


class CacheSettings(SimBaseModel):
    read: bool = True
    write: bool = True
    in_mem_size: InMemoryCacheSizes = Field(default_factory=InMemoryCacheSizes)
Now the environment variable will be MICROSIM_CACHE__IN_MEM_SIZE__PSF, which is pretty long. But more cached functions can be added.
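To make the naming rule concrete, here is a small sketch of how a nested field path maps to that variable name. It assumes a `MICROSIM_` prefix and a `__` nested delimiter, both inferred from the variable names quoted in this thread; `env_var_name` is a hypothetical helper and does not reproduce how pydantic-settings resolves variables internally.

```python
def env_var_name(prefix: str, field_path: list[str], delimiter: str = "__") -> str:
    """Build the environment variable name for a nested settings field.

    The prefix and delimiter are assumptions inferred from the names used
    in the discussion, not values read from the microsim configuration.
    """
    return prefix + delimiter.join(part.upper() for part in field_path)


# Settings().cache.in_mem_size.psf corresponds to this field path.
print(env_var_name("MICROSIM_", ["cache", "in_mem_size", "psf"]))
# prints "MICROSIM_CACHE__IN_MEM_SIZE__PSF"
```

Each additional cached function added to InMemoryCacheSizes would only change the last path component.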
yeah, it would be nice to have a way to control all of those, but it is pretty annoying to have to do it all via environment variables huh? I recently added dynamic control of lru_cache size to another project (here) but I feel like this is scope creep for this PR. I'm happy enough with the PR as is. I think the only thing we should be sure we're happy with before merge is the naming of the settings fields (since it would be breaking to change them later). I do like the InMemoryCacheSizes design you propose here. While perhaps verbose, it's very clear... So, I'd say let's go with it? And we add conveniences so that people don't need to always use that long clunky env var?
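One possible shape for dynamic control of an lru_cache size is sketched below. This is not the implementation from the linked project, which is not shown in this thread; `ResizableLRU` and `make_psf` are hypothetical names, and the approach simply re-wraps the function with a new `maxsize`, discarding existing entries.

```python
from functools import lru_cache
from typing import Any, Callable


class ResizableLRU:
    """Wrap a function in lru_cache and allow the bound to be changed later."""

    def __init__(self, func: Callable[..., Any], maxsize: int = 64) -> None:
        self._func = func
        self._cached = lru_cache(maxsize=maxsize)(func)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self._cached(*args, **kwargs)

    def resize(self, maxsize: int) -> None:
        # Re-wrapping creates a fresh cache, so previously cached
        # entries are dropped in this sketch.
        self._cached = lru_cache(maxsize=maxsize)(self._func)

    def cache_info(self):
        return self._cached.cache_info()


def make_psf(index: int) -> tuple[int, int]:
    # Stand-in for an expensive PSF computation (hypothetical).
    return (index, index * index)


cached_make_psf = ResizableLRU(make_psf, maxsize=2)
for i in range(3):
    cached_make_psf(i)

cached_make_psf.resize(8)  # new bound, empty cache
```

A runtime hook like this would avoid the need to set the environment variable before import, at the cost of losing cached entries on each resize.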
Yeah, I guess it will be quite rare that someone wants to change from the default anyway. Ok, I will add the InMemoryCacheSizes class.
This PR decorates microsim.psf.cached_psf with functools.lru_cache rather than cache. This is to stop memory usage increasing indefinitely if a large number of images are being generated. Additionally, the max size of the cache can be set with an environment variable.
Notes:
- Added in_mem_psf_cache_size to Settings and not CacheSettings because it seemed to me CacheSettings relates more to the cache on disk. But I can move it if you think it's better.
Closes #97
Changes
- Added IN_MEM_PSF_CACHE_SIZE_DEFAULT variable to microsim/schema/settings.py
- Added in_mem_psf_cache_size field to Settings
microsim/psf.py
- Changed cache decorator to lru_cache decorator on microsim.psf.cached_psf.
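The difference between the two decorators can be illustrated with a small standalone sketch. `fake_psf` is a hypothetical stand-in and not the real microsim.psf.cached_psf; the point is only that a bounded lru_cache evicts the least recently used entry once `maxsize` is reached, whereas functools.cache keeps every entry.

```python
from functools import lru_cache

calls: list[int] = []  # records every real (uncached) computation


@lru_cache(maxsize=2)
def fake_psf(key: int) -> int:
    # Hypothetical stand-in for an expensive PSF computation.
    calls.append(key)
    return key * 10


fake_psf(1)  # miss
fake_psf(2)  # miss
fake_psf(1)  # hit; key 1 becomes the most recently used
fake_psf(3)  # miss; cache is full, so key 2 (least recently used) is evicted
fake_psf(2)  # miss again; key 2 has to be recomputed

info = fake_psf.cache_info()
print(info.hits, info.misses, info.currsize)  # prints "1 4 2"
```

Because `maxsize` is fixed when the decorator is applied, a size taken from settings is read once at import time, which matches the note in the discussion that the environment variable must be set before importing microsim.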