feat: LRU Caching for PSF, ability to set cache size with envvar #98

Open: wants to merge 7 commits into main


@melisande-c commented Feb 13, 2025

This PR decorates microsim.psf.cached_psf with functools.lru_cache instead of functools.cache. Because functools.cache never evicts entries, memory usage grows without bound when a large number of images are generated; lru_cache caps the number of stored PSFs and discards the least recently used entry once the cap is reached. Additionally, the maximum cache size can be set with an environment variable.
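A minimal standalone sketch of the difference, using a hypothetical fake_psf function in place of microsim's real PSF computation: functools.cache keeps every result it has ever computed, while functools.lru_cache(maxsize=N) holds at most N entries and evicts the least recently used one.

```python
import functools


# Hypothetical stand-in for an expensive PSF computation (not microsim's API).
def fake_psf(z_index: int) -> list[float]:
    return [float(z_index)] * 1_000


# functools.cache is equivalent to lru_cache(maxsize=None): it never evicts.
unbounded = functools.cache(fake_psf)
# lru_cache with a maxsize keeps only the N most recently used results.
bounded = functools.lru_cache(maxsize=4)(fake_psf)

# Simulate many distinct parameter sets, e.g. while generating many images.
for z in range(100):
    unbounded(z)
    bounded(z)

print(unbounded.cache_info().currsize)  # 100: grows with every new argument
print(bounded.cache_info().currsize)    # 4: capped at maxsize
```

With the bounded cache, memory stays proportional to maxsize no matter how many images are generated, at the cost of recomputing PSFs that were evicted.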

Notes:

  • I added the variable to the settings, mostly for documentation, but setting it there will not change the cache size. The cache size can only be modified with an environment variable, because it is fixed when the module is imported. I looked up a few ugly workarounds that involve wrapping the function or re-decorating it dynamically, but I decided the current solution was adequate.
  • I added in_mem_psf_cache_size to Settings rather than CacheSettings because CacheSettings seemed to relate more to the on-disk cache. I can move it if you think that's better, though.
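For context on the "re-decorating the function dynamically" workaround mentioned above: lru_cache fixes maxsize when the decorator runs (at import), but the wrapper exposes the undecorated function as __wrapped__, so a new cache of a different size can be built around it. This is a hypothetical sketch (compute_psf and resize_cache are illustrative names, not microsim code); it also shows the caveat that makes the approach awkward, namely that code which already holds a reference to the old wrapper keeps using it.

```python
import functools

MAXSIZE_AT_IMPORT = 64  # stands in for a value read from the environment at import


@functools.lru_cache(maxsize=MAXSIZE_AT_IMPORT)
def compute_psf(z: int) -> int:
    # Hypothetical placeholder for an expensive PSF computation.
    return z * z


def resize_cache(cached_func, maxsize):
    """Return a new LRU-cached wrapper around the original, undecorated function."""
    # lru_cache uses functools.update_wrapper, which stores the original in __wrapped__.
    return functools.lru_cache(maxsize=maxsize)(cached_func.__wrapped__)


old_ref = compute_psf  # e.g. a reference already imported by another module
compute_psf = resize_cache(compute_psf, 8)  # rebinds only this module's name

print(compute_psf.cache_info().maxsize)  # 8
print(old_ref.cache_info().maxsize)      # 64: existing references are not updated
```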

Closes #97

Changes

  • Added IN_MEM_PSF_CACHE_SIZE_DEFAULT variable to microsim/schema/settings.py
  • Added in_mem_psf_cache_size field to Settings
  • Added a mechanism in microsim/psf.py that reads the cache size from an environment variable, falling back to the default
  • Replaced the cache decorator on microsim.psf.cached_psf with lru_cache

@melisande-c changed the title from "feat: lru caching for psf, with ability to set size with envvar" to "feat: LRU Caching for PSF, ability to set cache size with envvar" Feb 13, 2025
@tlambert03 (Owner)

This looks great to me. The only thing I'm thinking about is the Settings.in_mem_psf_cache_size field. I think your field description perfectly captures the point that you can't actually set this setting. But maybe also add frozen=True to the field, so that someone gets an error if they try?

codecov bot commented Feb 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.77%. Comparing base (729dcfe) to head (3bce109).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #98      +/-   ##
==========================================
- Coverage   84.79%   84.77%   -0.02%     
==========================================
  Files          47       47              
  Lines        3156     3160       +4     
==========================================
+ Hits         2676     2679       +3     
- Misses        480      481       +1     


@melisande-c (Author)

But maybe add frozen=True to the field as well, so that someone gets an error if they try?

Ah yeah, I didn't think of this. I will add it.

Comment on lines 25 to 29
IN_MEM_PSF_CACHE_SIZE = int(
    os.getenv("MICROSIM_IN_MEM_PSF_CACHE_SIZE", IN_MEM_PSF_CACHE_SIZE_DEFAULT)
)
logging.info(f"In-memory PSF cache size has been set to: {IN_MEM_PSF_CACHE_SIZE}.")
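The excerpt above calls int() directly on the environment value, so a non-numeric value such as MICROSIM_IN_MEM_PSF_CACHE_SIZE=big would raise ValueError at import time, and a negative value would be passed to lru_cache (which treats a negative maxsize as 0, i.e. no caching). A possible hardening is sketched below, assuming that falling back to the default with a warning is acceptable; the helper name read_cache_size and the default of 64 are illustrative, not microsim code.

```python
import logging
import os

logger = logging.getLogger(__name__)

IN_MEM_PSF_CACHE_SIZE_DEFAULT = 64  # illustrative default


def read_cache_size(var: str, default: int) -> int:
    """Read a non-negative integer cache size from the environment (hypothetical helper)."""
    raw = os.getenv(var)
    if raw is None:
        return default
    try:
        size = int(raw)
    except ValueError:
        logger.warning("Ignoring %s=%r: not an integer; using %d.", var, raw, default)
        return default
    if size < 0:
        logger.warning("Ignoring %s=%d: must be >= 0; using %d.", var, size, default)
        return default
    return size


IN_MEM_PSF_CACHE_SIZE = read_cache_size(
    "MICROSIM_IN_MEM_PSF_CACHE_SIZE", IN_MEM_PSF_CACHE_SIZE_DEFAULT
)
# %-style arguments are only formatted if the INFO record is actually emitted.
logger.info("In-memory PSF cache size has been set to: %d.", IN_MEM_PSF_CACHE_SIZE)
```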

@tlambert03 (Owner)

Oh! And there's one more thing. Not sure if you've used pydantic-settings before, but one nice feature is that it automatically loads environment variables, including nested ones. So we don't really need to re-read this value here (and, for consistency with all the other settings that can be loaded via env var, maybe we shouldn't?).

So one thing we could do here is:

  1. Move Settings.in_mem_psf_cache_size to CacheSettings.num_psfs
  2. Just put the default inline:
    class CacheSettings(SimBaseModel):
        read: bool = True
        write: bool = True
        num_psfs: int = Field(
            default=64,
            ...
  3. Use Settings().cache.num_psfs in psf.py rather than re-reading the env var
  4. You can then use MICROSIM_CACHE__NUM_PSFS=10 to set the variable
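To illustrate the nested-variable convention these steps rely on: with an env prefix of MICROSIM_ and a nested delimiter of __, a variable name is split into a path of lower-cased field names, so the field cache.num_psfs from step 1 corresponds to MICROSIM_CACHE__NUM_PSFS. The sketch below is a simplified, stdlib-only imitation of that mapping, not pydantic-settings' actual implementation (which also validates and converts values against the model); the function name nested_env is hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def nested_env(
    environ: Mapping[str, str], prefix: str = "MICROSIM_", delimiter: str = "__"
) -> dict[str, Any]:
    """Group prefixed environment variables into a nested dict of raw string values."""
    result: dict[str, Any] = {}
    for name, value in environ.items():
        if not name.upper().startswith(prefix.upper()):
            continue  # unrelated variable, e.g. HOME
        # "MICROSIM_CACHE__NUM_PSFS" -> ["cache", "num_psfs"]
        path = name[len(prefix):].lower().split(delimiter)
        node = result
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return result


env = {"MICROSIM_CACHE__NUM_PSFS": "10", "MICROSIM_CACHE__READ": "false", "HOME": "/root"}
print(nested_env(env))
# {'cache': {'num_psfs': '10', 'read': 'false'}}
```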

@melisande-c (Author)

Yeah, I did look up that pydantic-settings loads environment variables, and I read the microsim docs 😄. I was originally going to use Settings.model_fields["in_mem_psf_cache_size"].default, but decided against it mostly because it looked ugly; I didn't think of simply instantiating the Settings class.

As I wrote in the notes of my original PR description, I didn't put the parameter in CacheSettings because that relates to the on-disk cache, where the number of stored PSFs may be different. I can make the change, though.

@tlambert03 (Owner)

As I wrote in the notes of my original PR description, I didn't put the parameter in the CacheSettings since that is related to the on disk cache

Ah, sorry, I missed that! That does make sense... but I also don't mind generalizing the semantics of CacheSettings to include all caching-related settings. Happy to go with whatever you feel is more intuitive here.

@melisande-c (Author), Feb 13, 2025

Maybe I will change it to num_psfs_in_memory in CacheSettings.

@melisande-c (Author)

I noticed there are a few other cached functions around whose cache sizes you might also want to control. Do you think the following is unnecessary?

from pydantic import Field

# SimBaseModel is microsim's existing base model class (import omitted here).


class InMemoryCacheSizes(SimBaseModel):
    """
    Parameters that control the cache size of various cached functions.
    """
    psf: int = Field(
        default=64,
        description=(
            "The maximum number of PSFs stored in the in-memory cache, "
            "which follows the LRU caching strategy. Note: this setting only takes "
            "effect if the equivalent environment variable is set before "
            "importing microsim."
        ),
        frozen=True,
    )


class CacheSettings(SimBaseModel):
    read: bool = True
    write: bool = True
    in_mem_size: InMemoryCacheSizes = Field(default_factory=InMemoryCacheSizes)

The environment variable would then be MICROSIM_CACHE__IN_MEM_SIZE__PSF, which is pretty long, but more cached functions can be added later.
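A rough sketch of how one nested size mapping could drive several caches, as proposed above. A plain dict stands in for the InMemoryCacheSizes model, and the names CACHE_SIZES, sized_lru_cache, and cached_spectrum are hypothetical; in microsim the sizes would presumably come from Settings().cache.in_mem_size. Each decorator looks up its size once, when the decorated function is defined, which is why the value has to be set before import.

```python
import functools

# Hypothetical stand-in for Settings().cache.in_mem_size: one entry per cached function.
CACHE_SIZES: dict[str, int] = {"psf": 64, "emission_spectrum": 16}


def sized_lru_cache(name: str):
    """Decorate with an LRU cache whose maxsize is looked up by name at definition time."""

    def decorator(func):
        return functools.lru_cache(maxsize=CACHE_SIZES[name])(func)

    return decorator


@sized_lru_cache("psf")
def cached_psf(z: int) -> int:
    return z * 2  # placeholder computation


@sized_lru_cache("emission_spectrum")
def cached_spectrum(wavelength: int) -> int:
    return wavelength + 1  # placeholder computation


print(cached_psf.cache_info().maxsize)       # 64
print(cached_spectrum.cache_info().maxsize)  # 16
```

Adding another cached function then only requires a new field in the size model and a matching name in the decorator call.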

@tlambert03 (Owner)

Yeah, it would be nice to have a way to control all of those, but it's pretty annoying to have to do it all via environment variables, huh? I recently added dynamic control of lru_cache size to another project (here).

But I feel like this is scope creep for this PR, and I'm happy enough with the PR as is. I think the only thing we should be sure we're happy with before merging is the naming of the settings fields, since changing them later would be a breaking change. I do like the InMemoryCacheSizes design you propose here: while perhaps verbose, it's very clear... So I'd say let's go with it? And we can add conveniences so that people don't always need to use that long, clunky env var?

@melisande-c (Author)

Yeah, I guess it will be quite rare for someone to change the default anyway. OK, I will add the InMemoryCacheSizes class.

Development

Successfully merging this pull request may close these issues.

PSF Cache Memory Leak
2 participants