Make `simple-parsing` faster #278

anivegesana · 2023-08-03T01:00:06Z

Is your feature request related to a problem? Please describe.
Thank you for your work on #276! In the parser that our team wrote, we noticed that most of the time needed to generate the --help menu was spent inside the inspect.getsource function. I think the issue is that caching is done on _get_attribute_docstring, but the majority of the time is spent in inspect.getsource, which has to be called multiple times for the same dataclass since the key of the cache is (dataclass, field_name).
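
To illustrate the cache-key point, here is a minimal sketch. It does not use simple-parsing's actual code: `get_source` is a hypothetical stand-in for inspect.getsource that only counts its calls, and the two `field_doc_*` functions are hypothetical stand-ins for _get_attribute_docstring, one with and one without a cache on the inner lookup.

```python
import functools
from dataclasses import dataclass, fields

# Call counter for the stand-in "expensive" lookup.
calls = {"source": 0}


def get_source(cls):
    """Hypothetical stand-in for inspect.getsource; counts how often it runs."""
    calls["source"] += 1
    return f"<source of {cls.__name__}>"


# Same lookup, but cached on the class alone.
cached_get_source = functools.lru_cache(maxsize=2048)(get_source)


@functools.lru_cache(maxsize=2048)
def field_doc_outer_cache_only(cls, field_name):
    # Cache key is (cls, field_name): every new field is a cache miss,
    # so the uncached source lookup runs once per field.
    return f"{field_name}: {get_source(cls)}"


@functools.lru_cache(maxsize=2048)
def field_doc_with_inner_cache(cls, field_name):
    # Still a miss per field here, but the source lookup itself is cached
    # per class, so it only runs once for the whole dataclass.
    return f"{field_name}: {cached_get_source(cls)}"


@dataclass
class Config:
    lr: float = 0.1
    epochs: int = 10
    name: str = "run"


def count_source_calls(doc_fn):
    """Look up every field of Config and report how many source lookups ran."""
    calls["source"] = 0
    for f in fields(Config):
        doc_fn(Config, f.name)
    return calls["source"]


outer_only = count_source_calls(field_doc_outer_cache_only)
with_inner = count_source_calls(field_doc_with_inner_cache)
print(outer_only, with_inner)
```

With only the outer cache, the number of source lookups grows with the number of fields; with the inner cache it stays at one per dataclass.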

Describe the solution you'd like
Constructing caches for the inspect.getsource, inspect.getdoc, and dp.parse functions dramatically speeds up the construction of the parser. I ran a simple experiment where I didn't change any code of the simple-parsing library and simply prepended the following lines of code to my script:

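import functools
import inspect

import docstring_parser as dp

# Monkey patch: wrap the slow lookups in LRU caches so that repeated calls
# with the same argument are answered from memory.
inspect.getsource = functools.lru_cache(2048)(inspect.getsource)
inspect.getdoc = functools.lru_cache(2048)(inspect.getdoc)
dp.parse = functools.lru_cache(2048)(dp.parse)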

Construction without monkey patch: 0.902 s
Parsing without monkey patch: 0.323 s
Construction with monkey patch: 0.020 s
Parsing with monkey patch: 0.021 s
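
If patching inspect globally for the whole process is a concern, the same idea can be scoped to a block of code. The following is only a sketch under that assumption: `cached_inspect` is a hypothetical helper (not part of simple-parsing or the standard library) that installs cached versions of inspect.getsource and inspect.getdoc and restores the originals on exit.

```python
import contextlib
import functools
import inspect


@contextlib.contextmanager
def cached_inspect(maxsize=2048):
    """Temporarily replace inspect.getsource/getdoc with lru_cache-wrapped versions."""
    original_getsource = inspect.getsource
    original_getdoc = inspect.getdoc
    inspect.getsource = functools.lru_cache(maxsize)(original_getsource)
    inspect.getdoc = functools.lru_cache(maxsize)(original_getdoc)
    try:
        yield
    finally:
        # Always restore the originals, even if the body raised.
        inspect.getsource = original_getsource
        inspect.getdoc = original_getdoc


# Usage: build the parser (or anything else that inspects classes) inside the block.
with cached_inspect():
    first = inspect.getdoc(int)   # cache miss, computed normally
    second = inspect.getdoc(int)  # served from the cache
```

The caches are discarded when the block exits, so this trades some of the speedup for not leaving a patched inspect module behind.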

Also, you import numpy even if you don't use it. It is possible to import numpy lazily so that --help menus are snappier. (I haven't checked whether this is a major source of unnecessary waiting, but, on my machine, it seems to take up ~0.105s/0.211s to construct the parser and parse the arguments. This may or may not also apply to yaml.)

To lazily load numpy, you can create a module called lazy_numpy.py and use the following code:

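def __getattr__(attr):
    # Called only when `attr` is not yet defined in this module, so numpy
    # is imported on first attribute access.
    import numpy as np
    # Copy numpy's namespace into this module so later lookups skip __getattr__.
    globals().update(vars(np))
    return getattr(np, attr)

def __dir__():
    # Listing attributes also triggers the real import.
    import numpy as np
    globals().update(vars(np))
    return dir(np)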

Importing lazy_numpy does not import numpy but accessing any attribute within lazy_numpy does.
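
For comparison, the standard library offers a different way to defer a module's execution, importlib.util.LazyLoader. The sketch below follows the recipe from the importlib documentation; it is not the lazy_numpy approach above, `lazy_import` is just an illustrative helper name, and the stdlib module colorsys stands in for numpy so that the example runs without extra dependencies.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code only runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"No module named {name!r}")
    # Wrap the real loader so execution is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module, does not run it yet
    return module


# colorsys stands in for numpy here.
colorsys = lazy_import("colorsys")
# The module body is executed at this point, on the first attribute access.
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

Under the same assumption, `np = lazy_import("numpy")` would postpone numpy's import cost until the first `np.<something>` access.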

Describe alternatives you've considered
I haven't looked into other places where caching could be improved. I just think that, if three lines of code can speed things up this dramatically, we might as well add them.

Additional context
I can open a PR, but I do not have a good idea of how to implement test cases for this.


lebrice · 2023-08-03T04:03:19Z

Thanks a lot @anivegesana, you're more than welcome to submit a PR for the caching. I think the current tests are sufficient in covering the docstring creation, so I don't think additional tests are needed.

Fixes #278 Signed-off-by: Fabrice Normandin <[email protected]>

* Add pytest-benchmark test dependency
* Add performance files for before the changes
* Use cached versions of inspect.getdoc/getsource (Fixes #278)
* Make numpy import lazy
* Simplify the benchmark code a bit
* Remove .benchmarks file from git history
* Playing around with GitHub actions for benchmark
* Remove upload workflow, add benchmark
* Tweak benchmark.yml and build.yml
* Only run workflow on push to master

Signed-off-by: Fabrice Normandin <[email protected]>

lebrice · 2023-08-08T18:01:29Z

Hey @anivegesana, just FYI, I did make the numpy import lazy, and added the lru_cache to all the inspect functions in #279.

anivegesana · 2023-08-08T18:14:22Z

Thank you! I noticed that my --help menu was much faster with yesterday's release. I think there is one place where you forgot to remove a numpy import: anivegesana@153eb03

Thank you for your awesome work!

lebrice · 2023-08-08T21:06:13Z

Oops. Thanks for pointing it out

lebrice added a commit that referenced this issue Aug 3, 2023

Use cached versions of inspect.getdoc/getsource

00b3c15

Fixes #278 Signed-off-by: Fabrice Normandin <[email protected]>

lebrice added a commit that referenced this issue Aug 3, 2023

Use cached versions of inspect.getdoc/getsource

c0d9f6b

Fixes #278 Signed-off-by: Fabrice Normandin <[email protected]>

lebrice added a commit that referenced this issue Aug 3, 2023

Use cached versions of inspect.getdoc/getsource

15badca

Fixes #278 Signed-off-by: Fabrice Normandin <[email protected]>

lebrice mentioned this issue Aug 3, 2023

Increase import performance (lru_cache) and add pytest-benchmark #279

Merged

lebrice closed this as completed in #279 Aug 3, 2023

lebrice mentioned this issue Aug 3, 2023

Add Performance Regression tests to CI #282

Merged

lebrice reopened this Aug 8, 2023

lebrice mentioned this issue Aug 10, 2023

Faster import (lazy numpy) #285

Merged

lebrice closed this as completed in #285 Aug 10, 2023
