
First Implementation of export cache functionality #29

Merged
49 commits merged into aiidateam:export_cache on Apr 3, 2020

Conversation

@broeder-j (Member) commented Mar 12, 2020

Hi,
I put this here for review, @PhilippRue and @greschd, since my time to work on this is limited.
The code might not be completely polished yet; I left both fixture versions (run_with_cache, with_export_cache) side by side. Maybe we should do a first merge, so that we can continue to work on smaller things one at a time.

After the slack discussion:
The run_with_cache fixture was split so that export_cache and load_cache can also be used on their own. There are simple tests for these.
In order for the run_with_cache fixture test to work, I monkey-patched code.get_object_to_hash and calcjob.get_object_to_hash (since the latter used the computer uuid). There is a fixture for this. Thanks @greschd for this suggestion.
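A minimal sketch of what such a monkeypatch fixture could look like (the patched attribute name and the replacement hash contents here are assumptions for illustration, not the actual implementation in this PR):

import pytest

@pytest.fixture
def hash_code_by_entrypoint(monkeypatch):
    """Sketch: make Code/CalcJob hashes independent of the computer uuid."""
    from aiida.orm import CalcJobNode, Code

    def mock_objects_to_hash(self):
        # hash only by node class (and, for codes, the input plugin label),
        # so that the computer uuid does not enter the hash
        return [str(self.__class__), self.get_attribute('input_plugin', None)]

    # '_get_objects_to_hash' is assumed here; the real aiida-core attribute may differ
    monkeypatch.setattr(Code, '_get_objects_to_hash', mock_objects_to_hash, raising=False)
    monkeypatch.setattr(CalcJobNode, '_get_objects_to_hash', mock_objects_to_hash, raising=False)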

@PhilippRue has solved this issue differently, by giving the aiida-core and mock-code fixtures session scope instead of function scope and by importing computers and codes. In aiida-kkr he has running examples using run_with_cache. This may not be a solution that works for several developers, though.

  • The builder hash now supports nested namespaces.

Further, I have also implemented a second fixture for running with exports, where you have to specify the export to use (see the usage sketch after this list). Feature suggestions for this by @greschd:

  • allow for multiple calculation classes (currently all or one; maybe with_caching in aiida-core has to support it first)
  • making sure computer/code hashes are the same on different machines (should work with the monkeypatch fixture; there is a test for this)
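A usage sketch for this second fixture, based on the _with_export_cache signature quoted further down in this review; the calculation class, inputs and path are placeholders:

from aiida.engine import run

def test_some_calculation(with_export_cache):
    # the export archive location has to be specified explicitly for this fixture
    with with_export_cache(data_dir_abspath='/abs/path/to/data_dir',
                           calculation_class=SomeCalculation):
        run(SomeCalculation, **inputs)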

Known issue:
Sometimes the caching does not work when running (which causes the tests to fail), and I do not understand why, since the hash of the imported calcjob and the one being run is the same. @greschd, maybe you can spot the reason for this.

  • Then the run_with_cache fixture should also work with yield.
  • Maybe the diff workchain for testing should not be in the test file?

I left both fixture versions in, since I would like the export name to be chosen automatically by default. Fixes #20

Dominik Gresch and others added 26 commits February 27, 2020 13:51
Improve installation instructions.
…e_25

allowed for expressions in ignore file lists, implements issue aiidateam#25, …
…this, for testing on a different system and so that Philipp can join debugging
…since localhost and local code is always different
… monkeypatch how computers and codes are included in the hash of aiida-core
@broeder-j broeder-j requested review from PhilippRue and greschd March 12, 2020 12:20
@broeder-j (Member Author):

ok the calcjob hash on the CI is different...

Requires providing an absolute path to the export file to load from or export to.
"""
@contextmanager
def _with_export_cache(data_dir_abspath, calculation_class=None):
Contributor:

if we have a default value for the data dir in run_with_cache (line 299) should we have the same here?

Member:

I'd actually prefer not having a default in either of the two places.

Contributor:

I don't have strong feelings towards one or the other; it should just be the same for both ways.

Comment on lines 364 to 367
if len(union_pk) != 0:
    process_node_pk = min(union_pk)
    # export data to reuse it later
    export_cache(node=load_node(process_node_pk), savepath=full_import_path)
Contributor:

This logic seems problematic. Suppose we prepare the inputs to a workflow with a calcfunction, which then has a lower pk. Then the export would be done for the calcfunction and not for the workflow.

Why don't we just use export_cache(node=resnode, ...)? Then we are sure to export the workflow.
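A sketch of the suggested alternative, assuming the fixture runs the process via run_get_node (process_class, inputs and full_import_path are placeholders):

from aiida.engine import run_get_node

# run the process inside the fixture and export the returned workflow node,
# instead of picking the node with the lowest pk
res, resnode = run_get_node(process_class, **inputs)
export_cache(node=resnode, savepath=full_import_path)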

Member Author:

This was implemented because I tried not to use run_get_node within the fixture. If we say it is ok that the workchain is run within the fixture (a fixture should only prepare and clean up), this part becomes obsolete. This fixture is therefore like a monkeypatch for run_get_node.

Contributor:

OK, but this did not work for this test, where I prepare an input to the workflow with a calcfunction, which therefore has a lower pk. Then only the calcfunction is exported, but not the workflow node itself.

But maybe this can be fixed with an option for the export.

@PhilippRue (Contributor):

Also we might want to provide the option for the with_export_cache to overwrite the output file. Now the export file is only written if no previous file is found.

This could be helpful for debugging/developing workflows (see https://groups.google.com/forum/?hl=en#!topic/aiidausers/y-ekFBlLgsg from mailing list).
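A rough sketch of how such an overwrite option could look; the overwrite flag and the _select_node_to_export helper are hypothetical, while load_cache, export_cache and enable_caching are used as elsewhere in this PR:

import pathlib
from contextlib import contextmanager

from aiida.manage.caching import enable_caching

@contextmanager
def _with_export_cache(data_dir_abspath, calculation_class=None, overwrite=False):
    """Sketch: rewrite the export archive when overwrite=True or when no archive exists."""
    full_import_path = pathlib.Path(data_dir_abspath)
    cache_exists = full_import_path.is_file()
    if cache_exists and not overwrite:
        # reuse the existing archive
        load_cache(path_to_cache=str(full_import_path))
    with enable_caching():
        yield
    if overwrite or not cache_exists:
        # node selection stays as in the current implementation (hypothetical helper)
        export_cache(node=_select_node_to_export(), savepath=str(full_import_path))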



@pytest.fixture(scope='function')
def export_cache(hash_code_by_entrypoint):
Member:

I'm not sure if this should be a fixture (same for load_cache), or just an internal helper function. Do we want users to use the bare export_cache?

Member Author:

I made this an extra fixture because, if we change the import/export functionality to be different from what aiida-core does, as a user I might want to use it in a test (or see what it does for debugging). In the aiida-fleur tests I use a fixture to export and import things to create test environments, because we also have some input nodes for workchains where uuids are provided. So there might be a need to have something bare which is adapted for tests (like specific hashing, or stripped/slim exports, etc.). For now this defines the interface. Overall, I'm also not sure if this should be just a helper, since for my current purposes I do not need to take it further.
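For illustration, a bare usage in a test could look roughly like this (the test body and the Dict node are made up; the export_cache/load_cache call signatures follow the ones used elsewhere in this PR):

from aiida.orm import Dict

def test_prepare_environment(export_cache, load_cache, tmp_path):
    # store an input node that the test environment needs
    node = Dict(dict={'parameter': 1}).store()
    archive = str(tmp_path / 'environment.tar.gz')
    export_cache(node=node, savepath=archive)
    # later (or in another test) the archive can be re-imported
    load_cache(path_to_cache=archive)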

"""

cache_exists = False
if data_dir == 'data_dir':
Member:

What is this special case for? As far as I can tell, this doesn't actually change the behavior: Below, we use pathlib.Path(full_import_path), which - if for example full_import_path is data_dir/something.tar.gz - is relative to cwd anyway, no?

Member Author:

You are right

print(name)

# check existence
full_import_path = str(data_dir) + '/' + name + '.tar.gz'
Member:

The hard-coded slash could be problematic: To be platform-agnostic, this should be

cache_path = pathlib.Path(data_dir) / (name + '.tar.gz')


cache_exists = False
if data_dir == 'data_dir':
    cwd = pathlib.Path(os.getcwd())  # Might be not the best idea.
Member:

Yeah, using os.getcwd() is definitely not great -- the issue is that one might run pytest from different directories (e.g., from the top-level module directory or inside the tests directory).

In my opinion, both should always work. There are two approaches to this that I've seen:

Both are valid, depending on the use case (do you want to share data between tests, or have it only for a specific test / test directory). I think we should leave this choice up to the user.
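One common way to make the path independent of where pytest is invoked is to resolve it relative to the test file via pytest's request fixture; a sketch, not what this PR implements:

import pathlib

import pytest

@pytest.fixture
def data_dir(request):
    # resolve 'data_dir' next to the test module instead of relative to os.getcwd()
    return pathlib.Path(str(request.fspath)).parent / 'data_dir'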

Member Author:

I will change this. I was not familiar with pathlib and why to use it at all instead of os. The same goes for the tests.

Member Author:

We use the package file path for our plugins. I also did not like cwd. Since relative paths are relative to cwd, the default is ensured to work, which was the only point of doing it this way.

load_cache(path_to_cache=full_import_path)

# now run process/workchain whatever
with enable_caching(): # should enable caching globally in this python interpreter
Member:

Shouldn't it be possible to re-use with_export_cache here? That exports the calculations while here we explicitly export the parts connected to the node, but is this distinction really needed?

We could maybe change the with_enable_caching logic to only export calculations that were created within the context manager (and their direct inputs / outputs). So, for example

run(SomeCalculation, **<inputs>)
with with_export_cache(<...>):
    run(OtherCalculation, **<inputs>)

would export the OtherCalculation, but not SomeCalculation. That would also get around the need to always clean the DB before running the test.

Member Author:

Currently, run_with_cache (which is kind of a patch for run_get_node) exists only because I do not know how to do this explicitly, and I want automatic cache naming which changes if the test changes; otherwise debugging becomes harder. How do you know what is run inside the with block? One idea is to query by creation time. On the other hand, if you use the default aiida export as we currently do, you get the full provenance, which means that if your SomeCalculation and OtherCalculation are connected (which within a test they usually are), they will both get exported anyway.

Member:

Yeah, I think it makes sense to just export what is created during the with statement (and its provenance). That can be either by creation time, or maybe by checking what exists already in the DB when entering the with statement - whatever works more reliably.
I'm not sure if we might get into clock synchronization issues (between the DB and the python interpreter, maybe?) when using ctime.
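A possible sketch of the "check what exists when entering" approach (not part of this PR; the QueryBuilder usage is only indicative):

from contextlib import contextmanager

from aiida.orm import ProcessNode, QueryBuilder

def _process_node_pks():
    """Return the set of pks of all process nodes currently in the DB."""
    return {pk for pk, in QueryBuilder().append(ProcessNode, project='id').iterall()}

@contextmanager
def _track_new_processes(collected):
    """Collect into `collected` the pks of process nodes created inside the block."""
    pks_before = _process_node_pks()
    yield
    collected.extend(sorted(_process_node_pks() - pks_before))

with_export_cache could then export only the collected nodes (plus their provenance), independent of what was already in the DB before the block.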

Member Author:

Comment: 'splitting' the cache currently works for both fixtures, since the export happens at the end of the with block.

with with_export_cache(<...>):
    run(OtherCalculation, **<inputs>)
with with_export_cache(<...>):
    run(ThisWeCacheSeperateCalculation, **<inputs>)

So, if you know how to extract what ran in the context manager, let me know.

Member:

I will have to do some testing to see if there is a good way to extract that...

ltalirz and others added 7 commits March 31, 2020 13:15
use fastentrypoints module [1] to load console scripts.
not only is it faster, the alternative route via pkg_resources also
throws exceptions whenever it detects dependency clashes in the python
environment, causing aiida-mock-code to fail.

[1] https://github.com/ninjaaron/fast-entry_points
faster entry point loading for console script
…tionality for with_export_cache and run_with_cache
@broeder-j (Member Author):

Also we might want to provide the option for the with_export_cache to overwrite the output file. Now the export file is only written if no previous file is found.

This could be helpful for debugging/developing workflows (see https://groups.google.com/forum/?hl=en#!topic/aiidausers/y-ekFBlLgsg from mailing list).

Done

@PhilippRue (Contributor) previously approved these changes Apr 1, 2020 and left a comment:

I think all my complaints are addressed now. I mainly tested the run_with_cache fixture, which now works fine for me.

@@ -345,6 +345,8 @@ def _run_with_cache( # type: ignore

# check existence
full_import_path = pathlib.Path(data_dir) / (name + '.tar.gz')
# make sure the path is absolute (this is needed by export_cache)
full_import_path = full_import_path.absolute()
Contributor:

If this is not an absolute path, export_cache fails, because this is checked there.

Member:

I think we should check that it is an absolute path instead of just making it absolute. Otherwise the tests will depend on which directory they are run from.

As much as possible, running pytest from the root directory should do the same as, e.g., cd tests; pytest.

Contributor:

Oh yes, that slipped my mind. So how about we make this a check only and get rid of the default value? Then users would be required to set the paths to executables, the output directory, etc. themselves, and we wouldn't be hiding this in the fixture.
I guess this would be cleaner.
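A sketch of the check-only variant being discussed (the helper name is made up; data_dir and name as in the diff above):

import pathlib

def _resolve_import_path(data_dir, name):
    """Sketch: build the archive path and require the data directory to be absolute."""
    full_import_path = pathlib.Path(data_dir) / (name + '.tar.gz')
    if not full_import_path.is_absolute():
        raise ValueError('data_dir must be an absolute path, got: {}'.format(data_dir))
    return full_import_path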

* add cli options for config file management

Add cli options:

  --testing-config-action: Read config file if present ('read'), require config file ('require') or generate new config file ('generate').
  --regenerate-test-data: Regenerate test data

* add tests for new cli options

* add documentation of new cli options

* fix: check that the data directory path is absolute and exists

* fix: add quotation marks to exported environment variables

Paths may contain spaces etc.
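A minimal sketch of how such options are typically registered in a pytest plugin; the option names come from the commit message above, while the defaults and help texts are assumptions:

def pytest_addoption(parser):
    """Register the config-file management options (sketch)."""
    parser.addoption(
        '--testing-config-action',
        choices=['read', 'require', 'generate'],
        default='read',
        help="Read the config file if present ('read'), require it ('require'), "
        "or generate a new one ('generate').",
    )
    parser.addoption(
        '--regenerate-test-data',
        action='store_true',
        default=False,
        help='Regenerate test data.',
    )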
@greschd (Member) commented Apr 2, 2020

@broeder-j @PhilippRue I think we can do the following: We merge this PR now - I will go over the functionality and try to implement the "node selection by what ran in the context manager" discussed above. After that I will create a new PR (possibly with some interface changes...), and have the two of you review it / test it on your plugins.

Does that work for you?

@PhilippRue (Contributor):

@broeder-j @PhilippRue I think we can do the following: We merge this PR now - I will go over the functionality and try to implement the "node selection by what ran in the context manager" discussed above. After that I will create a new PR (possibly with some interface changes...), and have the two of you review it / test it on your plugins.

Does that work for you?

sounds good 👍

@greschd greschd changed the base branch from develop to export_cache April 3, 2020 09:59
@greschd greschd merged commit f9a6f79 into aiidateam:export_cache Apr 3, 2020
@greschd (Member) commented Apr 3, 2020

Since it seemed to be incompatible with the changes in #30, I merged it into a separate branch for now (to keep the tests running on develop).

Successfully merging this pull request may close these issues:

  • Add 'export_cache' implementation