-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multiprocessing: Use a private context for set_start_method #1270
Closed
zmedico
wants to merge
10
commits into
gentoo:master
from
zmedico:bug_924416_private_multiprocessing_context
Closed
multiprocessing: Use a private context for set_start_method #1270
zmedico
wants to merge
10
commits into
gentoo:master
from
zmedico:bug_924416_private_multiprocessing_context
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The egencache usage in ResolverPlayground that was used to trigger bug 924319 triggered a pickling error for AuxdbTestCase.test_anydbm with the multiprocessing spawn start method, so fix the anydbm cache module to omit the unpicklable database object from pickled state, and regenerate it after unpickling. Bug: https://bugs.gentoo.org/924319 Signed-off-by: Zac Medico <[email protected]>
Use a deallocate_config future to release self.settings when it is no longer needed. It's necessary to manage concurrency since commit c95fc64 because mutation of self.settings is no longer limited to the EbuildMetadataPhase _start method, where exclusive access was guaranteed within the main thread. Add support to the isAlive() method to detect when the EbuildMetadataPhase has started but the pid is not yet available (due to async_check_locale usage from commit c95fc64). This can be used to check if an EbuildMetadataPhase instance has been successfully started so that it can be relied upon to set the result of the deallocate_config future. Bug: https://bugs.gentoo.org/924319 Signed-off-by: Zac Medico <[email protected]>
For EbuildMetadataPhase consumers like MetadataRegen and depgraph, store a pool of config instances in a config_pool list, and return instaces to the list when the deallocate_config future is done. Bug: https://bugs.gentoo.org/924319 Fixes: c95fc64 ("EbuildPhase: async_check_locale") Signed-off-by: Zac Medico <[email protected]>
Make the ResolverPlayground _create_ebuild_manifests method call egencache --jobs, which reliably triggers the KeyError from bug 924319 for multiple tests: lib/portage/tests/bin/test_doins.py::DoIns::testDoInsFallback Exception in callback EbuildMetadataPhase._async_start_done(<Task finishe...Error('EAPI')>) handle: <Handle EbuildMetadataPhase._async_start_done(<Task finishe...Error('EAPI')>)> Traceback (most recent call last): File "/usr/lib/python3.12/asyncio/events.py", line 88, in _run self._context.run(self._callback, *self._args) File "lib/_emerge/EbuildMetadataPhase.py", line 154, in _async_start_done future.cancelled() or future.result() ^^^^^^^^^^^^^^^ File "lib/_emerge/EbuildMetadataPhase.py", line 130, in _async_start retval = portage.doebuild( ^^^^^^^^^^^^^^^^^ File "lib/portage/package/ebuild/doebuild.py", line 1030, in doebuild doebuild_environment( File "lib/portage/package/ebuild/doebuild.py", line 519, in doebuild_environment eapi = mysettings.configdict["pkg"]["EAPI"] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^ File "lib/portage/util/__init__.py", line 1684, in __getitem__ return UserDict.__getitem__(self, item_key) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "lib/portage/cache/mappings.py", line 175, in __getitem__ return self.data[key] ~~~~~~~~~^^^^^ KeyError: 'EAPI' Bug: https://bugs.gentoo.org/924319 Signed-off-by: Zac Medico <[email protected]>
Wrap asyncio.Lock for compatibility with python 3.9 where the deprecated loop parameter is required in order to avoid "got Future <Future pending> attached to a different loop" errors. The pordbapi async_aux_get method can use asyncio.Lock to serialize access to its doebuild_settings attribute in order to prevent issues like bug 924319. Bug: https://bugs.gentoo.org/924319 Signed-off-by: Zac Medico <[email protected]>
For the portdbapi async_aux_get method, there is not a very good place to store a config pool, so instead use asyncio.Lock to manage access to the portdbapi doebuild_settings attribute when using the main event loop in the main thread. For other threads, clone a config instance since we do not have a thread-safe config pool. This cloning is expensive, but since portage internals do not trigger this case, it suffices for now (an AssertionError ensures that internals do not trigger it). For the main event loop running in the main thread, performance with the asyncio.Lock should not be significantly different to performance prior to commit c95fc64, since check_locale results are typically cached and before there was only a single shared doebuild_settings instance with access serialized via the EbuildMetadataPhase _start method. Update async_aux_get callers to use asyncio.ensure_future on the returned coroutine when needed, since it used to return a future instead of a coroutine, and sometimes a future is needed for add_done_callback usage. In the portdbapi async_fetch_map method, fix a broken reference to "future" which should have been "aux_get_future", an error discovered while testing this patch. Bug: https://bugs.gentoo.org/924319 Fixes: c95fc64 ("EbuildPhase: async_check_locale") Signed-off-by: Zac Medico <[email protected]>
Bug: https://bugs.gentoo.org/923841 Signed-off-by: Zac Medico <[email protected]>
Change config.environ() check_locale calls to async_check_locale calls in the EbuildPhase _async_start method in order to eliminate synchronous waiting for child processes in the main event loop thread. Bug: https://bugs.gentoo.org/923841 Signed-off-by: Zac Medico <[email protected]>
4afc501
to
56d749f
Compare
Wrap the multiprocessing module to return a private multiprocessing context instead. It's safe to call set_start_method on this and it will only affect portage internals, which makes it compatible with the pytest-xdist plugin. Bug: https://bugs.gentoo.org/924416 Signed-off-by: Zac Medico <[email protected]>
8464a80
to
d15cfc6
Compare
Since pytest-xdist workers crash intermittently for the multiprocessing spawn start method, use pytest-rerunfailures to detect and handle this case. Only use pytest-rerunfailures for the spawn start-method since that is the only case where we've observed intermittent pytest-xdist worker crashes, and use --only-rerun 'worker .* crashed while running' to ensure that rerun only triggers for worker crashes. Bug: https://bugs.gentoo.org/924416 Signed-off-by: Zac Medico <[email protected]>
d15cfc6
to
beeffdb
Compare
I'll close this since it seems like the private multiprocessing context does not really solve the pytest-xdist worker crashes. Opened #1273 to just handle it with pytest-rerunfailures. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Wrap the multiprocessing module to return a private multiprocessing context instead. It's safe to call set_start_method on this and it will only affect portage internals, which makes it compatible with the pytest-xdist plugin.
Bug: https://bugs.gentoo.org/924416
Seeing pytest-xdist worker crashes less frequently now:
https://github.com/gentoo/portage/actions/runs/7897425289/job/21552979356
https://github.com/gentoo/portage/actions/runs/7897425289/job/21553620257?pr=1270
Maybe pytest-dev/pytest-rerunfailures#158 could be a solution at this point, since it handles
pytest-xdist -n
worker crashes.NOTE: Temporarily added the commits from #1267 to see if they trigger more pytest-xdist runner crashes.