[Bug]: Race condition in FileSystems initialisation #33965
Comments
FileSystems.setDefaultPipelineOptions() should be called once on worker startup. It is an internal method and is not supposed to be used outside the worker. Would you mind sharing a stack trace so we can identify the excessive call in the Beam code path?
@liferoad I wonder why is this @Abacn, I admit my use case is not a standard one. I am in a mixed environment of pure Spark and Beam on the Spark runner. I came across this when, in pure Spark, I was trying to reuse the filesystem utility and was calling I understand that the API is internal and subject to change, but I still believe there is a race condition in the init. @Abacn, can you share how workers ensure this is consistently initialised under multithreaded execution? Maybe it can inspire me with some thoughts. I was not able to find it, hence I believe the race is perhaps there in Beam too; just the init sequence is more loaded, therefore from initial call to |
Will my PR #34007 fix the issue you have? I closed this since I thought #34007 (comment) could resolve your problem. Reopen this.
This class is used by the Spark/Flink/Samza/Jet runners. A fix could be to move to use |
This issue is the same as #33965. SerializablePipelineOptions shouldn't call FileSystems.setDefaultPipelineOptions
@liferoad yes, something like that would fix the problem
@Abacn That fix works for my specific use case, but the underlying issue remains for anyone using |
setDefaultPipelineOptions() is supposed to be called once per job, on the worker, at the beginning of pipeline execution, because it initializes FileSystems with a given PipelineOptions. A second invocation will overwrite the PipelineOptions in FileSystems; if they are still used by the first pipeline, the first pipeline runs into an inconsistent state where its FileSystem interfaces are overridden by others. However, it is also used in other places where it really shouldn't be, in order to use Beam FileSystems outside pipeline execution, and the SerializablePipelineOptions constructor is one of them. So I think a proper fix is to eliminate setDefaultPipelineOptions and change to registerFileSystemsOnce.
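For anyone following the discussion, here is a minimal sketch of one way to avoid the ordering problem reported in this issue. It is not Beam's actual FileSystems code: the class SchemeRegistry, its methods and the string values are hypothetical stand-ins. The idea is to build the scheme map first and publish it together with the revision in a single atomic step, so a thread that sees the new revision can never see a map that is not yet populated.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical stand-in for a scheme registry; NOT Beam's FileSystems implementation. */
final class SchemeRegistry {

  /** Immutable snapshot: the revision and the scheme map always travel together. */
  private static final class State {
    final int revision;
    final Map<String, String> schemeToFileSystem;

    State(int revision, Map<String, String> schemeToFileSystem) {
      this.revision = revision;
      this.schemeToFileSystem = schemeToFileSystem;
    }
  }

  // Starts with only a local filesystem registered (assumed default for this sketch).
  private final AtomicReference<State> state =
      new AtomicReference<>(new State(-1, Collections.singletonMap("file", "LocalFileSystem")));

  /** Installs the loaded schemes unless an equal or newer revision is already visible. */
  void register(int revision, Supplier<Map<String, String>> loader) {
    while (true) {
      State current = state.get();
      if (current.revision >= revision) {
        return; // Safe to return: that revision was published with its map.
      }
      // Build the map BEFORE publishing, so readers never observe a half-initialised state.
      Map<String, String> loaded = Collections.unmodifiableMap(new HashMap<>(loader.get()));
      if (state.compareAndSet(current, new State(revision, loaded))) {
        return;
      }
      // Another thread published first; loop and re-check the revision.
    }
  }

  /** Looks up the filesystem registered for a scheme in the current snapshot. */
  String lookup(String scheme) {
    String fileSystem = state.get().schemeToFileSystem.get(scheme);
    if (fileSystem == null) {
      throw new IllegalArgumentException("No filesystem found for scheme " + scheme);
    }
    return fileSystem;
  }
}
```

The trade-off in this sketch is that several threads may run the loader concurrently and all but one result is discarded. If loading is expensive, a synchronized block or a once-only initializer would be an alternative.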
What happened?
When the method FileSystems.setDefaultPipelineOptions() is called from multiple threads and FileSystems is then used, some tasks will fail because FILESYSTEM_REVISION will already be set but SCHEME_TO_FILESYSTEM is not yet initialised.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components