Add option to skip blocking pod startup if driver is not ready to create a request yet #20
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: munnerz. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files.

Approvers can indicate their approval by writing `/approve` in a comment.
overall lgtm, thanks @munnerz for this effort.
```diff
@@ -146,5 +146,10 @@ func (m *MemoryFS) ReadFiles(volumeID string) (map[string][]byte, error) {
 	if !ok {
 		return nil, ErrNotFound
 	}
 	return vol, nil
 	// make a copy of the map to ensure no races can occur
```
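The hunk is truncated above, but the copy that the comment describes would presumably look something like this (a sketch, not the exact patch):

```go
// Hypothetical continuation of the hunk: return a fresh map so callers
// can't race with writers that mutate the stored one. Note the byte
// slices themselves are still shared; the real patch may copy those too.
filesCopy := make(map[string][]byte, len(vol))
for name, data := range vol {
	filesCopy[name] = data
}
return filesCopy, nil
```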
Possibly using `sync.RWMutex` and `RLock()` to avoid the read race condition?
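For reference, the suggestion would look roughly like this (a sketch; the package name and field layout are assumptions, not necessarily csi-lib's actual code):

```go
package storage // hypothetical package name

import (
	"errors"
	"sync"
)

var ErrNotFound = errors.New("volume not found")

type MemoryFS struct {
	lock  sync.RWMutex // guards files
	files map[string]map[string][]byte
}

// ReadFiles with the suggested read lock: multiple readers may hold
// RLock concurrently, while writers would take lock.Lock().
func (m *MemoryFS) ReadFiles(volumeID string) (map[string][]byte, error) {
	m.lock.RLock()
	defer m.lock.RUnlock()
	vol, ok := m.files[volumeID]
	if !ok {
		return nil, ErrNotFound
	}
	return vol, nil
}
```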
Given we return this map, it's not possible to do so, as we can't enforce that call-sites create a mutex. Pushing this onus onto the caller creates a fair bit of extra complexity, and I'm not convinced that outweighs the performance gains.
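A hypothetical call-site illustrates the point, continuing the sketch above: the read lock is released when `ReadFiles` returns, so the caller's map reads can still race with a concurrent writer (`updateVolume` here is an assumption, standing in for any goroutine that mutates the volume):

```go
files, err := fs.ReadFiles("vol-1")
if err != nil {
	return err
}
// Any internal lock taken by ReadFiles has been released by now, but
// `files` may still be the same map a writer goroutine is mutating.
go updateVolume(fs, "vol-1") // hypothetical concurrent writer
crt := files["tls.crt"]      // data race unless ReadFiles returned a copy
_ = crt
```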
Got it, thanks.
driver/nodeserver.go (outdated)

```go
// Only wait for the volume to be ready if it is in a state of 'ready to request'
// already. This allows implementors to defer actually requesting certificates
// until later in the pod lifecycle (e.g. after CNI has run & an IP address has been
// allocated, if a user wants to embed pod IPs into their requests).
isReadyToRequest := ns.manager.IsVolumeReadyToRequest(req.GetVolumeId())
if isReadyToRequest || !ns.continueOnNotReady {
```
Could we simplify the logic here so that we wait for volumeReady only if `continueOnNotReady` is false? Here, if a driver uses the default `AlwaysReadyToRequest` func, `isReadyToRequest` may always return `true`. In that case, doesn't `continueOnNotReady` become useless?
Correct - the logic is that if we are ready to request straight away, we wait/block. This was intentional, so that we only skip waiting if we aren't 'ready' to request yet (and the driver has been configured to skip in these cases).
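Spelled out, the gate from the diff behaves like this (a runnable restatement of the condition, not the driver's actual code):

```go
package main

import "fmt"

// shouldWait mirrors `isReadyToRequest || !ns.continueOnNotReady`.
func shouldWait(isReadyToRequest, continueOnNotReady bool) bool {
	return isReadyToRequest || !continueOnNotReady
}

func main() {
	fmt.Println(shouldWait(true, true))   // true:  ready now, so wait/block as usual
	fmt.Println(shouldWait(true, false))  // true:  ready now, flag irrelevant
	fmt.Println(shouldWait(false, false)) // true:  the previous default behaviour
	fmt.Println(shouldWait(false, true))  // false: skip waiting, request deferred
}
```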
Got it, okay by me.
PR makes sense to me and gives users a way to begin implementing these features which are blocked on kube order of operations.

The main thing that sticks out to me is that there is no way for a `ReadyToRequest` implementation to signal early that it is ready; instead it has to wait for the check at the next backoff step (a maximum of 1 min currently). The lack of this signal could increase issuance time significantly, since the backoff is exponential. Just missing the ~30 sec step, for example, and then having to wait another ~minute is quite painful. Even the fact that it starts at 2 seconds could be significant when added up over a large number of volumes/pods.

Perhaps this is a non-issue, as `ReadyToRequestFunc` is expected to return `true` after a brief period anyway, but something to keep in mind.
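To make the timing concrete, here is a small simulation under assumed backoff parameters (initial 2s, doubling, capped at 1 min; the real values in csi-lib may differ):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	delay := 2 * time.Second // assumed initial interval
	var elapsed time.Duration
	for attempt := 1; attempt <= 7; attempt++ {
		fmt.Printf("attempt %d at t=%s\n", attempt, elapsed)
		elapsed += delay
		delay *= 2
		if delay > time.Minute { // assumed cap
			delay = time.Minute
		}
	}
	// Prints attempts at roughly 0s, 2s, 6s, 14s, 30s, 1m2s, 2m2s: becoming
	// ready just after the 30s attempt means waiting ~1m for the next check.
}
```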
Force-pushed from `38e3cb2` to `24ecb2c` (compare)
Add option to skip blocking pod startup if driver is not ready to create a request yet

Signed-off-by: James Munnelly <[email protected]>
Force-pushed from `24ecb2c` to `e152da4` (compare)
PR is looking good to me! /lgtm

/lgtm
This will help facilitate cert-manager/csi-driver#17 and any other use cases where we want to permit the pod to start up even if the CSI driver is not yet ready.
This feature comes with the caveat that user applications/pods MUST be designed to tolerate certificate/private key data not being available when the pod first starts up (as the driver will no longer block pod startup until the volume is ready to request a certificate).
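As a sketch of what that tolerance could look like on the application side (the mount path and timeout here are assumptions for illustration, not anything the driver mandates):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// waitForCert polls until the CSI driver has written the certificate,
// since with this option enabled the file may not exist at pod startup.
func waitForCert(path string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if _, err := os.Stat(path); err == nil {
			return nil
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("certificate %s not available after %s", path, timeout)
}

func main() {
	// Hypothetical mount path; use whatever your volume is configured with.
	if err := waitForCert("/tls/tls.crt", 2*time.Minute); err != nil {
		panic(err)
	}
	// ...load the key pair and start serving...
}
```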