[Standard] KaaS default StorageClass v2 (scs-0211-v2) #658
Conversation
Signed-off-by: Martin Morgenstern <[email protected]>
Signed-off-by: Martin Morgenstern <[email protected]>
LGTM, except...
Co-authored-by: Joshua Mühlfort <[email protected]> Signed-off-by: Martin Morgenstern <[email protected]>
Please add something about previous versions, as in https://docs.scs.community/standards/scs-0100-v3-flavor-naming
-- first, one sentence in the intro about this version, and then one section further below about previous versions and what has changed compared to those.
Signed-off-by: Martin Morgenstern <[email protected]>
Signed-off-by: Martin Morgenstern <[email protected]>
> Previously, the backing storage of the default storage class was required to be protected
> against data loss caused by a physical disk or host failure.
> It also contained recommendations (MAY) with regard to redundant storage across hosts
> or availability zones.
> In this revision, these requirements and recommendations have been dropped.
This clarification actually confused me a bit, as I was under the impression that protection "against data loss caused by a physical disk or host failure" was intended to be covered by "MUST NOT be backed by local or ephemeral storage".
Regarding edge setups, you concluded in #652:
- the requirement of "not being backed by local storage" should remain
  - there was a quick discussion whether this is too strict from a standards point of view
  - example: edge clouds probably cannot fulfill this -> but this is a very special case and might be solved by another hypothetical certificate scope for edge environments
There are not that many options (or use cases) for implementing storage that is NOT backed by local storage, yet NOT protected against data loss due to disk/host failure.
So this concession to CSPs will probably create little value for CSPs, yet it would reduce the value for users who may want to rely on SCS standards as "sensible/common defaults". I do not know of any major block storage cloud offering that does not provide basic redundancy or at least snapshotting.
> This clarification actually confused me a bit, as I was under the impression that protection "against data loss caused by a physical disk or host failure" was intended to be covered by "MUST NOT be backed by local or ephemeral storage".
I removed the "physical disk or host failure" sentence because we will not be able to test this as part of the conformance checks anyway.
My reasoning behind this is: even if you use the Cinder CSI or, let's say, the Rook CSI provisioner, the backing storage could in theory be misconfigured with respect to single-host or disk failure tolerance, but there is, AFAIK, no reliable way to check this from inside the K8s cluster (and with minimal privileges).
Of course, our expectation would be that this failure tolerance is configured correctly by the CSP. As a compromise I can add a sentence that documents this expectation.
In my understanding, the important bit of this standard is that a provisioned volume isn't bound to the node's lifecycle, i.e., it doesn't just use something like the rancher.io/local-path or the lvm.csi.metal-stack.io provisioner.
For the test, I currently plan to check against an allowlist of provisioners (e.g., cinder.csi.openstack.org, *.rbd.csi.ceph.com, driver.longhorn.io; may need extension in the future), because I think it is the most pragmatic approach. (Going one step further, the test could even try to mount the same PVC on two different nodes.)
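A minimal sketch of what such an allowlist check could look like, assuming the official kubernetes Python client and a kubeconfig with permission to list StorageClasses; the provisioner names are the ones mentioned above, while the function name and exit-code handling are hypothetical and not the actual conformance test:

```python
"""Hypothetical sketch: verify the default StorageClass uses an allowed provisioner.

Not the actual SCS conformance test; assumes the official kubernetes
Python client and a kubeconfig with permission to list StorageClasses.
"""
import fnmatch
import sys

from kubernetes import client, config

# Provisioner allowlist taken from the discussion above; a real test
# would likely need to extend this list over time.
ALLOWED_PROVISIONERS = (
    "cinder.csi.openstack.org",
    "*.rbd.csi.ceph.com",
    "driver.longhorn.io",
)

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def check_default_storage_class() -> int:
    """Return 0 if the default StorageClass uses an allowed provisioner, else 1."""
    config.load_kube_config()
    for sc in client.StorageV1Api().list_storage_class().items:
        annotations = sc.metadata.annotations or {}
        if annotations.get(DEFAULT_ANNOTATION) != "true":
            continue
        # fnmatch handles both exact names and wildcard patterns like *.rbd.csi.ceph.com
        if any(fnmatch.fnmatch(sc.provisioner, pattern) for pattern in ALLOWED_PROVISIONERS):
            print(f"OK: default StorageClass '{sc.metadata.name}' uses provisioner '{sc.provisioner}'")
            return 0
        print(f"FAIL: provisioner '{sc.provisioner}' of default StorageClass '{sc.metadata.name}' is not in the allowlist")
        return 1
    print("FAIL: no default StorageClass found")
    return 1


if __name__ == "__main__":
    sys.exit(check_default_storage_class())
```

The stricter variant mentioned above (mounting the same PVC on two different nodes) would additionally need to create and clean up test pods and wait for volume attachment, so it is not part of this sketch.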
> So this concession to CSPs will probably create little value for CSPs, yet it would reduce the value for users who may want to rely on SCS standards as "sensible/common defaults". I do not know of any major block storage cloud offering that does not provide basic redundancy or at least snapshotting.
I do not agree completely: there is value for the user if we prevent the use of a "simple" provisioner (like local-path), e.g., the volume is independent of the node, can be used on another node, etc.
Regarding testability: Alright, I did not know that this is such an important factor. 👍
> I do not agree completely: there is value for the user if we prevent the use of a "simple" provisioner (like local-path), e.g., the volume is independent of the node, can be used on another node, etc.
Definitely! Yet, it's kind of a niche use case.
In my experience, one would want to have either...
- some failure-safe volume for any kind of application. Maybe a trivial non-HA database installation for non-HA use cases. While availability may be of little concern, data durability often will be, making a storage backend with some hands-off redundancy very much favorable. This is the default on most clouds and will also work with most use cases involving applications that implement data high availability/durability themselves.
- OR some not necessarily failure-safe volume (may be local/node-bound/non-redundant, possibly fast) to run applications that take care of data durability and availability at the application level.
Offering a default storage class that is not redundant/failure-safe/replicated in any way, only non-local, would make e.g. "a lot of third-party software, such as Helm charts, [which] assume that a default storage class is configured" technically installable at first, but it would probably also break these charts' assumptions regarding data durability, undermining this standard's motivation of an "out-of-the-box working experience" (at least some time later, if/when a disk eventually fails).
That may be the outcome (yet I do not know of CSPs who are really constrained here), but no matter what, IMHO it should be spelled out more explicitly in the "Decision" section as well.
Signed-off-by: Katharina Trentau <[email protected]>
added the omitted requirements as recommendations under Decisions
Signed-off-by: Katharina Trentau <[email protected]>
Signed-off-by: Katharina Trentau <[email protected]>
Signed-off-by: Katharina Trentau <[email protected]>
…gnCloudStack/standards into 652-kaas-default-storageclass-v2
Signed-off-by: Katharina Trentau <[email protected]>
Signed-off-by: Katharina Trentau <[email protected]>
The omitted requirements for failure-safe storage have been added as recommendations, as there may be use cases that allow them to be ignored. In general, these recommendations can be adhered to and checked by selecting the right provisioner.
Resolves: #652