Consider SADifying Selective Disclosure Blinded Attribute Array #78
Comments
Another solution to this may be to try and correlate the order of the blinded list with the attribute definitions in the |
Even if the schema disallowed objects with the same keys and different values, I think there would be a problem since the data isn't necessarily disclosed together (the schema can still be validated during disclosure). |
@jasoncolburne I must admit I don't understand the attack. I think it might be due to a difference in assumptions about how a verifier verifies any presentation of the ACDC. So I will state my assumptions and see if that is the source of my misunderstanding. A verifier does three verifications.
If any of the three fail then the verification fails. A verifier will never accept as valid a presentation that does not satisfy all three verifications. There may be additional verifications of commitments by the presenter of a selective disclosure based on a graduated disclosure where the presenter signs additional commitments. Maybe this is where I misunderstand. But I think you are talking about the Issuer's commitments and verifying those. Using your labeling
In order for the verifier to verify any selective disclosure presentation of any combination of blocks drawn from 1, the presenter must provide the list of hashes (SAIDs) in 2). This enables the verifier to recompute 3). So if even one of the hashes (SAIDs) provided by the presenter is different (or if the order of SAIDs is different) then the verifier will compute a different In your example the SAID of the third block in the malicious ACDC is different from the SAID in the third block of the valid ACDC. So when the verifier recomputes I believe that the source of ambiguity is a limitation of JSON Schema's uniqueItems property, which as you point out would allow multiple copies of the same sub-schema, but the uniqueItems is applied in combination with the anyOf. I believe that one may not repeat a subschema from the anyOf list of subschemas. Admittedly the documentation of anyOf does not make this clear, but it is implied and all the examples I could find imply non-repetition. Your example repeats the subschema with the legalName subschema. It would be good to verify this property of anyOf. Notwithstanding an anyOf that allows repeated subschemas, the computation of What am I missing? |
The proof process for selective disclosure is outlined in detail in section 13.3.3 of the ACDC Spec. The relevant language is hoisted below: Given that aggregate value A appears as the compact value of the top-level attribute section,
The actual detailed disclosed attribute block is only disclosed after the disclosee has agreed to the terms of the rules section. Therefore, in the event the potential disclosee declines to accept the terms of disclosure, a presentation of the compact version of the ACDC and/or the list of attribute digests, [a0, a1, ..., aN-1], does not provide any point of correlation to any of the attribute values themselves. The attributes of block j are hidden by aj and the list of attribute digests [a0, a1, ..., aN-1] is hidden by the aggregate A. The partial disclosure needed to enable chain-link confidentiality does not leak any of the selectively disclosable details. The disclosee may then verify the disclosure by:
The last 3 steps that culminate with verifying the signature(s) require determining the key state of the Issuer at the time of issuance, this may require additional verification steps as per the KERI, PTEL, and CESR-Proof protocols. |
@jasoncolburne, just for the better understanding: Are we talking about an instance of a double-commitment scheme, which you consider an attack? A commitment scheme allows one to commit to a chosen value while keeping it hidden to others, with the ability to reveal the chosen value later. These are usually characterized by two phases: the commit phase and the reveal phase. If an issuer makes two claims and chooses to reveal one at a later time, this seems to resemble a form of double-commitment scheme, ... |
The purpose of the schema is not to verify the cryptographic structure of the presented data, but merely the semantic structure. It is possible, by putting SAIDs as constants into the schema itself, to sort of use schema validation to validate some of the cryptographic structure, but this may make unreasonable assumptions about how JSON Schema tooling works or will work in the future. Indeed the cryptographic structure validation protects the semantic validation because even if a malicious JSON schema were to validate on its own, it only materially matters if the malicious schema validates in concert with the cryptographic structure validation. So what we care about for an attack is a presentation where the cryptographic structure validation passes with a malicious schema that also passes, not one where a malicious schema passes but the cryptographic structure validation fails. Separating the concerns of semantic validation from cryptographic validation and then designing it so that the stronger cryptographic validation protects the weaker semantic one follows a principle I first learned from Schneier's book "Practical Cryptography": Design Rule 1. Complexity is the worst enemy of security. Admittedly there is a lot of aesthetic judgement in how to apply such rules. But in this case I think of it this way. If I have a security measure that is already computationally infeasible to attack, then bolstering it with some other security measure that is not protecting from a different type of attack is unnecessary complexity, and it makes the correctness of the security of the first measure non-local (both are needed). It's like bolting a wood 2x4 onto a steel I-beam. The 2x4 may add some negligible amount of extra strength under load but it does not add enough to justify the complexity of bolting it on. 
If, on the other hand, the bolted-on 2x4's purpose is prettier painted wood trim meant to hide the ugliness of the steel I-beam, then any extra structural strength the trim imbues is purely a side-effect. Instead the greater strength of the I-beam will protect the structural integrity of the pretty trim (not the other way round). So attempting to use schema validation (2x4) to protect structural validation (I-beam) is using schema validation for the wrong purpose. Moreover, layering weaker security measures on top of stronger ones runs the risk that one might rely on the weak measure in corner conditions that then justify weakening the stronger measure, which then may backfire in some other corner condition where the two are not mutually protecting. I have fallen for that temptation more than once. ACDCs depend on a strong non-repudiable proof of issuance via digital signature. That signature makes a cryptographic commitment to the strong cryptographic digest of the field map structure of the ACDC, which includes a strong commitment to the digest of the schema. The schema in turn makes a comparatively weak commitment to the semantics of the field structure, but that semantic commitment is protected by the stronger field structure proof, not the other way around. |
@jasoncolburne , I see how the situation you constructed allows mischief. But IIUC isn't this a shortcoming of the bad schema design? You assume a schema that allows a credential to have more than one legal name; naturally this would then allow a malicious issuer to put more than one legal name into the cred (presumably colluding with an issuee who later wants to disclose one name but not the other). But is such a schema reasonable? As @SmithSamuelM points out, the verifier should be confirming that the schema matches their requirements -- and unless the requirements in this case are about knowing at least one of the issuee's legal names for some strange reason (e.g., what was it either before or after a name change), it seems like this schema should be rejected, without creating any additional behavior in ACDCs to prevent it. |
I am saying that the colluding issuer issues not the first blinded array, but one corresponding to the double entry. Here is the array the issuer would use to compute the top level digest in my malicious example: "A": [
"EJgDHAe0lS3dWPB7yT78O2d1xb_AuNecU8VMjykVTd4F",
"EGcJzlAaalMxlFSfs2DPB7Tx7n3D7EKAWSXJaheVf_-P",
"EB7_uP-FZ8aoErbInx6BZTZmDb0A5QuFhIQCROlStoMF"
]
A verifier that only requests the name and issuee AID (and not the age) has no way that I can see to ensure that a SAID in the blinded array corresponds to a given label in the unblinded attribute array, only that it is in the array or not. For example, how do we know which SAID in the blinded array corresponds to I understand that a malicious issuer can issue untrue data at any time, and it's really no different than that, but I feel like simply adding labels entirely prevents the more concerning situation of retaining identifier reputation by presenting valid data until the critical moment when an attack can be mounted? The schema I provided was the example in the spec, and the In addition to avoiding this mischievous case, SADifying that array would allow me to store it without creating an edge case. In a comment in Slack it was mentioned I may be describing Partial Disclosure, but when I re-read the section it seemed to indicate I was talking about Selective. I just want to be clear that I'm talking about the case where some attributes are unblinded and others remain hidden. I've also been trying to frame this in the typical example, but in my world it is highly likely that self-issuance will occur, which raises the likelihood of this happening. |
@jasoncolburne I have to think about this some more and see if I can see where you are coming from or suggest the appropriate measure. It feels to me that selective disclosure might not be the appropriate mechanism for your use case. It may be better to make the claims granular, with one or more closely related claims per ACDC that do not need to be selectively disclosed, and then just use full disclosure. Then they are already SAIDified. Historically selective disclosure is a way to granularize VC-ified legacy credentials like driver's licenses that mix functional attributes with forensic attributes such as authenticators and so need selective disclosure to separate out the forensic attributes from the functional ones. But where all your use cases are self-issued or individualized ACDCs there is no need to mix functional attributes with forensic ones. They can each reside in their own VC and be “effectively” selectively disclosed by being chained together when they need to be disclosed and not chained when they don’t. The chained ones are then partially disclosed in order to get contractual protection and then fully disclosed, and the unchained ones never show up. |
@jasoncolburne I am trying to understand the exploit but some things you said are confusing to me. When you say:
Clearly when a block is actually presented (disclosed) the label of the field inside the block will either be The purpose of selective disclosure is to not expose any correlatable information about the undisclosed attributes. The labels themselves in concert with the associated blinded SAID provide correlatable information. It's no longer fully blinded. Not being able to match up a given schema block to a given blinded SAID is a feature. So as far as I can tell, the exploit is that a given Issuer can issue a set of two or more blocks, where each block in the set shares the same subschema in that the field labels are the same but the values of each block in the set are different. A schema for such a set can hide the fact that there is such a set by not including repeated identical subschemas. However, in order to verify, any presentation must use one of the blocks in the set that were used to create the blinded array in the first place. Is this a correct understanding of what you are saying is the attack? So what the issuer is hiding is that there is such a set of repeated blocks in the blinded array. From the standpoint of why selective disclosure has been used historically in VCs, such an array of undetectable repeated blocks (for example using a ZKP for selective disclosure) would be a feature, not a fault. The presenter gets to decide which block to present. Since all are valid issuances they are all valid presentations. The problem as I see it is that a self-issuer or colluding malicious issuer may choose to make conflicting, inconsistent statements about the issuee, and there is no way for the verifier to know that a given ACDC that uses selective disclosure is indeed making conflicting statements. This is always true for selective disclosure mechanisms that truly make the selectively disclosed attributes uncorrelatable with the attributes not disclosed. That an issuer could issue conflicting or contradictory (i.e. duplicitous) attributes in a selectively disclosable format is always true. The only way for a verifier to prove that there are no duplicitous attributes is to correlate all the attributes. Making the labels correlatable without disclosing the values is still correlation. But then if we want to make the labels correlatable, we are in a situation where selective disclosure is no longer selective and we have a use case that contra-indicates the use of selective disclosure. It is better, as I suggested above, to use granular ACDCs where all attributes must be disclosed and use partial disclosure to hide those values until after contractual protection is in place. By itself an ACDC cannot ensure the veracity of the data so conveyed. We must trust the issuer. If the Issuer does not have any counter-incentive to issuing inaccurate, unfounded, or duplicitous statements, the ACDC itself does not prevent that. Only correlation across multiple presentations could expose duplicitous issuances by that issuer. So selective disclosure from an untrustable issuer is an anti-pattern. |
@jasoncolburne I think what you want to use instead of selective disclosure is a two-level partial disclosure with nested oneOfs in the schema. One of the problems I faced is that the term “selective disclosure” has a well-known technical meaning in the SSI community that is different from the English language definition. So I picked the term partial disclosure to indicate a different “selective disclosure” mechanism. Nested partial disclosure enables disclosing field labels without disclosing field values. I can provide an example to be clear. |
Let me suggest this approach: The fully disclosed data is structured as follows:
Full
"a":
{
"d": "EJgDHAe0lS3dWPB7yT78O2d1xb_AuNecU8VMjykVTd4F",
"u": "0AB9VADfPtCQvFqp-u4BxUvy",
"i": "ENoxXSSTfy8FDryU0J0av3IdHKqAb6aYBu0fIT5fvqfY",
"LegalName":
{
"d": "EGcJzlAaalMxlFSfs2DPB7Tx7n3D7EKAWSXJaheVf_-P",
"u": "0ACsCxwKKCg0C9Hb7OX9ajbZ",
"value": "Jason Colburne"
},
"age":
{
"d": "EOYM0KDlDPODdiUDL7Xp-XRjr9mif7Dv5ovMSGTRxyTA",
"u": "0ACbHEmnqeXUJFMf1G2Dj0BU",
"value": 43
}
}
The schema has two levels of oneOfs. The first for the whole attributes block
Least
"a": "EJgDHAe0lS3dWPB7yT78O2d1xb_AuNecU8VMjykVTd4F"
Issuee and field labels
"a":
{
"d": "EJgDHAe0lS3dWPB7yT78O2d1xb_AuNecU8VMjykVTd4F",
"u": "0AB9VADfPtCQvFqp-u4BxUvy",
"i": "ENoxXSSTfy8FDryU0J0av3IdHKqAb6aYBu0fIT5fvqfY",
"LegalName": "EGcJzlAaalMxlFSfs2DPB7Tx7n3D7EKAWSXJaheVf_-P",
"age": "EOYM0KDlDPODdiUDL7Xp-XRjr9mif7Dv5ovMSGTRxyTA"
}
LegalName but not Age
"a":
{
"d": "EJgDHAe0lS3dWPB7yT78O2d1xb_AuNecU8VMjykVTd4F",
"u": "0AB9VADfPtCQvFqp-u4BxUvy",
"i": "ENoxXSSTfy8FDryU0J0av3IdHKqAb6aYBu0fIT5fvqfY",
"LegalName":
{
"d": "EGcJzlAaalMxlFSfs2DPB7Tx7n3D7EKAWSXJaheVf_-P",
"u": "0ACsCxwKKCg0C9Hb7OX9ajbZ",
"value": "Jason Colburne"
},
"age": "EOYM0KDlDPODdiUDL7Xp-XRjr9mif7Dv5ovMSGTRxyTA"
}
Age but not Legal Name
"a":
{
"d": "EJgDHAe0lS3dWPB7yT78O2d1xb_AuNecU8VMjykVTd4F",
"u": "0AB9VADfPtCQvFqp-u4BxUvy",
"i": "ENoxXSSTfy8FDryU0J0av3IdHKqAb6aYBu0fIT5fvqfY",
"LegalName": "EGcJzlAaalMxlFSfs2DPB7Tx7n3D7EKAWSXJaheVf_-P",
"age":
{
"d": "EOYM0KDlDPODdiUDL7Xp-XRjr9mif7Dv5ovMSGTRxyTA",
"u": "0ACbHEmnqeXUJFMf1G2Dj0BU",
"value": 43
}
}
I think nested partial disclosure accomplishes your use case without any ambiguity in the schema. The field values of undisclosed attributes may remain blinded but the field labels are disclosed. This pattern of nested partial disclosure of blinded sub-blocks may be repeated to any number of levels, where each level discloses the field labels of the next level down but does not disclose the field values, or may partially disclose some of the field values while leaving the others blinded. The Issuer can’t play games with the schema because anyOf is not used, only oneOf. A verifier in an IPEX exchange then just removes oneOfs in the schema in their Apply message to force partial disclosure of the fields they need at any given nested layer. |
The one complication of using nested oneOf schema operators is knowing how to compute the SAID of a given block. We always use the most compact version of the block. |
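As a rough illustration of "most compact version", the sketch below assumes it means replacing every nested block that carries a "d" field with that SAID string. The `compact` helper is hypothetical and not taken from the spec or from any ACDC library. Under that assumption, each partially disclosed variant from the example above reduces to the same "Issuee and field labels" form, so all of them would hash to the same SAID.

```python
def compact(block: dict) -> dict:
    """Hypothetical helper: replace each nested block that has a "d" field with its SAID."""
    return {
        key: (value["d"] if isinstance(value, dict) and "d" in value else value)
        for key, value in block.items()
    }


# Values copied from the example in this thread.
full = {
    "d": "EJgDHAe0lS3dWPB7yT78O2d1xb_AuNecU8VMjykVTd4F",
    "u": "0AB9VADfPtCQvFqp-u4BxUvy",
    "i": "ENoxXSSTfy8FDryU0J0av3IdHKqAb6aYBu0fIT5fvqfY",
    "LegalName": {
        "d": "EGcJzlAaalMxlFSfs2DPB7Tx7n3D7EKAWSXJaheVf_-P",
        "u": "0ACsCxwKKCg0C9Hb7OX9ajbZ",
        "value": "Jason Colburne",
    },
    "age": {
        "d": "EOYM0KDlDPODdiUDL7Xp-XRjr9mif7Dv5ovMSGTRxyTA",
        "u": "0ACbHEmnqeXUJFMf1G2Dj0BU",
        "value": 43,
    },
}

# "LegalName but not Age": the age block stays blinded as its SAID.
name_only = dict(full, age=full["age"]["d"])

# Both variants reduce to the same labels-only form.
labels_only = compact(full)
same_form = compact(name_only) == labels_only
```

Computing the actual SAID of the compacted block still requires the real SAID derivation, which this sketch does not attempt.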
As I suggested above another way to accomplish “partial” disclosure is to use blinded edges in a graph of ACDCs. |
Thanks for all your thought @SmithSamuelM and yes, I was forgetting that the whole reason to use selective disclosure is to prevent correlation - and nested partial disclosure is a great name, now that I understand your terminology. My mistake is that when it came time to implement I jumped to the end of the docs to understand the algorithms and data structures, and not the purpose/nomenclature. As a result, I thought the primary difference was that selective was composable and partial was all or nothing. In the end we may actually need to use a mix of selective and partial disclosure, our requirements are a moving target. Maybe it's obvious from the fact that if correlation is denied one could present differing data to different parties, but I feel like others may make the assumption that selective disclosure disallows this kind of behaviour - specifically when the disclosed data is valid with respect to the same compact ACDC. As we have noted, there is nothing forcing the issuer to create an ACDC that is still valid when fully disclosed (it may never be fully disclosed), and an adversary can construct data that can be presented and validated that is backed by two values for the same key - as long as the two values are never presented together. In some scenarios people may make system design assumptions like 'there is at most one non-revoked driving license ACDC issued to an individual from a motor vehicle authority', and because of this, they may assume that if they encounter an individual, that individual has at most one ACDC and thus at most one assigned I understand that breaking non-correlation with my original suggestion is a bad idea and goes against the intent of selective disclosure, as defined, entirely, but do you think it warrants some sort of warning or example that if one chooses to prevent correlation, it does become impossible to prevent this type of misrepresentation? |
Sorry, didn't mean I came up with nested partial disclosure I just mean those were the same conclusions I came to when asking if we could label the saids (to nest the SADs). |
This is elegant |
Yes, that would be a good caution to add. Historically selective disclosure mechanisms have been the subject of discussion about how to prevent fraud against the verifier, given that lack of correlatability can make it easier for a presenter to share credentials without being detected. This means that some other mechanism must be added to disincentivize sharing of credentials. Hence my comment above that use of selective disclosure from untrusted issuers may be an anti-pattern. |
A valid use case for the selective disclosure mechanism in ACDC, in the case where the array of selectively disclosable values has repeated instances of the same sub-schema, is static threshold proofs. This is a way to support multiple threshold proofs that do not require disclosure of the actual value, without correlating to the other elements of the threshold array. The primary use case is a set of legal age thresholds. So for example, instead of disclosing the actual age of a person, say 43, in order to satisfy a minimum legal age requirement, the array would contain the thresholds 13, 18, 21 for someone age 43, and for someone age 70 the following: 13, 18, 21, 55, 60, 65. The non-correlatability of the elements of the selectively disclosable array means a verifier checking, say, a social media age of 13 would not know that the presenter was some other age. So all the legal age tests that a given person satisfied could be included in the selectively disclosable array while minimizing the ability of verifiers to deduce the actual age. One would have to trust the Issuer not to construct a malicious array, which for a legal age requirement would be a government issuer, which would be the only legally trustable issuer. Another application would be for a credential to have repeated values but where each label was internationalized for languages. Then a given presentation would not have to be verbose by exposing all the field labels in a nested partial disclosure but would only disclose the one element in the array that was the language of choice. |
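A minimal sketch of how an issuer might build such a threshold array follows. The threshold values come from the comment above. The `ageAtLeast` label, the function names, and the nonce format are assumptions made for illustration only.

```python
import secrets

# Thresholds taken from the example in the comment above.
LEGAL_AGE_THRESHOLDS = (13, 18, 21, 55, 60, 65)


def satisfied_thresholds(age: int, thresholds=LEGAL_AGE_THRESHOLDS) -> list[int]:
    """Return every threshold the given age meets or exceeds."""
    return [t for t in thresholds if age >= t]


def threshold_blocks(age: int) -> list[dict]:
    """Build one blindable block per satisfied threshold.

    "ageAtLeast" is a hypothetical field label; "u" is a fresh random nonce
    so that the digest of one block reveals nothing about the others.
    """
    return [
        {"u": secrets.token_hex(16), "ageAtLeast": t}
        for t in satisfied_thresholds(age)
    ]


blocks_for_43 = threshold_blocks(43)
```

A presenter asked to prove a minimum age of 13 would disclose only the block for 13, and the verifier learns nothing about how many other thresholds were met beyond the length of the blinded array.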
Great examples, thank you. We anticipate a mix of trusted and untrusted issuers and various use cases, so this will be helpful in framing as we decide on a case-by-case basis which method is appropriate. For posterity, here is a schema I made for nested partial disclosure that uses the examples in this thread (edges and rules omitted):
{
"$id": "EECobeq1Pbe-BHZxovIg3ND8vkN183xdkTp0LG3OK9dm",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Nested Partial Disclosure",
"description": "A demonstration of Nested Partial Disclosure",
"credentialType": "Demonstration",
"version": "1.0.0",
"type": "object",
"required": [
"v",
"d",
"i",
"s",
"ri",
"a"
],
"properties": {
"v": {
"description": "Credential Version",
"type": "string"
},
"d": {
"description": "Credential SAID",
"type": "string"
},
"u": {
"description": "One time use nonce - optional",
"type": "string"
},
"i": {
"description": "Issuer AID",
"type": "string"
},
"ri": {
"description": "Credential Registry Identifier",
"type": "string"
},
"s": {
"description": "Schema SAID",
"type": "string"
},
"a": {
"oneOf": [
{
"description": "Attributes section SAID",
"type": "string"
},
{
"$id": "EGjNHiyvgu7Yc3HGbu4tKD3AAaiNWjiGd8eOM8ALlNuz",
"description": "Attributes section",
"type": "object",
"required": [
"d",
"dt",
"i",
"u",
"legalName",
"age"
],
"properties": {
"d": {
"description": "Attributes SAID",
"type": "string"
},
"dt": {
"description": "Date and time of issuance in ISO8601 format",
"type": "string",
"format": "date-time"
},
"i": {
"description": "Issuee AID",
"type": "string"
},
"u": {
"description": "Salty Nonce",
"type": "string"
},
"legalName": {
"oneOf": [
{
"description": "Blinded legal name SAID",
"type": "string"
},
{
"type": "object",
"required": [
"d",
"u",
"value"
],
"properties": {
"d": {
"description": "SAID of disclosable data",
"type": "string"
},
"u": {
"description": "Salty nonce",
"type": "string"
},
"value": {
"description": "Unblinded legal name",
"type": "string"
}
},
"additionalProperties": false
}
]
},
"age": {
"oneOf": [
{
"description": "Blinded age SAID",
"type": "string"
},
{
"type": "object",
"required": [
"d",
"u",
"value"
],
"properties": {
"d": {
"description": "SAID of disclosable data",
"type": "string"
},
"u": {
"description": "Salty nonce",
"type": "string"
},
"value": {
"description": "Unblinded age",
"type": "integer"
}
},
"additionalProperties": false
}
]
}
},
"additionalProperties": false
}
]
}
},
"additionalProperties": false
} |
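Earlier in the thread it was suggested that a verifier "just removes oneOfs in the schema" to force disclosure of the fields it needs. One way to picture that step is sketched below. The `require_disclosure` helper and the trimmed `attr_schema` are hypothetical, and the real IPEX Apply message format is not modelled here.

```python
import copy


def require_disclosure(attr_schema: dict, fields: list[str]) -> dict:
    """Return a copy of the schema where each named field must be unblinded.

    For every requested field, the oneOf is replaced by its single
    object-typed branch, so a bare SAID string no longer validates.
    """
    narrowed = copy.deepcopy(attr_schema)
    for name in fields:
        prop = narrowed["properties"][name]
        branches = [b for b in prop.get("oneOf", []) if b.get("type") == "object"]
        if len(branches) != 1:
            raise ValueError(f"{name}: expected exactly one object branch")
        narrowed["properties"][name] = branches[0]
    return narrowed


# Trimmed stand-in for the attributes section of the schema above.
attr_schema = {
    "type": "object",
    "properties": {
        "legalName": {"oneOf": [
            {"type": "string"},
            {"type": "object", "required": ["d", "u", "value"]},
        ]},
        "age": {"oneOf": [
            {"type": "string"},
            {"type": "object", "required": ["d", "u", "value"]},
        ]},
    },
}

# A verifier that needs the legal name but not the age.
narrowed = require_disclosure(attr_schema, ["legalName"])
```

Fields the verifier did not ask for keep their oneOf, so they may stay blinded.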
The ACDC specification defines selective disclosure, amongst other mechanisms, to facilitate graduated disclosure and chain link confidentiality. It is defined by the spec to work like this:
1. Create unblinded array of anonymized SADs containing attribute details.
2. Create array of SAIDs from SADs.
3. Create digest by concatenating all SAIDs and hashing.
The first (1) looks like:
The second becomes:
And the third:
The way it works, is that you put (3) in the ACDC, and it proves a commitment to (2). You then show the ACDC schema to interested parties, and give them (2) plus some elements from (1). The elements from (2) are committed to, and they provide a further commitment to the disclosed (some) elements from (1). This allows one to share some, but not all, blinded attributes of an ACDC.
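The three steps and the resulting check can be sketched roughly as follows. This is a simplification under stated assumptions: hex SHA-256 over canonical JSON stands in for the real CESR-encoded SAID derivation, and the block contents, nonces, and function names are placeholders rather than anything defined by the spec.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    # Stand-in digest: hex SHA-256, not the encoding a real SAID uses.
    return hashlib.sha256(data).hexdigest()


def block_said(block: dict) -> str:
    # Simplified: hash the canonical JSON of the block.
    # A real SAID is embedded in the block's own "d" field.
    return digest(json.dumps(block, sort_keys=True, separators=(",", ":")).encode())


def aggregate(saids: list[str]) -> str:
    # Step (3): hash of the concatenated SAIDs, so their order matters.
    return digest("".join(saids).encode())


def verify(disclosed: list[dict], saids: list[str], committed: str) -> bool:
    # The SAID list must reproduce the committed aggregate ...
    if aggregate(saids) != committed:
        return False
    # ... and every disclosed block must hash to a SAID in that list.
    return all(block_said(b) in saids for b in disclosed)


# Step (1): unblinded blocks, each with its own nonce "u" (placeholder values).
blocks = [
    {"u": "nonce-0", "i": "issuee-aid-placeholder"},
    {"u": "nonce-1", "legalName": "Jason Colburne"},
    {"u": "nonce-2", "age": 43},
]
saids = [block_said(b) for b in blocks]  # step (2)
A = aggregate(saids)                     # step (3), placed in the ACDC

# Disclose the issuee and legal name blocks, but not the age block.
ok = verify([blocks[0], blocks[1]], saids, A)
```

Note that the check only establishes membership in the list. Nothing here ties a SAID to a particular field label.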
The spec suggests a schema like this to provide a verifier the info they need to understand what information is contained in the ACDC:
Here's the problem.
uniqueItems considers objects with the same keys but different values to be distinct, and allows them. This lets a malicious issuer construct this data in a valid ACDC:
Now a colluding issuee can present differing information to disjoint sets of participants for the same attribute keys.
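The scenario can be reproduced in a few lines. As before, this is only a sketch: hex SHA-256 over canonical JSON stands in for real SAIDs, and the second name, the nonces, and the AID are made-up placeholders.

```python
import hashlib
import json


def block_said(block: dict) -> str:
    # Simplified stand-in for a SAID: hex SHA-256 of canonical JSON.
    raw = json.dumps(block, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(raw).hexdigest()


def aggregate(saids: list[str]) -> str:
    return hashlib.sha256("".join(saids).encode()).hexdigest()


def verify(disclosed: list[dict], saids: list[str], committed: str) -> bool:
    # List-based check: aggregate matches and each block is a member of the list.
    return aggregate(saids) == committed and all(
        block_said(b) in saids for b in disclosed
    )


issuee = {"u": "nonce-0", "i": "issuee-aid-placeholder"}
name_a = {"u": "nonce-1", "legalName": "Jason Colburne"}
name_b = {"u": "nonce-2", "legalName": "Someone Else"}  # second value, same key

# The malicious issuer commits to both legalName blocks in one blinded array.
saids = [block_said(b) for b in (issuee, name_a, name_b)]
A = aggregate(saids)

# Two verifiers each ask for issuee AID and legal name, and see different names.
first_passes = verify([issuee, name_a], saids, A)
second_passes = verify([issuee, name_b], saids, A)
```

Both presentations verify against the same committed aggregate, and neither verifier can tell that a second legalName block exists.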
A solution: saidify a key/value object of SAIDs rather than using a list. So for (2), use this SAD:
And then (3) becomes:
This permits a check to ensure that someone can use a maximum of one value per key.
Does this make sense?
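A sketch of the proposed labeled variant follows, using the same simplifications as above (hex SHA-256 over canonical JSON in place of real SAIDs, placeholder values, hypothetical function names). Because the commitment is now over a key/value object, each label maps to at most one SAID, and a verifier can check a disclosed block against the SAID for its label. As the comments in this thread point out, this also exposes the field labels, which works against the non-correlation goal of selective disclosure.

```python
import hashlib
import json


def said(obj: dict) -> str:
    # Simplified stand-in for a SAID: hex SHA-256 of canonical JSON.
    raw = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(raw).hexdigest()


def verify_labeled(label: str, block: dict, said_map: dict, committed: str) -> bool:
    # The label -> SAID map must reproduce the committed digest ...
    if said(said_map) != committed:
        return False
    # ... and the block must carry the label and match the SAID stored under it.
    return label in block and said_map.get(label) == said(block)


issuee = {"u": "nonce-0", "i": "issuee-aid-placeholder"}
name_a = {"u": "nonce-1", "legalName": "Jason Colburne"}
name_b = {"u": "nonce-2", "legalName": "Someone Else"}
age = {"u": "nonce-3", "age": 43}

# (2) as a key/value object: only one SAID can sit under "legalName".
said_map = {"i": said(issuee), "legalName": said(name_a), "age": said(age)}
A = said(said_map)  # (3)

accepted = verify_labeled("legalName", name_a, said_map, A)
rejected = verify_labeled("legalName", name_b, said_map, A)
```

Here the second legalName block no longer verifies, because the map has no room for a second SAID under the same key.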