Bit vector revocation design discussion #6

Closed
kimdhamilton opened this issue Nov 21, 2020 · 15 comments

@kimdhamilton
Contributor

kimdhamilton commented Nov 21, 2020

We've decided to use the bit vector revocation approach for MVP. Past design discussions are here

@jchartrand @dmitrizagidulin

Let's iterate on bit vector implementation options. Let's also make a distinction between a reasonable MVP (pilot) implementation and a higher-stakes one. In the latter case, herd privacy will be especially important.

The concern we discussed: if the issuer is simply incrementing indices, then a 3rd party might be able to derive additional general information about revocations in a cohort (say if it's well-known that there were 30 students in a class).

The issuer could deal with this in the implementation, i.e. by using an index generation scheme that adds noise/offsets (possibly at the risk of needing to track previously-used indices).

Update: more recently discussed in this issue w3c-ccg/vc-status-rl-2020#6, but the repo mentioned isn't reachable.

I discussed this with Manu and Dave Longley who suggested the following:

  1. Instead of incrementing indexes, consider approaches like:
    • internally splitting the available space of unassigned indexes into blocks, which can then be randomly handed out for assignment as VCs are issued.
    • using an efficient mechanism to pseudorandomly hand out indexes from the available pool.
  2. Add a PR to the spec highlighting limitations:
    • draw implementers' attention to this issue in the privacy considerations section.
    • note that the mechanism depends on the issuer being honest; i.e., if the issuer wants to track you, they can.
    • clarify when it works well. From Dave Longley: "The mechanism works well when actors that care about the privacy of their credentialed population implement the mechanism correctly (CRLs containing large populations, large chunk sizes, randomly handed out CRL bit positions, etc.). That's true for any privacy preserving technology, but we might as well state it"
@dmitrizagidulin
Member

@kimdhamilton

MVP Revocation Proposal

We believe that, for the DCC MVP/pilot project, selecting the Revocation List 2020 bitstring strategy strikes the right balance of implementation simplicity (leveraging an existing library), reasonable server-side storage requirements, cache-ability, and herd privacy for the given use case.

First Iteration - Randomly Named Block with Sequential Index Assignment

For the initial implementation:

  1. Use the RevList 2020 library with the default bitstring block size of 16KB (131k entries).
  2. Store the (zlib-compressed, per the spec) revocation list on disk (so that it can be served as a static file by the issuer app). Use a random string as the filename (for example, QHDR12CD2J, so that it can be served at a URL like https://example.edu/revocations/QHDR12CD2J).
  3. The issuer app will keep track of the "number of credentials issued" count, internally. Whenever a new VC is issued, that count will be incremented, and used as the position of the revocation index.
  4. Once the "number of credentials issued" count goes past 131k entries, a new bitstring block is generated (with a new random file name), and the count is reset.
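
A rough sketch of the block-provisioning step, in Node-style TypeScript (the directory layout and helper name are illustrative assumptions, not the actual issuer code; in practice the published file would be a full RevocationList2020 credential wrapping the encoded list):

import { randomBytes } from 'crypto';
import { gzipSync } from 'zlib';
import { writeFileSync } from 'fs';

const BLOCK_SIZE_BYTES = 16 * 1024; // 16KB block => 131,072 one-bit entries

function provisionBlock(revocationDir: string): string {
  // All bits start at 0 (nothing revoked yet).
  const bitstring = Buffer.alloc(BLOCK_SIZE_BYTES);

  // Random, unguessable block name (e.g. "QHDR12CD2J"-style).
  const blockName = randomBytes(8).toString('hex').toUpperCase();

  // Compressed with Node's zlib so it can be served as a static file.
  writeFileSync(`${revocationDir}/${blockName}`, gzipSync(bitstring));
  return blockName;
}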

Implementation Note - Revocation Config

The issuer service will create a revocation.config.json file on startup, and will use it to keep track of the credentialsIssued count, as well as the currentBlock filename.
So, for example, the first time an issuer service is started, it will check to see if the revocation.config.json file exists. If not, it will generate a new one. For example:

{
  "credentialsIssued": 0,
  "currentBlock": "QHDR12CD2J"
}
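
The startup check could look roughly like the following (a sketch only; loadOrCreateConfig and the newBlockName parameter are illustrative names, with the new block name coming from something like the provisionBlock sketch above):

import { existsSync, readFileSync, writeFileSync } from 'fs';

interface RevocationConfig {
  credentialsIssued: number;
  currentBlock: string;
}

function loadOrCreateConfig(configPath: string, newBlockName: string): RevocationConfig {
  if (existsSync(configPath)) {
    return JSON.parse(readFileSync(configPath, 'utf8'));
  }
  // First run: start the count at zero and point at a freshly provisioned block.
  const config: RevocationConfig = { credentialsIssued: 0, currentBlock: newBlockName };
  writeFileSync(configPath, JSON.stringify(config, null, 2));
  return config;
}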

That means the first VC issued by that issuer service will have the following credentialStatus attribute:

{
  // "@context": [ ... ],
  "credentialStatus": {
    "type": "RevocationList2020Status",
    "revocationListIndex": "1",
    "revocationListCredential": "https://example.edu/revocations/QHDR12CD2J"
  },
  "credentialSubject": { /* ... */ },
  "proof":{ /* ... */ }
}

The next VC issued will have the revocationListIndex: "2", and so on.
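
At issuance time, the index assignment and block rollover could be sketched as follows (function and parameter names are illustrative; provisionNewBlock stands in for whatever creates a fresh, randomly named block):

const MAX_ENTRIES = 16 * 1024 * 8; // 131,072 one-bit entries per 16KB block

function nextCredentialStatus(
  config: { credentialsIssued: number; currentBlock: string },
  provisionNewBlock: () => string // e.g. the provisionBlock sketch above
) {
  if (config.credentialsIssued >= MAX_ENTRIES) {
    // Block exhausted: start a new randomly named block and reset the count.
    config.currentBlock = provisionNewBlock();
    config.credentialsIssued = 0;
  }
  config.credentialsIssued += 1; // the running count doubles as the index position

  return {
    type: 'RevocationList2020Status',
    revocationListIndex: String(config.credentialsIssued),
    revocationListCredential: `https://example.edu/revocations/${config.currentBlock}`,
  };
}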

Implementation Note - Issuer Log

To aid the revocation process, the Issuer service should keep a private log of issued VCs. For each VC, the log will note:

  • The credential id, and any other relevant metadata (timestamp, subject id, which key it was issued with, etc).
  • The revocationListCredential url (or simply the bitstring filename) that was used.
  • The revocationListIndex for that credential.

That way, when a particular VC needs to be revoked, the issuer log will contain the filename and the index position for that credential, so that its bit can be flipped.
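
A minimal sketch of that bit-flipping step (assuming the block file layout from the earlier sketches; the bit ordering within a byte shown here is illustrative, and the RevList 2020 library defines the canonical index-to-bit mapping):

import { readFileSync, writeFileSync } from 'fs';
import { gunzipSync, gzipSync } from 'zlib';

function revoke(blockPath: string, revocationListIndex: number): void {
  const bitstring = gunzipSync(readFileSync(blockPath));

  // Set the bit at the credential's index position to 1 (revoked).
  const byte = Math.floor(revocationListIndex / 8);
  bitstring[byte] |= 1 << (revocationListIndex % 8);

  // Recompress and republish; the static file (or the credential wrapping it)
  // must be re-signed/redeployed so verifiers pick up the change.
  writeFileSync(blockPath, gzipSync(bitstring));
}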

First Iteration - Limitations/Risks

  1. Revocation list is centralized at the issuer server; issuer can track verifier requests. However, see the section on Cache Expiration below, for mitigation.
  2. For a given revocation bitstring URL, an attacker can estimate the number of credentials that were revoked (just by counting the revoked bits), which could give them an approximate percentage of VC revocation rates (for that block). (This can be mitigated by initializing the bitstring with noise, see below.)
  3. Because the indexes are sequential, an attacker that sees multiple VCs can estimate the number of total VCs issued for that block. See the German Tank Problem for details. Note that because the revocation list is randomly named, the attacker does not necessarily know how many other revocation list blocks exist (and therefore, what the total number of VCs issued is).
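
For reference, the usual minimum-variance unbiased estimator behind the German Tank Problem is N̂ ≈ m + m/k − 1, where m is the highest index observed and k is the number of distinct VCs seen. For example, an attacker who sees indexes 40, 87, and 110 from the same block (k = 3, m = 110) would estimate roughly 110 + 110/3 − 1 ≈ 146 credentials issued so far in that block.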

Second Iteration - Randomly Named Block with Random Index Assignment

To mitigate risk 3 mentioned above (the ability of an attacker to estimate the number of VCs issued for a given block), a second iteration can be done.

Instead of assigning list indexes sequentially, the issuer will generate the list index randomly. Because of the random assignment, the issuer will need a second bitstring, this one private, to keep track of which indexes have already been assigned.

This approach doubles the server-side storage requirements (for 131k entries, now 32KB is required, versus the 16KB in the first iteration). However, note that there is no additional bandwidth or storage burden on the verifier / relying party.
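
A sketch of the random assignment step (names are illustrative; the assigned buffer is the private, issuer-only bitstring, and crypto.randomInt is used here as an example source of randomness):

import { randomInt } from 'crypto';

const MAX_ENTRIES = 16 * 1024 * 8; // 131,072 entries per block

function assignRandomIndex(assigned: Buffer /* 16KB private bitstring */): number {
  // Rejection-sample until an unassigned position is found. In practice the
  // issuer would rotate to a new block well before one fills up, keeping the
  // expected number of retries small.
  while (true) {
    const index = randomInt(MAX_ENTRIES);
    const byte = Math.floor(index / 8);
    const mask = 1 << (index % 8);
    if ((assigned[byte] & mask) === 0) {
      assigned[byte] |= mask; // mark as handed out
      return index;
    }
  }
}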

Optional Enhancement - Block Initialization Using Noise/Chaff

To mitigate risk 2 (ability of an attacker to estimate the percentage of the VCs revoked for a given block), an additional step can be taken.

When a revocation bitstring block is provisioned by the issuer, a percentage of it (say, 20-30%) can be initialized with random noise (so, 30% of the bits can be randomly flipped to 1, at the outset). To keep track of which index positions have been chaffed in this manner, they can be recorded in the second, private bitstring from the Second Iteration above.

Note that this does increase the issuer-side storage requirement by the same percentage (i.e., by 20-30% in the example above).
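
A sketch of the chaffing step (the percentage, buffer names, and helper are illustrative; chaffed positions are recorded in the private bitstring so they are never handed out to real credentials):

import { randomInt } from 'crypto';

function addChaff(publicList: Buffer, assigned: Buffer, fraction = 0.3): void {
  const totalBits = publicList.length * 8;
  const target = Math.floor(totalBits * fraction);

  for (let flipped = 0; flipped < target; ) {
    const index = randomInt(totalBits);
    const byte = Math.floor(index / 8);
    const mask = 1 << (index % 8);
    if ((assigned[byte] & mask) === 0) {
      publicList[byte] |= mask; // looks "revoked" to outside observers
      assigned[byte] |= mask;   // reserved; never assigned to a real VC
      flipped++;
    }
  }
}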

Cache Expiration and Mitigating Tracking/"Phone Home"

The major drawback of a bitstring revocation list stored at the issuer's website is the possibility of the issuer being able to track the verifier's requests for a given revocation list. This can potentially provide metadata such as timestamp, the requester's IP address, and any cookies and headers the verifier's browser is willing to send.

However, depending on the use case (the relative value and threat level of the credential), several simple steps can be taken to mitigate this risk.

Firstly, not all VC use cases need to be revocable. For each type of credential, the issuer is advised to consider whether their business processes actually allow for revocation. In many cases, the issuer can use a VC expiration date instead of revocation (for example, issue a transcript that is valid for one year).

For VC use cases that do need revocation ability, issuers can reduce the "phone home" problem by considering just how quickly a particular VC needs to be revoked. This is an important question, because it directly determines the cache-ability of the revocation list, and whether it can be stored by third parties (for example, cached by Content Delivery Networks).

If a high-value/high-threat credential (such as an administrator login) needs to be revoked (and the revocation needs to propagate) within a matter of seconds, it is not very cache-able, and the issuer's revocation list must be the only source of truth.

If, however, the business rules require that the revocation can propagate within a day, that means that the revocation list can be cached by a CDN (or by the verifier's browser cache) with the cache expiration value of 24 hrs.

This dramatically changes the risk of tracking / "phone home" behavior by the issuer. Now, instead of a verifier having to request the revocation list each time it verifies a VC, each verifier can simply fetch the revocation list first thing in the morning. This can be done automatically (by a cron job, etc., much like RSS feeds), which means that all subsequent verification operations (within that 24 hr expiration period) can be performed using the verifier's local cache. Well-known HTTP mechanisms (such as the ETag / If-None-Match headers) allow further bandwidth savings for verifiers and issuers alike.
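
As a rough illustration of this caching behavior (the path, port, and 24-hour lifetime are example values, not a prescribed configuration):

import { createServer } from 'http';
import { readFileSync } from 'fs';
import { createHash } from 'crypto';

createServer((req, res) => {
  const body = readFileSync('./revocations/QHDR12CD2J');
  const etag = `"${createHash('sha256').update(body).digest('hex')}"`;

  res.setHeader('Cache-Control', 'public, max-age=86400'); // cacheable for 24 hrs
  res.setHeader('ETag', etag);

  // If the verifier (or CDN) already has this exact list, skip the body entirely.
  if (req.headers['if-none-match'] === etag) {
    res.statusCode = 304;
    res.end();
    return;
  }
  res.end(body);
}).listen(8080);

With headers like these, a CDN or the verifier's own cache absorbs repeat lookups, and the issuer sees at most one conditional request per verifier per day.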

@ntn-x2

ntn-x2 commented Dec 4, 2020

What about tracking by the verifier? If I present the same credential more than once, my credential will probably have the same index in all presentations, meaning that the verifier will know it is the same entity across all the interactions.

@dmitrizagidulin
Member

Hi @Diiaablo95!

Can you say a bit more about the threat model you're concerned about? Are you building an education use case that is anonymous, and that foresees anonymous or pseudonymous reuse of the same verifiable credential?

@ntn-x2

ntn-x2 commented Dec 4, 2020

Yes, kind of. What I am trying to say is that having to reveal the index in the revocation list, if it does not change, opens the door to correlation attacks. Someone might even have an anonymous credential, but if the index needs to be revealed, then all the presentations of the same credential can be linked together. How can this weakness be addressed?

@jchartrand
Contributor

@Diiaablo95 @dmitrizagidulin

This probably reveals my ignorance, but wouldn't there always be some part of a credential that remains constant across presentations?

@ntn-x2

ntn-x2 commented Dec 4, 2020

@jchartrand at least to my knowledge, the use of zero-knowledge proofs, like those used in Hyperledger Indy or with BBS+ signatures, allows the prover to hide parts of a credential. So if the information revealed in the proof provides enough herd anonymity, the proof itself does not have to leak anything across presentations. The problem arises when proving that the credential has not been revoked, as revealing the same index in several presentations is a linking factor.
To my knowledge, the only truly privacy-preserving solution to credential revocation is cryptographic accumulators, as implemented e.g. in Hyperledger Indy, even though they have other downsides, like the need to constantly update the delta, etc.

@jchartrand
Contributor

jchartrand commented Dec 4, 2020

Thank you @Diiaablo95 - very interesting (although I admit the math is a bit bedazzling.)

A related question: all of this seems to assume that the verifier is the weak point: that a centralized verifier is used by many different people/applications, and can therefore track multiple requests for the same credential, and so potentially correlate.

But, wouldn't it often be the case that a centralized verifier isn't in fact used? If the verifier is nothing more than code that can run anywhere (e.g., in a web browser or in a phone app) then you could run that code locally (e.g., within a web browser, so using the computer on which the web browser is running) without making any calls to a centralized server, and so there's less opportunity to cross-reference different requests?

@ntn-x2

ntn-x2 commented Dec 5, 2020

In web development 101 they teach you that validation on client side equals no validation at all. So if a verifier wants to actually verify the validity of a credential, it cannot rely on client code, but it has to perform the validation server-side, hence the possibility of correlation attacks.

@jchartrand
Contributor

How about an example:

  • I am a recruiter who confirms that potential employees' credentials are valid
  • I have a verification app on my phone
  • the app downloads the revocation bit vector (and a list of trusted signing keys) from a known server
  • the app checks that a credential's entry in the bit vector hasn't been revoked
  • the app checks everything else, e.g., signatures, trustworthiness of signing keys, etc.

What makes this client side validation 'no validation at all'?

@ntn-x2

ntn-x2 commented Dec 7, 2020

Well, in this case the mobile app is acting as the verification service, so the app represents the "threat" from the point of view of the potential employee. But in this case there is definitely no need for unlinkability, since we are discussing potential employees. If you do want to extend this to use cases that need unlinkability, then, as I said (in my opinion), the mobile app still represents a threat, because it is able to tell when the same user is showing a presentation, given that the index in the bit vector will be the same.

@jchartrand
Contributor

Could you provide a concrete example/scenario that shows where the correlation happens when verification is done entirely within a mobile app (on a phone)?

@ntn-x2

ntn-x2 commented Dec 8, 2020

Because the validity of a credential needs to be proved to the entity offering a service. A mobile app might be the client for that service, but good information security practice says that client-side validation equals no validation. So a properly developed server will need to check on its own whether the user is authorised to perform a specific action, be it via the mobile app or a command line application. And again, the service provider cannot blindly trust whatever the app does, as apps are in the hands of end users, which includes hackers, which means that an app can be sideloaded and a mobile phone jailbroken.

@jchartrand
Contributor

jchartrand commented Dec 8, 2020

Ah, now I think I see where you are going with this.

Another example then: Say I am applying for a drivers licence, and have to present proof of citizenship to get the drivers licence. I could present my credential in person or I could submit it electronically.

  1. In person:
  • I head in to the licensing office and present a VC/VP that asserts I am a citizen of Canada (so the equivalent of a passport).
  • The person behind the desk needs to verify that the VC was signed by the Canadian government.
  • The person behind the desk has a device (possibly phone, possibly other) with an app to which I bluetooth my credential.
  • The app downloads the revocation list, etc. and verifies my cred, telling the desk person that my cred is valid - or not, if Justin Trudeau has decided he doesn't like me anymore.

In this case, the app is presumably safe since it is controlled by the drivers licence bureau?

  2. Electronically
  • I submit my VC/VP proof of citizenship online.
  • In this case a machine might now do the verification (although a person could still do it).
  • The verification code could still run locally, within the drivers licence offices, again pulling in the revocation list, etc., and doing the verification locally, so effectively having their own verifier that doesn't share data with any other organization.
  • This model is also in a sense more secure since the drivers licence office can better trust their local copy of the verification code.

On the topic of the trustworthiness of apps, many of us trust banking apps to do our banking, or say the amazon app to order from amazon, etc. - should we not be? Even the web browsers on our phones are apps - should we not trust those?

And how is a centralized verification point (say a web page) any more trustworthy? Because it has a well-known domain name? If you do feel that web pages are more trustworthy, then we could put the verification code (as javascript) directly into the web page. The code then runs in the browser, pulling in revocation lists (via XHR), etc. So again there is no opportunity for correlation, because all verification runs locally in the browser.

As always, though, I may well misunderstand your point - could you provide a real-life example?

@ntn-x2

ntn-x2 commented Dec 9, 2020

You are indeed making a different point. My original point was that by revealing the same index in the revocation list, you are providing a correlating identifier that allows multiple presentations to be linked, even though those presentations do not share any other detail that would make them linkable. So as long as you are revealing the same index to the same verifier (be it the same mobile app, or the same system), the verifier will be able to tell that it is the same entity, without necessarily knowing that entity's real identity (that depends on the use case).
So, my point is clearly not valid if the presentation itself includes personally identifiable information, or if the presentation is provided in person. My point is valid if you are trying to build a privacy-preserving system that uses ZKPs and selective disclosure, because then some of that effort is wasted by the correlating factor given by the index in the non-revocation proof.
Hope that is clearer now. Scenarios that would "suffer" from such a correlation problem most likely will not involve the two parties being within Bluetooth or offline range, anyway.
