From b5fd7cc4717e24af52c211a39bae44a5013bacbe Mon Sep 17 00:00:00 2001
From: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Date: Wed, 14 Sep 2022 09:41:43 +0100
Subject: [PATCH 1/7] Provider Record Settings Description

---
 kad-dht/README.md | 58 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 46 insertions(+), 12 deletions(-)

diff --git a/kad-dht/README.md b/kad-dht/README.md
index 5caa104a8..8a70363ba 100644
--- a/kad-dht/README.md
+++ b/kad-dht/README.md
@@ -75,12 +75,12 @@ nodes, unrestricted nodes should operate in _server mode_ and restricted nodes,
 e.g. those with intermittent availability, high latency, low bandwidth, low
 CPU/RAM/Storage, etc., should operate in _client mode_.
 
-As an example, running the libp2p Kademlia protocol on top of the Internet,
-publicly routable nodes, e.g. servers in a datacenter, might operate in _server
+As an example, running the libp2p Kademlia protocol on top of
+publicly routable nodes, e.g. servers in a datacenter, should operate in _server
 mode_ and non-publicly routable nodes, e.g. laptops behind a NAT and firewall,
-might operate in _client mode_. The concrete factors used to classify nodes into
+should operate in _client mode_. The concrete factors used to classify nodes into
 _clients_ and _servers_ depend on the characteristics of the network topology
-and the properties of the Kademlia DHT . Factors to take into account are e.g.
+and the properties of the Kademlia DHT. Factors to take into account are e.g.
 network size, replication factor and republishing period.
 
 Nodes, both those operating in _client_ and _server mode_, add another node to
@@ -228,7 +228,7 @@ Then we loop:
                becomes the new best peer (`Pb`).
 			2. If the new value loses, we add the current peer to `Po`.
 	2. If successful with or without a value, the response will contain the
-       closest nodes the peer knows to the key `Key`. Add them to the candidate
+       closest nodes the peer knows to the `Key`. Add them to the candidate
        list `Pn`, except for those that have already been queried.
 	3. If an error or timeout occurs, discard it.
 4. Go to 1.
@@ -256,7 +256,7 @@ type Validator interface {
 ```
 
 `Validate()` should be a pure function that reports the validity of a record. It
-may validate a cryptographic signature, or else. It is called on two occasions:
+may validate a cryptographic signature, or similar. It is called on two occasions:
 
 1. To validate values retrieved in a `GET_VALUE` query.
 2. To validate values received in a `PUT_VALUE` query before storing them in the
@@ -268,23 +268,55 @@ heuristic of the value to make the decision.
 
 ### Content provider advertisement and discovery
 
-Nodes must keep track of which nodes advertise that they provide a given key
-(CID). These provider advertisements should expire, by default, after 24 hours.
-These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS`
+There are two things at play with regard to provider record (and therefore content)
+liveness and reachability:
+
+Content providers need to make sure that their content is reachable, despite peer churn
+and nodes that store and serve provider records need to make sure that the CIDs whose 
+records they store are still served by the content provider.
+
+The following two parameters help cover both of these cases.
+1. **Provider Record Republish Interval (24hrs):** The content provider 
+needs to make sure that the nodes chosen to store the provider record 
+remain online when clients ask for the record. In order to 
+guarantee this, while taking into account the peer churn, content providers
+republish the records they want to provide every 24 hours.
+2. **Provider Record Expiration Interval (48hrs):** The network needs to provide
+content that content providers are still interested in providing. In other words,
+nodes should not keep records for content that content providers have stopped 
+providing (aka stale records). In order to guarantee this, provider records 
+_expire_ after 48 hours, i.e., nodes stop serving those records, 
+unless the content provider has republished the provider record.
+
+The values chosen for those parameters should be subject to continuous monitoring 
+and investigation. Ultimately, the values of those parameters should balance 
+the tradeoff between provider record liveness (due to node churn) and traffic overhead
+(to republish records).
+The latest parameters are based on the comprehensive study published
+in [provider-record-measurements].
+
+Provider records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS`
 messages.
 
 #### Content provider advertisement
 
 When the local node wants to indicate that it provides the value for a given
-key, the DHT finds the closest peers to the key using the `FIND_NODE` RPC (see
+key, the DHT finds the (`k` = 20) closest peers to the key using the `FIND_NODE` RPC (see
 [peer routing section](#peer-routing)), and then sends an `ADD_PROVIDER` RPC with
-its own `PeerInfo` to each of these peers.
+its own `PeerInfo` to each of these peers. The study in [provider-record-measurements]
+proved that the replication factor of `k` = 20 is a good setting, although continuous
+monitoring and investigation.
 
 Each peer that receives the `ADD_PROVIDER` RPC should validate that the received
 `PeerInfo` matches the sender's `peerID`, and if it does, that peer should store
 the `PeerInfo` in its datastore. Implementations may choose to not store the
 addresses of the providing peer e.g. to reduce the amount of required storage or
-to prevent storing potentially outdated address information.
+to prevent storing potentially outdated address information. In the current implementation
+peers keep the network address (i.e., the `multiaddress`) of the providing peer for **the
+first 10 mins** after the provider record (re-)publication. The setting of 10 mins follows
+the DHT Routing Table refresh interval. After that, peers provide 
+the provider's `peerID` only, in order to avoid pointing to stale network addresses 
+(i.e., the case where the peer has moved to a new network address).
 
 #### Content provider discovery
 
@@ -470,3 +502,5 @@ multiaddrs are stored in the node's peerbook.
 [ping]: https://github.com/libp2p/specs/issues/183
 
 [go-libp2p-xor]: https://github.com/libp2p/go-libp2p-xor
+
+[provider-record-measurements]: https://github.com/protocol/network-measurements/blob/master/results/rfm17-provider-record-liveness.md

From 9f7f275afb9498e51f5290ad0f28af27e983fa4c Mon Sep 17 00:00:00 2001
From: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Date: Thu, 15 Sep 2022 05:43:53 +0100
Subject: [PATCH 2/7] Update kad-dht/README.md

Co-authored-by: Marcin Rataj <lidel@lidel.org>
---
 kad-dht/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kad-dht/README.md b/kad-dht/README.md
index 8a70363ba..2b58429df 100644
--- a/kad-dht/README.md
+++ b/kad-dht/README.md
@@ -272,7 +272,7 @@ There are two things at play with regard to provider record (and therefore conte
 liveness and reachability:
 
 Content providers need to make sure that their content is reachable, despite peer churn
-and nodes that store and serve provider records need to make sure that the CIDs whose 
+and nodes that store and serve provider records need to make sure that the Multihashes whose 
 records they store are still served by the content provider.
 
 The following two parameters help cover both of these cases.

From 7f646133020e953c5edf8bc20796fd89b21d72b6 Mon Sep 17 00:00:00 2001
From: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Date: Thu, 15 Sep 2022 05:44:16 +0100
Subject: [PATCH 3/7] Update kad-dht/README.md

Co-authored-by: Marcin Rataj <lidel@lidel.org>
---
 kad-dht/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kad-dht/README.md b/kad-dht/README.md
index 2b58429df..3fa2ed7d8 100644
--- a/kad-dht/README.md
+++ b/kad-dht/README.md
@@ -311,8 +311,8 @@ Each peer that receives the `ADD_PROVIDER` RPC should validate that the received
 `PeerInfo` matches the sender's `peerID`, and if it does, that peer should store
 the `PeerInfo` in its datastore. Implementations may choose to not store the
 addresses of the providing peer e.g. to reduce the amount of required storage or
-to prevent storing potentially outdated address information. In the current implementation
-peers keep the network address (i.e., the `multiaddress`) of the providing peer for **the
+to prevent storing potentially outdated address information. Implementations that choose
+to keep the network address (i.e., the `multiaddress`) of the providing peer should do it for **the
 first 10 mins** after the provider record (re-)publication. The setting of 10 mins follows
 the DHT Routing Table refresh interval. After that, peers provide 
 the provider's `peerID` only, in order to avoid pointing to stale network addresses 

From 094094d418591852e226751d4deb9b52977c4d27 Mon Sep 17 00:00:00 2001
From: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Date: Thu, 15 Sep 2022 05:45:01 +0100
Subject: [PATCH 4/7] Update kad-dht/README.md

Co-authored-by: Marcin Rataj <lidel@lidel.org>
---
 kad-dht/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kad-dht/README.md b/kad-dht/README.md
index 3fa2ed7d8..f6eae60bf 100644
--- a/kad-dht/README.md
+++ b/kad-dht/README.md
@@ -305,7 +305,7 @@ key, the DHT finds the (`k` = 20) closest peers to the key using the `FIND_NODE`
 [peer routing section](#peer-routing)), and then sends an `ADD_PROVIDER` RPC with
 its own `PeerInfo` to each of these peers. The study in [provider-record-measurements]
 proved that the replication factor of `k` = 20 is a good setting, although continuous
-monitoring and investigation.
+monitoring and investigation may change this recommendation in the future.
 
 Each peer that receives the `ADD_PROVIDER` RPC should validate that the received
 `PeerInfo` matches the sender's `peerID`, and if it does, that peer should store

From 2318d76c2d8d0edcce4d04af018d4bd726ceaec1 Mon Sep 17 00:00:00 2001
From: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Date: Thu, 29 Sep 2022 07:25:23 +0100
Subject: [PATCH 5/7] addressing editorial comments

---
 kad-dht/README.md | 48 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/kad-dht/README.md b/kad-dht/README.md
index f6eae60bf..45d945464 100644
--- a/kad-dht/README.md
+++ b/kad-dht/README.md
@@ -75,14 +75,22 @@ nodes, unrestricted nodes should operate in _server mode_ and restricted nodes,
 e.g. those with intermittent availability, high latency, low bandwidth, low
 CPU/RAM/Storage, etc., should operate in _client mode_.
 
-As an example, running the libp2p Kademlia protocol on top of
-publicly routable nodes, e.g. servers in a datacenter, should operate in _server
+As an example, publicly routable nodes running the libp2p Kademlia protocol, 
+e.g. servers in a datacenter, should operate in _server
 mode_ and non-publicly routable nodes, e.g. laptops behind a NAT and firewall,
 should operate in _client mode_. The concrete factors used to classify nodes into
 _clients_ and _servers_ depend on the characteristics of the network topology
 and the properties of the Kademlia DHT. Factors to take into account are e.g.
 network size, replication factor and republishing period.
 
+For instance, setting the replication factor to a low value would require more
+reliable peers, whereas a higher replication factor could allow for less
+reliable peers at the cost of more overhead. Ultimately, peers that act as
+servers should help the network (i.e., provide positive utility in terms of
+availability, reachability, bandwidth). A node that slows down network
+operations (e.g., by being unreachable or overloaded) most of the times it
+is contacted should instead operate as a client node.
+
 Nodes, both those operating in _client_ and _server mode_, add another node to
 their routing table if and only if that node operates in _server mode_. This
 distinction allows restricted nodes to utilize the DHT, i.e. query the DHT,
@@ -271,22 +279,26 @@ heuristic of the value to make the decision.
 There are two things at play with regard to provider record (and therefore content)
 liveness and reachability:
 
-Content providers need to make sure that their content is reachable, despite peer churn
+Content providers need to make sure that their content is reachable, despite peer churn;
 and nodes that store and serve provider records need to make sure that the Multihashes whose 
 records they store are still served by the content provider.
 
 The following two parameters help cover both of these cases.
-1. **Provider Record Republish Interval (24hrs):** The content provider 
+1. **Provider Record Republish Interval:** The content provider 
 needs to make sure that the nodes chosen to store the provider record 
-remain online when clients ask for the record. In order to 
+are still online when clients ask for the record. In order to 
 guarantee this, while taking into account the peer churn, content providers
-republish the records they want to provide every 24 hours.
-2. **Provider Record Expiration Interval (48hrs):** The network needs to provide
+republish the records they want to provide. Choosing the particular value for the
+Republish interval is network-specific and depends on several parameters, such as
+peer reliability and churn. For the IPFS network it is currently set to 22 hours.
+2. **Provider Record Expiration Interval:** The network needs to provide
 content that content providers are still interested in providing. In other words,
 nodes should not keep records for content that content providers have stopped 
 providing (aka stale records). In order to guarantee this, provider records 
-_expire_ after 48 hours, i.e., nodes stop serving those records, 
-unless the content provider has republished the provider record.
+should _expire_ after some interval, i.e., nodes should stop serving those records, 
+unless the content provider has republished the provider record. Again, the specific
+setting depends on the characteristics of the network. In the IPFS DHT the Expiration 
+Interval is set to 48hrs.
 
 The values chosen for those parameters should be subject to continuous monitoring 
 and investigation. Ultimately, the values of those parameters should balance 
@@ -298,6 +310,16 @@ in [provider-record-measurements].
 Provider records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS`
 messages.
 
+It is also worth noting that the keys for provider records are multihashes. This
+is because:
+
+- Provider records are used as a rendezvous point for all the parties who have
+advertised that they store some piece of content.
+- The same multihash can be in different CIDs (e.g. CIDv0 vs CIDv1 of a SHA-256 dag-pb object,
+or the same multihash but with different codecs such as dag-pb vs raw).
+- Therefore, the rendezvous point should converge on the minimal thing everyone agrees on,
+which is the multihash, not the CID.
+
 #### Content provider advertisement
 
 When the local node wants to indicate that it provides the value for a given
@@ -312,9 +334,11 @@ Each peer that receives the `ADD_PROVIDER` RPC should validate that the received
 the `PeerInfo` in its datastore. Implementations may choose to not store the
 addresses of the providing peer e.g. to reduce the amount of required storage or
 to prevent storing potentially outdated address information. Implementations that choose
-to keep the network address (i.e., the `multiaddress`) of the providing peer should do it for **the
-first 10 mins** after the provider record (re-)publication. The setting of 10 mins follows
-the DHT Routing Table refresh interval. After that, peers provide 
+to keep the network address (i.e., the `multiaddress`) of the providing peer should do so only
+for a period during which they are confident that the peer's network addresses have not changed
+since the provider record was (re-)published. As with the previous constants, this depends
+on the network's characteristics. A safe value here is the Routing Table Refresh Interval.
+In the kubo IPFS implementation, this is set to 30 mins. After that period, peers provide
 the provider's `peerID` only, in order to avoid pointing to stale network addresses 
 (i.e., the case where the peer has moved to a new network address).
 

From 9464d50f4e08337d0be4dc15a72761b92215747c Mon Sep 17 00:00:00 2001
From: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Date: Thu, 29 Sep 2022 07:33:59 +0100
Subject: [PATCH 6/7] provider advert and discovery clarification

---
 kad-dht/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kad-dht/README.md b/kad-dht/README.md
index 45d945464..9e1715da7 100644
--- a/kad-dht/README.md
+++ b/kad-dht/README.md
@@ -279,9 +279,9 @@ heuristic of the value to make the decision.
 There are two things at play with regard to provider record (and therefore content)
 liveness and reachability:
 
-Content providers need to make sure that their content is reachable, despite peer churn;
-and nodes that store and serve provider records need to make sure that the Multihashes whose 
-records they store are still served by the content provider.
+Content needs to be reachable, despite peer churn;
+and nodes that store and serve provider records should not serve records for stale content,
+i.e., content that the original provider does not wish to make available anymore.
 
 The following two parameters help cover both of these cases.
 1. **Provider Record Republish Interval:** The content provider 

From 3ba82bb6f65d9b16df4bda17ad259e181a3e5439 Mon Sep 17 00:00:00 2001
From: Marcin Rataj <lidel@lidel.org>
Date: Fri, 9 Dec 2022 00:06:54 +0100
Subject: [PATCH 7/7] revision bump and cleanup

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
---
 kad-dht/README.md | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kad-dht/README.md b/kad-dht/README.md
index 9e1715da7..b0d759662 100644
--- a/kad-dht/README.md
+++ b/kad-dht/README.md
@@ -2,7 +2,7 @@
 
 | Lifecycle Stage | Maturity       | Status | Latest Revision |
 |-----------------|----------------|--------|-----------------|
-| 3A              | Recommendation | Active | r1, 2021-10-30  |
+| 3A              | Recommendation | Active | r2, 2022-12-09  |
 
 Authors: [@raulk], [@jhiesey], [@mxinden]
 
@@ -284,21 +284,26 @@ and nodes that store and serve provider records should not serve records for sta
 i.e., content that the original provider does not wish to make available anymore.
 
 The following two parameters help cover both of these cases.
+
 1. **Provider Record Republish Interval:** The content provider 
 needs to make sure that the nodes chosen to store the provider record 
 are still online when clients ask for the record. In order to 
 guarantee this, while taking into account the peer churn, content providers
 republish the records they want to provide. Choosing the particular value for the
 Republish interval is network-specific and depends on several parameters, such as
-peer reliability and churn. For the IPFS network it is currently set to 22 hours.
+peer reliability and churn.
+
+   - For the IPFS network it is currently set to **22 hours**.
+
 2. **Provider Record Expiration Interval:** The network needs to provide
 content that content providers are still interested in providing. In other words,
 nodes should not keep records for content that content providers have stopped 
 providing (aka stale records). In order to guarantee this, provider records 
 should _expire_ after some interval, i.e., nodes should stop serving those records, 
 unless the content provider has republished the provider record. Again, the specific
-setting depends on the characteristics of the network. In the IPFS DHT the Expiration 
-Interval is set to 48hrs.
+setting depends on the characteristics of the network.
+
+   - In the IPFS DHT the Expiration Interval is set to **48 hours**.
 
 The values chosen for those parameters should be subject to continuous monitoring 
 and investigation. Ultimately, the values of those parameters should balance