fix: a bug in the dht size estimation, and add tests to prevent it in the future
Nuhvi committed Oct 26, 2024
1 parent 91ca212 commit 01c56d5
Showing 5 changed files with 15 additions and 18 deletions.
16 changes: 6 additions & 10 deletions docs/dht_size_estimate/README.md
@@ -55,19 +55,15 @@ The final Dht size estimation is the average of `en_1 + en_2 + .. + en_n`

## Simulation

-Running this [simulation](./src/main.rs) for 20 million nodes and a after 4 lookups, we observe:
+Running this [simulation](./src/main.rs) for 20 million nodes and a after 12 lookups, we observe:

-- Mean estimate: 2,001,627 nodes
-- Standard deviation: 23%
-
-![distribution of estimated dht size after 4 lookups](./plot.png)
+- Mean estimate: 2,004,408 nodes
+- Standard deviation: 10%

+Meaning that after 12 lookups, you can be confident you are not overestimating the Dht size by more than 10%,
+in fact you are most likely underestimating it slightly due to the limitation of real networks.

-The relationship between the number of lookups and precision suggest that it doesn't take too many lookups
-for the 95% confidence intervals to drop below +-20% from the real value. Meaning a very infrequent random
-FIND_NODE queries in the background, is all what an implementation needs to get a good-enough estimation of the size of the dht.
-
-![standrd deviation relative to number of lookups](./plot.png)
+![distribution of estimated dht size after 4 lookups](./plot.png)

## Limitations

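The README hunk above keeps the core claim that the final DHT size estimate is the mean of the per-lookup estimates `en_1 .. en_n`, and tightens the reported precision from a 23% standard deviation after 4 lookups to 10% after 12. A minimal, purely illustrative sketch of that aggregation (the names below are not the crate's API):

```rust
/// Illustrative only: each FIND_NODE lookup yields one size estimate `en_i`,
/// and the reported DHT size is the running mean of those estimates.
#[derive(Default)]
struct SizeEstimator {
    sum: u64,
    lookups: u64,
}

impl SizeEstimator {
    /// Record the estimate produced by a single lookup.
    fn record(&mut self, estimate: u64) {
        self.sum += estimate;
        self.lookups += 1;
    }

    /// Mean of all per-lookup estimates seen so far (0 before any lookup).
    fn estimate(&self) -> u64 {
        if self.lookups == 0 {
            0
        } else {
            self.sum / self.lookups
        }
    }
}

fn main() {
    let mut estimator = SizeEstimator::default();
    // Hypothetical per-lookup estimates; more lookups tighten the mean.
    for en in [1_950_000u64, 2_080_000, 1_990_000] {
        estimator.record(en);
    }
    println!("estimated DHT size: {}", estimator.estimate());
}
```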
Binary file modified docs/dht_size_estimate/plot.png
7 changes: 3 additions & 4 deletions docs/dht_size_estimate/src/main.rs
@@ -13,7 +13,7 @@ use statrs::statistics::*;

const DEFAULT_DHT_SIZE: usize = 2_000_000;

-const DEFAULT_LOOKUPS: usize = 4;
+const DEFAULT_LOOKUPS: usize = 12;

#[derive(Parser)]
#[command(author, version, about, long_about = None)]
@@ -68,7 +68,6 @@ fn main() {

for handle in handles {
handle.join().expect("Thread panicked");
// println!("Worker joined.");
}

let estimates = estimates.lock().unwrap();
@@ -99,10 +98,10 @@ fn simulate(dht: &BTreeMap<Id, Node>, lookups: usize) -> usize {

let mut closest_nodes = ClosestNodes::new(target);

-for (_, node) in dht.range(target..).take(10) {
+for (_, node) in dht.range(target..).take(200) {
closest_nodes.add(node.clone())
}
-for (_, node) in dht.range(..target).rev().take(10) {
+for (_, node) in dht.range(..target).rev().take(200) {
closest_nodes.add(node.clone())
}

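For orientation, the `simulate` hunks above approximate a lookup without any network I/O: they scan an ordered map of node IDs around a random target, now taking up to 200 candidates per side instead of 10. A self-contained toy version of that neighborhood scan, using `u64` keys in place of the crate's 160-bit `Id` (this is not the crate's code):

```rust
use std::collections::BTreeMap;

/// Collect up to 200 node IDs on each side of `target`, mirroring how the
/// simulation feeds candidates into `ClosestNodes`.
fn neighborhood(dht: &BTreeMap<u64, ()>, target: u64) -> Vec<u64> {
    let mut closest = Vec::new();

    // Nodes at or above the target, walking upwards.
    for (&id, _) in dht.range(target..).take(200) {
        closest.push(id);
    }
    // Nodes below the target, walking downwards.
    for (&id, _) in dht.range(..target).rev().take(200) {
        closest.push(id);
    }

    closest
}

fn main() {
    // A toy "DHT" of 10,000 evenly spaced node IDs.
    let dht: BTreeMap<u64, ()> = (0..10_000u64).map(|i| (i * 1_000, ())).collect();
    let near = neighborhood(&dht, 5_000_500);
    println!("collected {} candidate nodes around the target", near.len());
}
```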
Binary file removed docs/dht_size_estimate/standard-deviation.png
Binary file not shown.
10 changes: 6 additions & 4 deletions src/rpc/closest_nodes.rs
@@ -2,7 +2,7 @@ use std::{convert::TryInto, vec::IntoIter};

use crate::{common::MAX_BUCKET_SIZE_K, Id, Node};

-const CORRECTION_FACTOR: f64 = 1.26;
+const CORRECTION_FACTOR: f64 = 1.0544;

#[derive(Debug, Clone)]
/// Manage closest nodes found in a query.
@@ -77,7 +77,9 @@ impl ClosestNodes {
},
);

-(CORRECTION_FACTOR * (sum / self.nodes.len().max(MAX_BUCKET_SIZE_K)) as f64) as usize
+let count = MAX_BUCKET_SIZE_K.min(self.nodes.len());
+
+(CORRECTION_FACTOR * (sum / count) as f64) as usize
}
}
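The fix above changes the divisor from `self.nodes.len().max(MAX_BUCKET_SIZE_K)`, which is never smaller than K and keeps growing with the number of nodes collected, to `MAX_BUCKET_SIZE_K.min(self.nodes.len())`, which is capped at K and never exceeds the nodes actually collected. A small illustration, assuming a bucket size of 20 (the usual Kademlia K; not confirmed by this diff):

```rust
const MAX_BUCKET_SIZE_K: usize = 20; // assumed value, for illustration only

fn main() {
    // The two expressions only agree when exactly K nodes were collected
    // (e.g. the old tests' take(10) per side always filled exactly 20),
    // which is why the bug went unnoticed; once more than K nodes are
    // collected, the old divisor keeps growing while the fixed one stays
    // capped at K.
    for len in [5usize, 20, 200, 400] {
        let old = len.max(MAX_BUCKET_SIZE_K); // pre-fix divisor
        let new = MAX_BUCKET_SIZE_K.min(len); // post-fix divisor
        println!("nodes collected: {len:3}  old divisor: {old:3}  fixed divisor: {new:2}");
    }
}
```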

@@ -163,10 +165,10 @@ mod tests {

let mut closest_nodes = ClosestNodes::new(target);

-for (_, node) in nodes.range(target..).take(10) {
+for (_, node) in nodes.range(target..).take(100) {
closest_nodes.add(node.clone())
}
-for (_, node) in nodes.range(..target).rev().take(10) {
+for (_, node) in nodes.range(..target).rev().take(100) {
closest_nodes.add(node.clone())
}

