clustering coefficient update #1909
base: master
- the path-based algorithm has room for optimisation
- we need a benchmark to decide whether it is worth keeping the set-based algorithm at all
- the filtering of nodes for the batch versions is unnecessarily inefficient (no need for creating subgraph views)
- python wrappers should raise proper errors instead of panicking
...algorithms/metrics/clustering_coefficient/local_clustering_coefficient_batch_intersection.rs
...ory/src/algorithms/metrics/clustering_coefficient/local_clustering_coefficient_batch_path.rs
Some minor cleanup and this is good to merge
.iter()
.filter(|nbor| nbor.degree() > 1 && nbor.node != s.node)
.combinations(2)
.filter_map(|nb| match graph.has_edge(nb[0].node, nb[1].node) {
A simple filter(|nb| graph.has_edge(nb[0].node, nb[1].node) || graph.has_edge(nb[1].node, nb[0].node)) would do the same.
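For illustration, the suggestion can be sketched outside the library with standard-library types only. The function name `count_connected_pairs` and the edge-set representation are assumptions made for this example, not part of the PR; the pair enumeration stands in for `combinations(2)`, and the `filter` mirrors the either-direction edge check proposed in the comment.

```rust
use std::collections::HashSet;

/// Counts unordered pairs of neighbours that are joined by an edge in
/// either direction. `edges` holds directed (src, dst) pairs.
/// Hypothetical helper for illustration only.
fn count_connected_pairs(neighbours: &[u64], edges: &HashSet<(u64, u64)>) -> usize {
    let has_edge = |a: u64, b: u64| edges.contains(&(a, b));
    neighbours
        .iter()
        .enumerate()
        // Enumerate every unordered pair (a, b) exactly once.
        .flat_map(move |(i, &a)| neighbours[i + 1..].iter().map(move |&b| (a, b)))
        // A plain filter on "edge in either direction" replaces the match.
        .filter(|&(a, b)| has_edge(a, b) || has_edge(b, a))
        .count()
}
```

A pair connected in both directions is still counted once, because each unordered pair is visited a single time.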
// let mut ctx: Context<G, ComputeStateVec> = graph.into();
let neighbours_set = accumulators::hash_set::<VID>(0);
//let neighbours_set = accumulators::hash_set::<VID>(0);
let count = accumulators::sum::<usize>(1);

ctx.agg(neighbours_set);
//ctx.agg(neighbours_set);
ctx.global_agg(count);
tidy up comments
if let Ok(py_list) = param.downcast::<PyList>() {
let mut nodes = Vec::new();
for item in py_list.iter() {
// Extract each item as a float
that's a node, not a float
these files should not be committed
What changes were proposed in this pull request?
Refactor global and local clustering coefficient.
Add a batch version of the local clustering coefficient. Two variants were proposed at first; a single, optimised version was added in their place.
Why are the changes needed?
It's currently extremely inefficient to run LCC on a group of nodes. The batch version should do a better job of parallelizing the process and reducing overhead.
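As a rough illustration of what a batch call computes, here is a minimal sketch using only the standard library. It assumes an undirected graph stored as adjacency sets and the usual definition LCC(v) = 2T / (k(k - 1)), where k is the number of neighbours of v and T the number of connected neighbour pairs. The names `Adjacency`, `undirected`, `local_clustering_coefficient` and `local_clustering_coefficient_batch` are hypothetical and do not reflect the PR's actual API. The sketch is sequential and does not attempt the parallelisation the PR describes.

```rust
use std::collections::{HashMap, HashSet};

/// Undirected graph as adjacency sets (an assumption for this sketch).
type Adjacency = HashMap<u64, HashSet<u64>>;

/// Builds the adjacency sets from an undirected edge list.
fn undirected(edges: &[(u64, u64)]) -> Adjacency {
    let mut adj = Adjacency::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().insert(b);
        adj.entry(b).or_default().insert(a);
    }
    adj
}

/// LCC of one node: 2T / (k(k - 1)); 0.0 when the node has fewer than two
/// neighbours, None when the node is not in the graph.
fn local_clustering_coefficient(adj: &Adjacency, v: u64) -> Option<f64> {
    // Ignore self-loops when collecting neighbours.
    let nbrs: Vec<u64> = adj.get(&v)?.iter().copied().filter(|&n| n != v).collect();
    let k = nbrs.len();
    if k < 2 {
        return Some(0.0);
    }
    let mut connected_pairs = 0usize;
    for (i, &a) in nbrs.iter().enumerate() {
        for &b in &nbrs[i + 1..] {
            if adj.get(&a).map_or(false, |s| s.contains(&b)) {
                connected_pairs += 1;
            }
        }
    }
    Some(2.0 * connected_pairs as f64 / (k * (k - 1)) as f64)
}

/// Batch version: iterates over the requested nodes directly, without
/// building a filtered copy of the graph. Unknown nodes are skipped.
fn local_clustering_coefficient_batch(adj: &Adjacency, nodes: &[u64]) -> HashMap<u64, f64> {
    nodes
        .iter()
        .filter_map(|&v| local_clustering_coefficient(adj, v).map(|c| (v, c)))
        .collect()
}
```

Checking the batch result against the per-node function on the same graph is one simple form of the parity test mentioned under "How was this patch tested?".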
Does this PR introduce any user-facing change? If yes is this documented?
'clustering_coefficient' is renamed to 'global_clustering_coefficient'. All of the clustering coefficient variants have been moved to a submodule of 'metrics' called 'clustering_coefficient'. The new batch implementation has corresponding docstrings.
How was this patch tested?
The new method was tested for parity against the existing implementation in Rust and Python.
Are there any further changes required?
Currently working on an approximate version that uses HyperLogLog.