clustering coefficient update #1909

wyatt-joyner-pometry · 2025-01-08T17:27:34Z

What changes were proposed in this pull request?

Refactor global and local clustering coefficient. ~~Add two variants of batch local clustering coefficient.~~ Added a single, optimised version.

Why are the changes needed?

It's currently extremely inefficient to run LCC on a group of nodes. The batch version should do a better job of parallelizing the process and reducing overhead.

Does this PR introduce any user-facing change? If yes is this documented?

'clustering_coefficient' is renamed to 'global_clustering_coefficient'. All of the clustering coefficient variants have been moved to a submodule of 'metrics' called 'clustering_coefficient'. The new batch implementation has corresponding docstrings.

How was this patch tested?

~~The two methods were~~ The new method was tested for parity against the existing implementation in Rust and Python.

Are there any further changes required?

Currently working on an approximate version that uses HyperLogLog.

ljeub-pometry

- the path-based algorithm has room for optimisation
- we need a benchmark to decide whether it is worth keeping the set-based algorithm at all
- the filtering of nodes for the batch versions is unnecessarily inefficient (no need for creating subgraph views)
- python wrappers should raise proper errors instead of panicking
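
For the last point, one possible shape is to have the input-parsing step return a `Result`, so the Python wrapper can turn the error into an exception instead of panicking. This is only a sketch using the standard library: `NodeInputError` and `parse_node_ids` are hypothetical names, and plain strings and `u64` ids stand in for the real Python objects and node types.

```rust
use std::fmt;

/// Hypothetical error type for invalid node input (not part of the PR).
#[derive(Debug, PartialEq)]
pub enum NodeInputError {
    NotANode { index: usize, value: String },
}

impl fmt::Display for NodeInputError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            NodeInputError::NotANode { index, value } => {
                write!(f, "item {} ({:?}) is not a valid node id", index, value)
            }
        }
    }
}

impl std::error::Error for NodeInputError {}

/// Parse every item as a node id, stopping at the first invalid one.
/// Returns an error instead of calling `unwrap()`/`expect()` on bad input.
pub fn parse_node_ids(items: &[&str]) -> Result<Vec<u64>, NodeInputError> {
    items
        .iter()
        .enumerate()
        .map(|(index, raw)| {
            raw.trim()
                .parse::<u64>()
                .map_err(|_| NodeInputError::NotANode {
                    index,
                    value: raw.to_string(),
                })
        })
        .collect()
}
```

A wrapper built on this would map the `Err` case to a Python exception at the binding boundary, so the caller sees which item was rejected.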

...algorithms/metrics/clustering_coefficient/local_clustering_coefficient_batch_intersection.rs

...ory/src/algorithms/metrics/clustering_coefficient/local_clustering_coefficient_batch_path.rs

raphtory/src/algorithms/metrics/clustering_coefficient/mod.rs

raphtory/src/algorithms/motifs/local_triangle_count.rs

...ory/src/algorithms/metrics/clustering_coefficient/local_clustering_coefficient_batch_path.rs

raphtory/src/python/packages/algorithms.rs

ljeub-pometry

Some minor cleanup and this is good to merge

ljeub-pometry · 2025-01-27T11:24:59Z

raphtory/src/algorithms/metrics/clustering_coefficient/local_clustering_coefficient_batch.rs

+                .iter()
+                .filter(|nbor| nbor.degree() > 1 && nbor.node != s.node)
+                .combinations(2)
+                .filter_map(|nb| match graph.has_edge(nb[0].node, nb[1].node) {


a simple `filter(|nb| graph.has_edge(nb[0].node, nb[1].node) || graph.has_edge(nb[1].node, nb[0].node))` would do the same
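
To make the suggested check concrete, here is a minimal, self-contained sketch of a local clustering coefficient that counts a neighbour pair as closed when an edge exists in either direction. `ToyGraph` is a hypothetical stand-in built on a `HashSet` of directed edges, not the graph view used in the PR, and the `degree() > 1` pre-filter from the diff is omitted for brevity.

```rust
use std::collections::{BTreeSet, HashSet};

/// Toy directed graph, for illustration only (not the real graph view).
pub struct ToyGraph {
    edges: HashSet<(u32, u32)>,
}

impl ToyGraph {
    pub fn new(edges: &[(u32, u32)]) -> Self {
        Self {
            edges: edges.iter().copied().collect(),
        }
    }

    /// Directed lookup: true only if the edge src -> dst is present.
    pub fn has_edge(&self, src: u32, dst: u32) -> bool {
        self.edges.contains(&(src, dst))
    }

    /// Distinct neighbours of `v` in either direction, excluding `v` itself.
    pub fn neighbours(&self, v: u32) -> Vec<u32> {
        let mut out = BTreeSet::new();
        for &(a, b) in &self.edges {
            if a == v && b != v {
                out.insert(b);
            }
            if b == v && a != v {
                out.insert(a);
            }
        }
        out.into_iter().collect()
    }
}

/// Closed neighbour pairs divided by all neighbour pairs of `v`.
/// Returns 0.0 when `v` has fewer than two neighbours.
pub fn local_clustering_coefficient(g: &ToyGraph, v: u32) -> f64 {
    let nbrs = g.neighbours(v);
    let k = nbrs.len();
    if k < 2 {
        return 0.0;
    }
    let mut closed = 0usize;
    for i in 0..k {
        for j in (i + 1)..k {
            // Symmetric check: either direction closes the triangle.
            if g.has_edge(nbrs[i], nbrs[j]) || g.has_edge(nbrs[j], nbrs[i]) {
                closed += 1;
            }
        }
    }
    let possible = k * (k - 1) / 2;
    closed as f64 / possible as f64
}
```

With edges `1->2`, `1->3`, `3->2`, `1->4`, node 1 has three neighbours and only the pair (2, 3) is connected, via `3->2`, so the pair is found only because the reverse direction is also checked.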

ljeub-pometry · 2025-01-27T11:30:42Z

raphtory/src/algorithms/motifs/triangle_count.rs

+
    // let mut ctx: Context<G, ComputeStateVec> = graph.into();
-    let neighbours_set = accumulators::hash_set::<VID>(0);
+    //let neighbours_set = accumulators::hash_set::<VID>(0);
    let count = accumulators::sum::<usize>(1);

-    ctx.agg(neighbours_set);
+    //ctx.agg(neighbours_set);
    ctx.global_agg(count);


tidy up comments

ljeub-pometry · 2025-01-27T11:32:51Z

raphtory/src/python/packages/algorithms.rs

+    if let Ok(py_list) = param.downcast::<PyList>() {
+        let mut nodes = Vec::new();
+        for item in py_list.iter() {
+            // Extract each item as a float


that's a node, not a float

ljeub-pometry · 2025-01-27T11:33:48Z

stub_gen/raphtory_stub_gen.egg-info/PKG-INFO