clustering coefficient update #1909

Open
wants to merge 14 commits into base: master

wyatt-joyner-pometry (Contributor) commented Jan 8, 2025

What changes were proposed in this pull request?

Refactor the global and local clustering coefficients. Add a single, optimised batch version of the local clustering coefficient.

Why are the changes needed?

It's currently extremely inefficient to run LCC on a group of nodes. The batch version should do a better job of parallelizing the process and reducing overhead.
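
For reviewers who want the definition in front of them, here is a minimal sketch of a batch local clustering coefficient (LCC) over a plain adjacency map. It is not the implementation in this PR: the `Adjacency` type and the functions `build_adjacency`, `local_cc` and `batch_local_cc` are hypothetical names chosen for illustration, and the graph is assumed to be undirected. For a node with k neighbours, the LCC is twice the number of connected neighbour pairs divided by k(k-1).

```rust
use std::collections::{HashMap, HashSet};

/// Undirected adjacency sets keyed by node id (illustrative type, not the PR's graph API).
type Adjacency = HashMap<u32, HashSet<u32>>;

/// Build a symmetric adjacency map from an undirected edge list.
fn build_adjacency(edges: &[(u32, u32)]) -> Adjacency {
    let mut adj = Adjacency::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().insert(b);
        adj.entry(b).or_default().insert(a);
    }
    adj
}

/// Local clustering coefficient of `v`:
/// connected neighbour pairs divided by all possible neighbour pairs.
fn local_cc(adj: &Adjacency, v: u32) -> f64 {
    // Collect neighbours, ignoring a possible self-loop.
    let nbrs: Vec<u32> = match adj.get(&v) {
        Some(set) => set.iter().copied().filter(|&n| n != v).collect(),
        None => return 0.0,
    };
    let k = nbrs.len();
    if k < 2 {
        return 0.0; // fewer than two neighbours: no pair can be closed
    }
    let mut closed = 0usize;
    for i in 0..k {
        for j in (i + 1)..k {
            if adj[&nbrs[i]].contains(&nbrs[j]) {
                closed += 1;
            }
        }
    }
    2.0 * closed as f64 / (k * (k - 1)) as f64
}

/// Batch variant: every requested node is evaluated against the same shared map,
/// so no per-node graph view has to be built.
fn batch_local_cc(adj: &Adjacency, nodes: &[u32]) -> HashMap<u32, f64> {
    nodes.iter().map(|&v| (v, local_cc(adj, v))).collect()
}
```

The batch function is sequential here to keep the sketch dependency-free. Each node's result is independent of the others, so the `map` over `nodes` is the natural place to introduce a parallel iterator.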

Does this PR introduce any user-facing change? If yes is this documented?

'clustering_coefficient' is renamed to 'global_clustering_coefficient'. All of the clustering coefficient variants have been moved to a submodule of 'metrics' called 'clustering_coefficient'. The new batch implementation has corresponding docstrings.

How was this patch tested?

The new method was tested for parity against the existing implementation in Rust and Python.

Are there any further changes required?

Currently working on an approximate version that uses HyperLogLog.

ljeub-pometry (Collaborator) left a comment

  • the path-based algorithm has room for optimisation
  • we need a benchmark to decide whether it is worth keeping the set-based algorithm at all
  • the filtering of nodes for the batch versions is unnecessarily inefficient (no need for creating subgraph views)
  • python wrappers should raise proper errors instead of panicking
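
On the last point, a minimal sketch of the pattern in plain Rust: a lookup that returns a `Result` instead of panicking on bad input. `NodeLookupError` and `resolve_nodes` are hypothetical names and are not taken from this PR. In a PyO3 wrapper, the equivalent is to return `PyResult` so that the error reaches Python as an exception.

```rust
use std::collections::HashSet;
use std::fmt;

/// Hypothetical error type for a failed node lookup.
#[derive(Debug, PartialEq)]
enum NodeLookupError {
    UnknownNode(u64),
}

impl fmt::Display for NodeLookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            NodeLookupError::UnknownNode(id) => {
                write!(f, "node {} does not exist in the graph", id)
            }
        }
    }
}

impl std::error::Error for NodeLookupError {}

/// Validate the requested node ids against the known set.
/// Returns the first unknown id as an error instead of panicking.
fn resolve_nodes(known: &HashSet<u64>, requested: &[u64]) -> Result<Vec<u64>, NodeLookupError> {
    requested
        .iter()
        .map(|&id| {
            if known.contains(&id) {
                Ok(id)
            } else {
                Err(NodeLookupError::UnknownNode(id))
            }
        })
        .collect()
}
```

Collecting an iterator of `Result` values into `Result<Vec<_>, _>` stops at the first error, so the caller gets either the full list or one descriptive failure.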

ljeub-pometry (Collaborator) left a comment

Some minor cleanup and this is good to merge

.iter()
.filter(|nbor| nbor.degree() > 1 && nbor.node != s.node)
.combinations(2)
.filter_map(|nb| match graph.has_edge(nb[0].node, nb[1].node) {

Collaborator

a simple filter(|nb| graph.has_edge(nb[0].node, nb[1].node) || graph.has_edge(nb[1].node, nb[0].node)) would do the same
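
A self-contained sketch of the suggested filter, under stated assumptions: `DirectedGraph` is a hypothetical stand-in for the graph type in the PR, and the `pairs` helper stands in for the `.combinations(2)` call in the snippet so that the example needs no external crate. Only the closure passed to `filter` comes from the comment above.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the PR's graph type: stores directed edges only.
struct DirectedGraph {
    edges: HashSet<(u32, u32)>,
}

impl DirectedGraph {
    /// True if there is an edge from `src` to `dst` (direction matters).
    fn has_edge(&self, src: u32, dst: u32) -> bool {
        self.edges.contains(&(src, dst))
    }
}

/// All unordered pairs of a slice; stands in for `.combinations(2)`.
fn pairs(items: &[u32]) -> Vec<[u32; 2]> {
    let mut out = Vec::new();
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            out.push([items[i], items[j]]);
        }
    }
    out
}

/// Count neighbour pairs joined by an edge in either direction,
/// using a plain `filter` rather than `filter_map` with a `match`.
fn connected_pairs(graph: &DirectedGraph, neighbours: &[u32]) -> usize {
    pairs(neighbours)
        .into_iter()
        .filter(|nb| graph.has_edge(nb[0], nb[1]) || graph.has_edge(nb[1], nb[0]))
        .count()
}
```

Because `||` short-circuits, the reverse lookup only runs when the forward one fails.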

Comment on lines 73 to 79

  // let mut ctx: Context<G, ComputeStateVec> = graph.into();
- let neighbours_set = accumulators::hash_set::<VID>(0);
+ //let neighbours_set = accumulators::hash_set::<VID>(0);
  let count = accumulators::sum::<usize>(1);

- ctx.agg(neighbours_set);
+ //ctx.agg(neighbours_set);
  ctx.global_agg(count);

Collaborator

tidy up comments

if let Ok(py_list) = param.downcast::<PyList>() {
let mut nodes = Vec::new();
for item in py_list.iter() {
// Extract each item as a float

Collaborator

that's a node, not a float

Collaborator

these files should not be committed

