Feature: Add random sampling function to help users use Kùzu for training GNNs #4665

prrao87 · 2024-12-23T15:45:04Z

API

Other

Description

Kùzu is mentioned in the following paper about using a graph database as a backend while training GNNs.
https://arxiv.org/pdf/2411.11375

As described in Figure 4 of the paper, one of the key steps during training is to perform random sampling of the returned nodes from a 2-hop query as follows:

MATCH (node_0:$NODE_TYPE)
WHERE node_0.id IN $SEED_NODES
OPTIONAL MATCH (node_0)-[rel_1:$REL_TYPE]->(node_1:$NODE_TYPE)-[rel_2:$REL_TYPE]->(node_2:$NODE_TYPE)
WITH node_0, node_1, node_2
ORDER BY rand()
LIMIT $MAX_NEIGHBOURS
RETURN
    node_0.id as src_id,
    node_1.id, node_1.features,
    node_2.id, node_2.features;

The ORDER BY rand() step is where the random sampling falls short, on two counts:

1. The sampling isn't random enough: the rand() function doesn't offer a high enough level of randomness for the purposes of training. (The author of the paper noted that they had to apply additional workarounds to add more randomness.)
2. The query is slow, because ORDER BY rand() followed by LIMIT is executed as a top-k. We could do better from a query performance perspective.
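
To illustrate the performance point: drawing a uniform sample of k rows does not require ordering every row by a random key. The sketch below uses reservoir sampling (Algorithm R) in plain Python to pick up to k rows from a stream in a single pass with O(k) memory. It only illustrates the idea and is not how Kùzu executes the query; `reservoir_sample` and the toy `paths` data are hypothetical names introduced for this example, not part of Kùzu or PyG.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    rows: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k rows chosen uniformly at random, in one pass over rows."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (i + 1) by drawing a
            # slot index in [0, i] and replacing only if it falls in [0, k).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    # Toy stand-in for the (node_0, node_1, node_2) rows of the 2-hop query.
    paths = ((0, n1, n2) for n1 in range(100) for n2 in range(100))
    sample = reservoir_sample(paths, k=5, rng=random.Random(42))
    print(len(sample))
```

Passing an explicit `random.Random` instance keeps the sample reproducible across runs, which can help when debugging training pipelines.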

Feature

Per the author's observations, the desired feature is a function or high-level utility in Cypher that pushes the random sampling down to the database layer, rather than having PyG do it in memory. Could we add such a function so that users can perform random sampling when training GNNs with Kùzu?


prrao87 added the "feature" label (New features or missing components of existing features) on Dec 23, 2024
