Kùzu is mentioned in the following paper about using a graph database as a backend while training GNNs. https://arxiv.org/pdf/2411.11375
As described in Figure 4 of the paper, one of the key steps during training is to perform random sampling of the returned nodes from a 2-hop query as follows:
The ORDER BY rand() part is where the random sampling suffers on two counts:
1. The sampling isn't random enough: the rand() function doesn't offer a high enough level of randomness for the purposes of training (the author of the paper noted that they had to add further workarounds to get more randomness).
2. The query is slow because it is evaluated as a top-k operation; there is room to do better from a query performance perspective.
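To make the performance point concrete, here is a minimal Python sketch contrasting two ways of drawing k random rows from a result stream. It is not Kùzu's implementation and not the paper's code. The function names `sample_by_random_sort` and `reservoir_sample` are hypothetical. The first mimics a top-k over random keys; the second is classic reservoir sampling (Algorithm R), which makes one pass and keeps only k rows in memory.

```python
import heapq
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_by_random_sort(rows: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Mimic a top-k over random keys: tag every row with a random key
    and keep the k rows with the smallest keys (heap-based top-k)."""
    # The running index breaks ties so the rows themselves are never compared.
    keyed = ((rng.random(), i, row) for i, row in enumerate(rows))
    return [row for _, _, row in heapq.nsmallest(k, keyed)]


def reservoir_sample(rows: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: uniform sample of k rows in one pass, O(k) memory, no sort."""
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir
```

Both functions take an explicit `random.Random` instance, so the caller controls the seed and the generator; whether that addresses the randomness concern raised in the paper is not something this sketch can confirm.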
Feature
Per the author's observations, the desired graph database feature would be to provide a function or some high-level utility in Cypher where the random sampling is pushed down to the database layer and isn't done by PyG in-memory. Could we add such a function that allows users to perform random sampling for the purposes of training GNNs using Kùzu?
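As an illustration of what such a pushed-down utility might compute, the sketch below performs per-hop neighbor sampling over a plain in-memory adjacency dict. This is one common reading of "random sampling for GNN training" and is an assumption, not the paper's method or an existing Kùzu or PyG API. The name `sample_neighbors`, the `fanouts` parameter, and the toy graph are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Adjacency = Dict[int, List[int]]


def sample_neighbors(
    adj: Adjacency,
    seeds: Sequence[int],
    fanouts: Sequence[int],
    rng: random.Random,
) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Sample up to fanouts[h] neighbors per frontier node at hop h.

    Returns the set of visited nodes and the list of sampled (src, dst) edges.
    """
    nodes: Set[int] = set(seeds)
    edges: List[Tuple[int, int]] = []
    frontier = list(seeds)
    for fanout in fanouts:
        next_frontier: List[int] = []
        for src in frontier:
            neighbors = adj.get(src, [])
            if len(neighbors) > fanout:
                # Sample without replacement when there are more neighbors than needed.
                picked = rng.sample(neighbors, fanout)
            else:
                picked = list(neighbors)
            for dst in picked:
                edges.append((src, dst))
                if dst not in nodes:
                    nodes.add(dst)
                    next_frontier.append(dst)
        frontier = next_frontier
    return nodes, edges


# Hypothetical toy graph: node id -> list of neighbor ids.
toy_adj: Adjacency = {0: [1, 2, 3, 4], 1: [5, 6, 7], 2: [8], 3: [], 4: [9, 10]}
# Two hops, at most 2 neighbors per node at each hop.
sampled_nodes, sampled_edges = sample_neighbors(toy_adj, [0], [2, 2], random.Random(42))
```

A database-side version would do this bounded sampling during query execution, so only the sampled subgraph would need to be handed to the training loop.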