    {
        "verb": "unpack_graph",
        "args": {
            "column": "clustered_graph",
            "type": "nodes",
        },
        "input": {"source": "workflow:create_base_entity_graph"},
    },
    {"verb": "rename", "args": {"columns": {"label": "title"}}},
    {
        "verb": "select",
        "args": {
            "columns": [
                "id",
                "title",
                "type",
                "description",
                "human_readable_id",
                "graph_embedding",
                "source_id",
            ],
        },
    },
    {
        # create_base_entity_graph has multiple levels of clustering, which means there are multiple graphs with the same entities
        # this dedupes the entities so that there is only one of each entity
        "verb": "dedupe",
        "args": {"columns": ["id"]},
    },
My understanding of the flow: the steps/verbs above end up picking only the nodes from the level 0 graph.
The reason this happens is the following line:
graphrag/graphrag/index/verbs/graph/clustering/cluster_graph.py
Line 117 in 9d99f32
Note that the random number generation is seeded for every clustered graph.
If this is intentional, wouldn't operating on the original graph (for final entity generation) instead of the clustered graphs be a good idea?
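To make the effect concrete, here is a minimal standalone sketch. It is not graphrag code: `label_level`, `dedupe_by_id`, and the seed value are hypothetical names chosen for illustration. It also assumes that the dedupe verb keeps the first row per `id` and that nodes are visited in the same order at every level. Under those assumptions, reseeding the generator for each clustered graph gives every level the same id sequence, so deduping on `id` leaves only the level 0 rows.

```python
import uuid
from random import Random

SEED = 0xF001  # hypothetical seed value, for illustration only


def label_level(nodes, level, seed=SEED):
    """Assign an id to every node of one clustered graph, reseeding the RNG each time."""
    rng = Random(seed)  # fresh generator per level, so the id sequence repeats
    return [
        {
            "title": name,
            "level": level,
            "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        }
        for name in nodes
    ]


def dedupe_by_id(rows):
    """Keep the first row seen for each id (keep-first is an assumption here)."""
    seen = {}
    for row in rows:
        seen.setdefault(row["id"], row)
    return list(seen.values())


nodes = ["alice", "bob", "carol"]

# Three clustering levels, each labelled independently with the same seed.
rows = [row for level in range(3) for row in label_level(nodes, level)]
final = dedupe_by_id(rows)

print(len(rows), len(final))             # 9 3
print({row["level"] for row in final})   # {0}
```

If the ids were instead derived once from the original graph, the per-level rows would not need to be collapsed this way, which is what the question above is getting at.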