create_final_entities only picks the nodes from level 0 when dedup verb is executed! Either a bug or possible performance improvement! #754

ksachdeva · 2024-07-29T03:00:42Z


Here is my understanding of the flow:

{
    "verb": "unpack_graph",
    "args": {
        "column": "clustered_graph",
        "type": "nodes",
    },
    "input": {"source": "workflow:create_base_entity_graph"},
},
{"verb": "rename", "args": {"columns": {"label": "title"}}},
{
    "verb": "select",
    "args": {
        "columns": [
            "id",
            "title",
            "type",
            "description",
            "human_readable_id",
            "graph_embedding",
            "source_id",
        ],
    },
},
{
    # create_base_entity_graph has multiple levels of clustering, which means there are multiple graphs with the same entities
    # this dedupes the entities so that there is only one of each entity
    "verb": "dedupe",
    "args": {"columns": ["id"]},
},

The above flow of steps/verbs leads to only picking the nodes from the level 0 graph.
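The effect can be reproduced with a small stdlib-only sketch. It assumes the dedupe verb keeps the first row for each id (as pandas' `drop_duplicates` does by default) and that the unpacked rows arrive level by level, starting with level 0. The rows, the `level` field kept for visibility, and the helper name `dedupe_keep_first` are all made up for illustration; this is not the actual verb implementation.

```python
def dedupe_keep_first(rows, columns):
    """Keep the first row seen for each distinct combination of `columns`."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical unpacked rows: the same two entities appear once per level,
# carrying the same id at every level.
rows = [
    {"id": "a1", "title": "ALICE", "level": 0},
    {"id": "b2", "title": "BOB", "level": 0},
    {"id": "a1", "title": "ALICE", "level": 1},
    {"id": "b2", "title": "BOB", "level": 1},
]

deduped = dedupe_keep_first(rows, ["id"])

# Which levels survive the dedupe on "id"?
levels_kept = sorted({row["level"] for row in deduped})
print(levels_kept)  # [0]
```

Under those assumptions, every row from level 1 and above is discarded, so the work of unpacking those levels is wasted.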

The reason this happens is the following:

graphrag/graphrag/index/verbs/graph/clustering/cluster_graph.py

Line 117 in 9d99f32

random = Random(seed) # noqa S311

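Note that the random number generator is re-created from the seed for every clustered graph, so each level replays the same random sequence. If the node ids are drawn from it, the same entity gets the same id at every level, and the dedupe on id keeps only its first occurrence.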
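To illustrate the reseeding point, here is a minimal sketch. It assumes that node ids are UUIDs drawn from that seeded generator and that nodes are visited in the same order at every level; the referenced line alone does not show either. `gen_id`, `ids_for_level`, the node names, and the seed value 42 are hypothetical.

```python
from random import Random
from uuid import UUID


def gen_id(rng):
    # Hypothetical stand-in for the id generation: a UUID built from RNG bits.
    return str(UUID(int=rng.getrandbits(128), version=4))


def ids_for_level(node_names, seed=42):
    # A fresh Random(seed) per call mirrors reseeding for every clustered graph.
    rng = Random(seed)
    return {name: gen_id(rng) for name in node_names}


nodes = ["ALICE", "BOB", "CAROL"]
level_0 = ids_for_level(nodes)
level_1 = ids_for_level(nodes)

# Same seed and same node order give identical ids at both levels.
same_ids = level_0 == level_1
print(same_ids)  # True
```

If this matches what the code does, a dedupe on `id` cannot tell the levels apart, which would explain why only the first level's rows remain.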

If this is intentional, wouldn't operating on the original graph (for final entity generation) instead of the clustered graphs be a good idea?
