Implement iterative vertex rebalancing for better locality #1417

porunov · 2019-02-15T09:36:04Z

porunov
Feb 15, 2019
Maintainer

Currently, vertices are placed in partitions either randomly or locally within a single transaction. The vertex id determines placement: knowing a vertex's identifier tells us which machine stores it.
The problem I see is that, over time, vertices on different servers may become more connected, so deep joins become inefficient because we need to query multiple servers to get related data.
The idea is to implement a scheduled iterative job that rebalances data in the cluster for better locality (to traverse the graph faster).
The idea is taken from this post. The post isn't new (2014), but the idea is interesting.
I haven't researched this field much, so I suggest posting issues or solutions related to this problem here, or opening a new thread in the janusgraph-dev group.
I don't know whether it should be implemented as a separate project or in JanusGraph itself.
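
To make the locality goal concrete, here is a minimal sketch in Python on a toy in-memory graph. It is not JanusGraph code and not necessarily the algorithm from the linked post: `edge_cut`, `rebalance_once`, and the `capacity` parameter are hypothetical names, and the strategy is a plain greedy pass that moves each vertex to the partition holding most of its neighbours, as long as that partition has room.

```python
from collections import Counter


def edge_cut(edges, placement):
    """Fraction of edges whose two endpoints live on different partitions."""
    if not edges:
        return 0.0
    cut = sum(1 for u, v in edges if placement[u] != placement[v])
    return cut / len(edges)


def rebalance_once(edges, placement, capacity):
    """One greedy pass over all vertices (hypothetical strategy).

    edges:     list of (vertex, vertex) pairs, direction ignored here
    placement: dict vertex -> partition number
    capacity:  maximum number of vertices allowed per partition
    Returns a new placement dict; the input is not modified.
    """
    neighbours = {v: [] for v in placement}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    load = Counter(placement.values())
    new_placement = dict(placement)
    for v in sorted(placement):
        # Count neighbours per partition, using moves already made in this pass.
        counts = Counter(new_placement[n] for n in neighbours[v])
        if not counts:
            continue
        current = new_placement[v]
        # Prefer the partition with most neighbours; ties go to the lower number.
        best, best_count = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        strictly_better = best_count > counts.get(current, 0)
        if best != current and strictly_better and load[best] < capacity:
            load[current] -= 1
            load[best] += 1
            new_placement[v] = best
    return new_placement
```

A scheduled job could call `rebalance_once` repeatedly until `edge_cut` stops improving. The capacity check is what keeps every vertex from collapsing onto a single machine; how to choose it is an open question here.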

The problems I currently see with rebalancing are:

If we need to rebalance our graph, we need to change the identifiers of vertices. How should we implement that?
- We can tell users of the rebalancing feature that vertex identifiers may change after rebalancing, so they shouldn't rely on them in traversals and should instead use an indexed property holding a custom identifier (like realId or myInternalId).
- We can introduce an additional identifier for all vertices, like staticId, which is indexed by default, and rename id to placementId. I'm not sure about this idea.
Currently we can't change the id of a vertex after it has been stored in the database. How should we solve this issue?
- We can create a duplicate vertex with the required id, recreate on it the edges of the original vertex, and remove the original vertex. This can cause problems: if the storage isn't consistent, we can lose newly created edges that are not yet visible. We can also lose edges created after we copied all edges to the duplicate vertex but before we removed the original vertex.
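
The copy-and-delete move combined with a stable identifier can be sketched as follows. This is a toy in-memory model in Python, not the JanusGraph API: the `ToyGraph` class and its methods are hypothetical, and `staticId` is borrowed from the suggestion above. It is single-threaded, so it does not address the lost-edge races just described; it only shows the order of steps and how a static index hides the id change from users.

```python
class ToyGraph:
    """Hypothetical in-memory graph keyed by a changeable placement id."""

    def __init__(self):
        self.vertices = {}      # placement id -> properties dict
        self.edges = set()      # (out placement id, label, in placement id)
        self.static_index = {}  # staticId -> current placement id

    def add_vertex(self, placement_id, static_id, **props):
        self.vertices[placement_id] = dict(props, staticId=static_id)
        self.static_index[static_id] = placement_id

    def add_edge(self, out_id, label, in_id):
        self.edges.add((out_id, label, in_id))

    def lookup(self, static_id):
        """Resolve the stable identifier to the current placement id."""
        return self.static_index[static_id]

    def move_vertex(self, old_id, new_id):
        """Move a vertex by duplicating it under new_id and deleting old_id."""
        if new_id in self.vertices:
            raise ValueError(f"placement id {new_id} is already in use")
        props = self.vertices[old_id]

        # 1. Create the duplicate vertex with the required id.
        self.vertices[new_id] = dict(props)

        # 2. Recreate every edge that touches the original vertex.
        def remap(vertex_id):
            return new_id if vertex_id == old_id else vertex_id

        touching = {e for e in self.edges if old_id in (e[0], e[2])}
        self.edges |= {(remap(o), label, remap(i)) for o, label, i in touching}

        # 3. Point the stable identifier at the duplicate.
        self.static_index[props["staticId"]] = new_id

        # 4. Remove the original vertex and its edges.
        self.edges -= touching
        del self.vertices[old_id]
```

Callers that always go through `lookup("a")` keep working after `move_vertex(1, 10)`, while any code holding the raw id `1` breaks. That is the trade-off between the two options in the first question.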

There may be other solutions that I am not aware of, so if you know any or have additional comments, please post here.

Right now I think we should start a scheduled job that launches multiple workers which rebalance the graph in parallel (i.e. remove old vertices and create duplicate vertices on another machine).

porunov · 2019-03-28T21:52:47Z

porunov
Mar 28, 2019
Maintainer Author

Another possible problem with rebalancing: if we use edges with a direction other than BOTH, it is not clear how we can recreate identical edges if we can't find them from a particular vertex.
A possible solution is to keep a map of edge collections that we build while rebalancing the whole graph.
I.e. I assume we can use MapReduce to collect information like:
oldVertexId -> Collection with in/out edges
After that, we can perform the rebalancing and use the necessary edges from that collection.
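
As a rough single-process illustration of that map, assuming the full edge list can be scanned as `(outVertexId, label, inVertexId)` triples: the loop plays the "map" role by emitting each edge under both endpoints, and the grouping dictionary plays the "reduce" role. The function name and the tuple layout are hypothetical, and a real job would run distributed rather than in one Python process.

```python
from collections import defaultdict


def collect_incident_edges(edges):
    """Build oldVertexId -> {"out": [...], "in": [...]} from a full edge scan.

    edges: iterable of (out_vertex_id, label, in_vertex_id) triples.
    Each edge is recorded under both endpoints, so an edge stored only on
    its out-vertex can still be found from its in-vertex.
    """
    incident = defaultdict(lambda: {"out": [], "in": []})
    for out_id, label, in_id in edges:
        incident[out_id]["out"].append((label, in_id))
        incident[in_id]["in"].append((label, out_id))
    return dict(incident)
```

With this map, moving vertex `v` needs only `incident[v]`: the "out" entries are recreated from the duplicate vertex, and the "in" entries are recreated from each listed neighbour to the duplicate.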

These are just my thoughts. Of course, some research is needed to find a good solution for rebalancing.
