Replies: 1 comment
-
Also, a possible problem with re-balancing is that if we use edges with direction not equal to These are just my thoughts. Of course, some research should be done to find a good solution for re-balancing.
-
Currently vertices are placed in different partitions either randomly or locally within a single transaction. The vertex id is responsible for vertex placement: we can tell on which machine a vertex is placed by knowing its identifier.
The problem I see is that over time vertices on different servers may become more connected, so deep joins won't be efficient because we need to ask multiple servers to get related data.
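To make the locality problem measurable, here is a minimal sketch that counts how many edges cross partitions. The `partition_of` rule (id modulo partition count) and the edge list are assumptions for illustration only; this is not how JanusGraph actually encodes the partition in the vertex id.

```python
def partition_of(vertex_id: int, num_partitions: int) -> int:
    """Hypothetical placement rule: the partition is derived from the id alone."""
    return vertex_id % num_partitions


def edge_cut(edges, num_partitions):
    """Return the fraction of edges whose endpoints live on different partitions."""
    if not edges:
        return 0.0
    crossing = sum(
        1
        for src, dst in edges
        if partition_of(src, num_partitions) != partition_of(dst, num_partitions)
    )
    return crossing / len(edges)


# Made-up edge list: (source vertex id, destination vertex id).
edges = [(0, 2), (2, 4), (1, 3), (0, 1), (3, 4)]
cut = edge_cut(edges, num_partitions=2)
print(f"{cut:.0%} of edges cross partitions")
```

The higher this fraction gets, the more often a multi-hop traversal has to ask another server, so it could serve as the metric a rebalancing job tries to reduce.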
The idea is to implement a scheduled iterative job which rebalances data in the cluster for better locality (to traverse the graph faster).
The idea is taken from this post. The post isn't new (it is from 2014) but the idea is interesting.
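One possible shape for such an iterative job is sketched below. It is a simple label-propagation-style heuristic and not necessarily the algorithm from the linked post. Each vertex moves to the partition that holds most of its neighbours, as long as that partition is below a capacity limit. The `placement` mapping, the `capacity` parameter and the function names are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def rebalance_step(placement, edges, capacity):
    """One pass over all vertices; mutates `placement` and returns the number of moves.

    placement: dict mapping vertex -> integer partition number
    edges: list of (u, v) pairs, treated as undirected
    capacity: maximum number of vertices allowed per partition
    """
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    sizes = Counter(placement.values())
    moved = 0
    for vertex in sorted(placement):  # fixed order keeps the run deterministic
        current = placement[vertex]
        votes = Counter(placement[n] for n in neighbours[vertex])
        if not votes:
            continue
        # Most neighbours wins; ties go to the lowest partition number.
        target, target_votes = max(votes.items(), key=lambda kv: (kv[1], -kv[0]))
        strictly_better = target_votes > votes.get(current, 0)
        if target != current and strictly_better and sizes[target] < capacity:
            placement[vertex] = target
            sizes[current] -= 1
            sizes[target] += 1
            moved += 1
    return moved


def rebalance(placement, edges, capacity, max_rounds=10):
    """Repeat passes until nothing moves; return the number of rounds executed."""
    for round_no in range(1, max_rounds + 1):
        if rebalance_step(placement, edges, capacity) == 0:
            return round_no
    return max_rounds


# Two triangles joined by one edge, initially scattered over partitions 0 and 1.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
placement = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
rounds = rebalance(placement, edges, capacity=4)
print(rounds, placement)
```

In this toy run each triangle ends up on its own partition, leaving only the bridging edge between servers. The capacity check is what stops every vertex from collapsing onto a single machine; a real job would also need to bound the number of moves per run.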
I haven't researched this field much, so I suggest posting issues or solutions related to this problem here, or opening a new thread in the janusgraph-dev group.
I don't know whether it should be implemented as a separate project or in JanusGraph itself.
The problems I currently see with rebalancing are:
realId or myInternalId).
staticId which will be indexed by default and rename id into placementId. Not sure about this idea.
Possibly there are other solutions which I am not aware of, so if you know any or have additional comments, please post here.
Right now I think we should start a scheduled job that launches multiple workers which rebalance the graph in parallel (i.e. remove old vertices and create duplicate vertices on another machine).
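A rough sketch of the worker part could look like this. `Server`, `migrate` and `run_rebalance_job` are hypothetical names, and the servers are plain in-memory dictionaries rather than real storage nodes. Each worker first creates the duplicate vertex on the target machine and only then removes the old one, so the vertex is never missing from both.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class Server:
    """Hypothetical in-memory stand-in for one storage node."""

    def __init__(self, name):
        self.name = name
        self.vertices = {}  # vertex id -> property dict
        self.lock = Lock()


def migrate(vertex_id, source, target):
    """Duplicate the vertex on the target first, then remove the old copy."""
    with source.lock:
        properties = dict(source.vertices[vertex_id])
    with target.lock:
        target.vertices[vertex_id] = properties
    with source.lock:
        del source.vertices[vertex_id]
    return vertex_id


def run_rebalance_job(moves, workers=4):
    """Apply (vertex_id, source, target) moves in parallel; return the migrated ids."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(migrate, v, src, dst) for v, src, dst in moves]
        return sorted(f.result() for f in futures)


a, b = Server("a"), Server("b")
a.vertices = {1: {"name": "x"}, 2: {"name": "y"}}
b.vertices = {3: {"name": "z"}}
migrated = run_rebalance_job([(1, a, b), (3, b, a)])
print(migrated, sorted(a.vertices), sorted(b.vertices))
```

The sketch ignores edges and keeps the vertex id unchanged, which is exactly the part that gets hard when the id itself decides the placement. An external scheduler would be expected to call `run_rebalance_job` periodically with the moves proposed by whatever partitioning heuristic is chosen.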