SOLR-17049: Fix Replica Down on startup logic #2432
Conversation
Nice patch! It would be great to get a test in that could at least catch this change doing nothing at all. That's probably not a simple request, though, and I wouldn't want it to hold up the fix: such a test would be of limited value in the near term, and the rest of the tests are good enough at catching anything this might break.
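Purely to illustrate the kind of smoke test meant here (not something this PR includes), a sketch assuming the usual SolrCloudTestCase / MiniSolrCloudCluster helpers; the collection name, config name, and counts are made up:

```java
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.cloud.SolrCloudTestCase;
import org.apache.solr.embedded.JettySolrRunner; // package differs on older branches
import org.junit.BeforeClass;
import org.junit.Test;

public class ReplicaDownOnStartupTest extends SolrCloudTestCase {

  @BeforeClass
  public static void setupCluster() throws Exception {
    configureCluster(2)
        .addConfig("conf", configset("cloud-minimal"))
        .configure();
  }

  @Test
  public void testReplicasComeBackAfterNodeRestart() throws Exception {
    CollectionAdminRequest.createCollection("startupDown", "conf", 1, 2)
        .process(cluster.getSolrClient());
    cluster.waitForActiveCollection("startupDown", 1, 2);

    // Restart one node; the startup logic under review should mark its
    // replicas DOWN, and they should then recover to ACTIVE.
    JettySolrRunner node = cluster.getJettySolrRunner(0);
    node.stop();
    cluster.waitForJettyToStop(node);
    node.start();

    cluster.waitForActiveCollection("startupDown", 1, 2);
  }
}
```

A restart-and-wait test like this only proves the node comes back healthy; asserting the intermediate DOWN state would need more plumbing, which is the "not a simple request" part.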
@@ -181,29 +183,46 @@ public String getShardId(String nodeName, String coreName) {
}

public String getShardId(String collectionName, String nodeName, String coreName) {
Nobody calls this; let's remove it.
It's a public SolrJ API so I hesitate to remove it in a minor release.
Sorry; I definitely don't think we should hold ourselves to that high a standard. As you know, there are a ton of classes and methods that are public across the SolrJ JARs. The risk of not giving ourselves permission to remove things is that contributions that might otherwise come simply don't, because it's too much work to maintain backwards compatibility.
Can mark deprecated for now; don't need to remove it yet.
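If deprecation is the route taken, the change could be as small as this (the Javadoc wording is only a suggestion):

```java
/**
 * @deprecated Not called within Solr itself; kept only for SolrJ back-compat
 *             and a candidate for removal in a future major release.
 */
@Deprecated
public String getShardId(String collectionName, String nodeName, String coreName) {
  // ... existing implementation unchanged ...
}
```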
for (CollectionRef ref : states) {
  DocCollection coll = ref.get();
(I know this already existed.) Looping over all collections scares me where I work. Maybe that's a LazyDocCollection, and here we call get(), which defaults to disallowing the cached state, i.e. we need to fetch from ZK.
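To spell out the concern, a sketch (not the PR's code), assuming CollectionRef's cached/uncached get variants as they exist in ClusterState today:

```java
import java.util.Collection;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;

class CollectionLoopSketch {
  // 'states' stands for the CollectionRef values held by ClusterState.
  static void visitAll(Collection<ClusterState.CollectionRef> states) {
    for (ClusterState.CollectionRef ref : states) {
      // get() is get(false), i.e. "do not trust a cached copy": for a lazy
      // ref this forces a read from ZooKeeper, so the loop can turn into one
      // ZK fetch per collection in the cluster.
      DocCollection coll = ref.get();
      if (coll == null) {
        continue; // collection disappeared between listing and fetching
      }
      // ... inspect coll (replicas on this node, PRS flag, etc.) ...
    }
  }
}
```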
ClusterState clusterState = cc.getZkController().getClusterState();
Map<String, List<Replica>> replicasPerCollectionOnNode =
    clusterState.getReplicaNamesPerCollectionOnNode(nodeName);
It appears this is only called on the current node. If true, couldn't we instead list CoreDescriptors to find cores on the current node? This scales much better than looping all collections that exist in the entire cluster.
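For comparison, the core-descriptor approach suggested here would look roughly like the sketch below; it stays entirely local and never enumerates the cluster, but it assumes CoreContainer's descriptors are already loaded, which the reply below points out is not the case at this point of startup:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;

class LocalCoresSketch {
  /** Group locally registered core names by collection, without touching ZK. */
  static Map<String, List<String>> coreNamesPerCollection(CoreContainer cc) {
    Map<String, List<String>> result = new HashMap<>();
    for (CoreDescriptor cd : cc.getCoreDescriptors()) {
      // getCloudDescriptor() is null in standalone mode; in SolrCloud it
      // carries the collection/shard/replica identity of the local core.
      if (cd.getCloudDescriptor() == null) continue;
      String collection = cd.getCloudDescriptor().getCollectionName();
      result.computeIfAbsent(collection, k -> new ArrayList<>()).add(cd.getName());
    }
    return result;
  }
}
```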
That would be nice, but there are two problems.
- The core descriptors are not yet available; that's actually why the wait didn't work initially.
- What if the cores no longer exist locally? The cluster state could still list them there, making them look active and healthy on startup.
Fair point. I wish the replica's state in PRS were an ephemeral node; then this would be a non-issue. No need to mark anything down, ZK does it for you :-). @murblanc often speaks of wishing for this.
So it should be mentioned that this is only really a problem because of PRS. In order to determine whether a collection needs to be managed by the node itself, we have to load the collection and check whether it is PRS-enabled. If the overseer could handle the updates for PRS, it would be much less of a strain.
As for the initial issue that the node doesn't know which collections to watch, we could have the overseer give that information back to the node, telling it which collections to watch for state.
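To make the first point concrete: just answering "does this node even have PRS work to do for this collection?" already forces the collection to be materialized (and, for a lazy ref, read from ZK). A sketch, assuming DocCollection's isPerReplicaState() flag:

```java
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;

class PrsCheckSketch {
  /** True if the collection stores replica state per-replica (PRS). */
  static boolean needsPerReplicaHandling(ClusterState.CollectionRef ref) {
    DocCollection coll = ref.get(); // for lazy refs this may read from ZK
    return coll != null && coll.isPerReplicaState();
  }
}
```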
To be clear, the overseer should have all of the collections' state anyway, so it's much less of an issue. I do understand that looping through thousands (or hundreds of thousands) of collections is a problem in itself; without storing replica information in two places (under nodes and under shards), it's sometimes a necessary evil.
Having replica state be an ephemeral node could be useful in this regard; I wonder how it would complicate the general loading and reading of cluster state (and how it would impact ZK load for large clusters).
To be clear, the overseer should have all of the collections' state anyway, so it's much less of an issue
I've heard this might be true but haven't found how to see this in the code. You imply, I suppose, that LazyCollectionRef is never used on an Overseer node? How?
I guess I am handwaving a bit. But if all cluster-mutating events go through the overseer, then the overseer should have all the most recent versions of all collections, right? (Sure, it won't have the states of collections that have not been referenced since the node became overseer, but that issue goes away after the first "node-down" event)
But if all cluster-mutating events go through the overseer ...
Yeah, probably this. Locally I want to add either a log or a metric on org.apache.solr.common.cloud.ZkStateReader.DocCollectionWatches#activeCollectionCount to help monitor this and understand it.
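A minimal sketch of the "log it for now" idea; the hook point and the way the count is obtained are assumptions here, only activeCollectionCount itself comes from the comment above:

```java
import java.lang.invoke.MethodHandles;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class WatchCountLogger {
  private static final Logger log =
      LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());

  /** Emit the number of actively watched collections, e.g. periodically or on node-down events. */
  static void logActiveCollectionWatches(int activeCollectionCount) {
    // This is the number that drives the cost of the loop discussed above.
    log.info("Actively watched collections: {}", activeCollectionCount);
  }
}
```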
Even if all collections are PRS, we will have to do this loop.
Totally understood. Eventually, if ephemeral nodes are used for replica state, this loop would be removed, as the replica's state would be interpreted as down.
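The appeal of the ephemeral-node idea is that ZooKeeper deletes the znode when the owning session dies, so a crashed or restarting node never leaves a stale "active" marker behind and there is nothing to mark down on startup. A bare-bones illustration with the plain ZooKeeper client; the path layout and empty payload are made up, not Solr's actual PRS format:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class EphemeralReplicaStateSketch {
  /**
   * Publish an "alive" marker for a replica. Because the znode is EPHEMERAL,
   * ZooKeeper removes it automatically when the owning session expires.
   */
  static void publishAlive(ZooKeeper zk, String collection, String replicaName)
      throws KeeperException, InterruptedException {
    // The parent path is assumed to already exist as a persistent node.
    String path = "/collections/" + collection + "/replica_alive/" + replicaName;
    zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
  }
}
```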
We would still need the loop to bring the replica states back to ACTIVE, right?
I'd keep that overseer alive somewhere. It would make an excellent case study. I think it's pretty rare - I'm hard-pressed to think of anything I've ever run into that comes close to its performance / cluster impact in comparison to the infrequent number of bits it actually has to manage and distribute. It's honestly breathtaking in its own way. The level of independence in its work, the amount of information involved ... if I ever teach a software course, I'd pull it out of a jar. You can't just waltz into code like that. There are a lot of lessons tied up in that code.
+1 to merge
(cherry picked from commit 1b582e9)
I don't think we should add additional methods to an important class like ClusterState unless we think they're "worthy". This PR introduces a method called only by ZkController, so let's instead put this logic in ZkController and keep ClusterState smaller, with fewer obscure methods.
https://issues.apache.org/jira/browse/SOLR-17049
The list of replicas should not be determined by what exists locally, but instead by what exists in ZooKeeper.