Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16995: Associate with each replica type a property for "numReplicas" #2039

Merged
merged 5 commits into from
Dec 8, 2023

Conversation

pvcnt
Copy link
Contributor

@pvcnt pvcnt commented Oct 24, 2023

https://issues.apache.org/jira/browse/SOLR-16995

Description

I am resuming refactoring started in #1928 and #1981 with the goal of simplifying/centralising code related to replica types. The end goal would be to avoid having if (replicaType == Replica.Type.*) scattered across the codebase, and to make it easier to add new replica types.

Solution

Each replica type now has a property name to specify its number of replicas in requests attached to it (nrtReplicas, tlogReplicas and pullReplicas). It removes almost all usages of NRT_REPLICAS, TLOG_REPLICAS and PULL_REPLICAS constants, except in code around v2 APIs.

Tests

New tests have been added for the new methods.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

This allows to avoid repetitive code extracting number of replicas from a message.
@@ -118,9 +188,17 @@ public Replica.Type getLeaderType() {
+ "), there must be at least one leader-eligible replica");
}

/** Validate that there is at least one replica that can be a leader. */
public void validate() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of logic should not be in solrj client, moved it in core.

return fromMessage(message, collection, null);
}

public static ReplicaCount fromMessage(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should factor all of the existing use cases of creating ReplicaCount, including with a replicationFactor and with a default replica type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to have two additional tests in ReplicaCountTest for fromMessage:

  1. The fallback to collection default replication factor works when no replicas of the default type are specified (or no replicas at all are specified),
  2. When no replicas are specified at all (message is empty), the default replica type count gets assigned correctly to the default replica type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to answer this comment. I had a look at the suggestions and found out that:

  1. It would be supported but can not happen in practice. When a collection is passed to pull default values from, it will have a replication factor and number of replicas of each type. I don't see a use case where it would have only a replication factor and no number of replicas.
  2. This does not happen in fromMessage, but it handled by the caller (e.g., CreateCollectionCmd#getNumReplicas).

@pvcnt pvcnt marked this pull request as ready for review October 24, 2023 09:33
@pvcnt pvcnt changed the title Associate with each replica type a property for "numReplicas" SOLR-16995: Associate with each replica type a property for "numReplicas" Oct 25, 2023
Copy link
Member

@murblanc murblanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Vincent for doing this well needed cleanup and refactoring.
Overall it looks good to me. I like the approach.

I did leave a few comments on changes to do (or at least to discuss). Nothing really structuring, more like cosmetics and surface changes.

This PR has ben lying around for a while now, I plan to merge it quickly.

@@ -135,6 +140,20 @@ public class CollectionHandlingUtils {
}
}

/** Returns names of properties that are used to specify a number of replicas of a given type. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't these two methods numReplicasProperties and leaderEligibleReplicaTypes belong in Replica.Type?

Copy link
Contributor Author

@pvcnt pvcnt Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the reason for not doing it was to avoid exposing more code in solrj while it is only used in Solr itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But I'd still value more a cleaner object oriented design vs packaging/deployment concerns. SolrJ exposes a lot of internals to clients so within that framework I think it makes sense to group code related to the same concepts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @murblanc . It's really tiny amounts of code any way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really would prefer not to bloat Replica.Type with methods, especially static ones, that are not used there. Since those methods are used only from API/Command layer, I believe that having them in a class named CollectionHandlingUtils makes sense. And since they are static (this is a different story for non-static methods, I added one as requested elsewhere in this PR) I do not see the difference it makes to have them here or there. So I definitely prefer not adding random util methods to a place where they do not belong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you say "do not belong".
Except standard Java classes, these two static methods depend only on Replica.Type.
If you don't want to put them in the enum put them in Replica, but move them close to the code they're related to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do not belong there because they are not used in solrj, but in solr-core.

/**
* Doesn’t index or writes to transaction log. Just replicates from {@link Type#NRT} or {@link
* Type#TLOG} replicas. {@link Type#PULL} replicas can’t become shard leaders (i.e., if there
* are only pull replicas in the collection at some point, updates will fail same as if there is
* no leaders, queries continue to work), so they don’t even participate in elections.
*/
PULL(false);
PULL(false, CollectionAdminParams.PULL_REPLICAS);

public final boolean leaderEligible;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a defaultReplica final boolean here and set it to true for the replica defined as default (via defaultType()). This will will clean up some code needing the default replica type (for example CollectionHandlingUtils.makeCollectionPropsAndDefaults()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But on the other it would require yet another look in Replica.Type#defaultType() or static caching. Have you seen other places where it would be useful? If it's only in CollectionHandlingUtils#makeCollectionPropsAndDefaults, I would prefer not to optimise now for a single use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not optimization. Intent is to improve code readability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure of the readability improvement between x.default or x == Type.defaultValue(). The former is more concise, but it is a very small improvement imo.

@@ -61,6 +122,11 @@ public boolean contains(Replica.Type type) {
return countByType.containsKey(type);
}

/** Returns the replica types for which a number of replicas was explicitely defined. */
public Set<Replica.Type> keySet() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this method if this class implements a method to directly count the number of leader replicas or if there's at least one leader replica? (see comment on count() below)
(doing the review in GitHub not in the IDE so can't easily find all callers)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used every time I need to iterate over replica types explicitly defined in this class. A similar behaviour could be obtained by iterating over all Replica.Type#values and calling ReplicaCount#contains, but it seemed more straightforward this way. We could also have a ReplicaCoubt#forEach method... So many ways to implement this!

@@ -135,6 +140,20 @@ public class CollectionHandlingUtils {
}
}

/** Returns names of properties that are used to specify a number of replicas of a given type. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @murblanc . It's really tiny amounts of code any way.

* onto this replica type. This replica type needs to be leader-eligible.
*/
public static Type defaultType() {
return NRT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add assert NRT.leaderEligible; ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird to do an assert on something that is purely static and cannot be modified at runtime. A unit test would be better if it's important but... we would be in a bad state everywhere if it wasn't, right?

@murblanc murblanc merged commit 10c09d0 into apache:main Dec 8, 2023
3 checks passed
@pvcnt pvcnt deleted the num-replicas-property branch January 9, 2024 13:50
propMap.put(TLOG_REPLICAS, numReplicas.get(Replica.Type.TLOG));
propMap.put(PULL_REPLICAS, numReplicas.get(Replica.Type.PULL));
propMap.put(REPLICATION_FACTOR, numReplicas.get(Replica.Type.defaultType()));
numReplicas.writeProps(propMap);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeProps() skips adding counts for replicas that are not present in numReplicas. This is a different behavior from previously , on the left we can see all counts were added.
This causes createCoreLessCollection when it iterates over CollectionHandlingUtils.COLLECTION_PROPS_AND_DEFAULTS, to add the backup collection replica counts for the replicas that were not explicitly specified in the restore message.

writeProps() should likely be made to add all replica counts, it is the simplest fix here.
I ran into problems while working on SIP-20 (dealing with non compatible replica types) and found this issue.

@pvcnt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants