-
Notifications
You must be signed in to change notification settings - Fork 686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17158 Terminate distributed processing quickly when query limit is reached - Initial impl #2379
Conversation
@@ -227,18 +243,27 @@ public ShardResponse takeCompletedOrError() { | |||
|
|||
private ShardResponse take(boolean bailOnError) { | |||
try { | |||
// although nothing in this class guarantees that pending has been incremented to the total |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CaoManhDat if you can double check my reading of this code and let me know if this comment is inaccurate in any way I'd appreciate it. (I see your name in the git blame a bunch here)
One thing folks may want to review is the addition of synchronization around manipulations of HttpShardHandler#responseCancelableMap which were necessary to avoid CME such as:
This happens because the code is iterating an unsynchronized hash map when the request thread signals a cancel operation. I chose to also enclose pending.incrementAndGet() or decrementAndGet() and responses.take() and even though they are by themselves probablly safe, there's a loop on the value of pending > 0 and these three operations are logically linked so it seemed a bit dangerous to leave them independent. Also I noticed that ShardeResponse.code is being set, but the IDE flags it as "assigned but never read", and against the recommendation of the previously existing comments it's not volatile... |
@@ -228,6 +228,7 @@ public interface CommonParams { | |||
METRICS_PATH); | |||
String APISPEC_LOCATION = "apispec/"; | |||
String INTROSPECT = "/_introspect"; | |||
String ALLOW_PARTIAL_RESULTS = "allowPartialResults"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be replaced with (or used instead of) CommonParams.PARTIAL_RESULTS
, and the associated docs in the ref guide needs to be changed accordingly.
These two parameters carry the exactly same meaning and serve the same purpose - i.e. that returning partial results is either acceptable or not. For QueryLimits if it's not acceptable (partialResults=false
, whichever way this value was set) and the query limits are exceeded then partial results will be discarded by throwing an exception inside components (which is implemented in QueryLimits.maybeExitWithPartialResults
). And here exactly the same behavior is expected, consistent with CommonParams.PARTIAL_RESULTS
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, will check/fix
solr/solrj/src/java/org/apache/solr/common/params/ShardParams.java
Outdated
Show resolved
Hide resolved
solr/solrj/src/java/org/apache/solr/client/solrj/request/RequestParamsSupplier.java
Outdated
Show resolved
Hide resolved
return; | ||
} | ||
// all variables that set inside this listener must be at least volatile | ||
responseCancellableMap.put( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe if we used a ConcurrentHashMap
we could avoid locking here altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't feel comfortable with that solution because this map is updated while looping on an Atomic integer pending
and also is updated immediately after responses.take()
all of which appear to need to happen as a unit. The prior code wasn't touching it from the child threads, but since I'm changing that I want to be sure that all of these operations happen as a unit to avoid potentially weird cases where the loop fails to exit but then the map is empty etc...
solr/core/src/java/org/apache/solr/request/json/RequestUtil.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/grouping/CommandHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/search/TestCpuAllowedLimit.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/search/TestCpuAllowedLimit.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/search/TestCpuAllowedLimit.java
Outdated
Show resolved
Hide resolved
solr/solrj/src/java/org/apache/solr/client/solrj/util/AsyncListener.java
Show resolved
Hide resolved
…ly caching empty results. Also, a few TODOs remain.
Still need to tweak result attribute again, as this version is too complicated, and though I did write upgrade notes those will change (and hopefully get simpler) after said tweak and the main docs still need updated.
case "allowPartialResultsDefault": | ||
builder.setAllowPartialResultsDefault(it.boolVal(true)); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer you not bother with all the machinery in solr.xml/NodeConfig and instead have a simple EnvUtils (env or sys prop) to enable. It's just a boolean; doesn't have any interesting config to it. Yes there are other booleans here but I think we Solr maintainers should consider how NodeConfig might systematically / dynamically understand primitive values (boolean, integer, ...) and apply to EnvUtils automatically without having to touch the NodeConfig class (which is kind of a pain; that builder!). For example imagine "solr.search.partialResults" being settable as a system property, or also settable in solr.xml automatically via "searchPartialResults".
CC @janhoy
Another take on this is that, shouldn't the user simply go to their solrconfig.xml and set a default partialResults like we all do for miscellaneous things for search?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, IMHO having a per-collection setting instead of a global sysprop is more flexible (and you can still change it globally using property substitution).
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this PR touches 33 files, which is super surprising compared to #2493
Can this be simplified or split up?
Boolean userParamAllowPartial = params.getBool(CommonParams.ALLOW_PARTIAL_RESULTS); | ||
if (userParamAllowPartial != null) { | ||
return !userParamAllowPartial; | ||
} else { | ||
return ALLOW_PARTIAL_RESULTS_DEFAULT; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use the overloaded version of getBool that provides a default? We'd then have a one-liner here.
Ok finally back to this. Sorry about the delay, the merge of a multi-threading implementation that broke (or prevented use of) features I had just released co-opted most of my available Solr time... |
Reworked version compatible with current main, and touching fewer files here: #2666 |
still needs better tests, and it is still incorrectly caching empty results. Also, a few TODOs remain.
https://issues.apache.org/jira/browse/SOLR-17158