SOLR-17158 Terminate distributed processing quickly when query limit is reached - Initial impl #2379

gus-asf · 2024-03-31T22:32:02Z

Still needs better tests, and it is still incorrectly caching empty results. Also, a few TODOs remain.

https://issues.apache.org/jira/browse/SOLR-17158

gus-asf · 2024-04-01T03:53:03Z

solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java

@@ -227,18 +243,27 @@ public ShardResponse takeCompletedOrError() {

  private ShardResponse take(boolean bailOnError) {
    try {
+      // although nothing in this class guarantees that pending has been incremented to the total


@CaoManhDat, I'd appreciate it if you could double-check my reading of this code and let me know if this comment is inaccurate in any way. (I see your name in the git blame a lot here.)

gus-asf · 2024-04-01T13:56:36Z

One thing folks may want to review is the addition of synchronization around manipulations of HttpShardHandler#responseCancellableMap, which was necessary to avoid a ConcurrentModificationException (CME) such as:

{
  "responseHeader":{
    "zkConnected":true,
    "status":500,
    "QTime":18,
    "params":{
      "q":"document",
      "indent":"true",
      "q.op":"OR",
      "timeAllowed":"5",
      "allowPartialResults":"false",
      "useParams":"",
      "_":"1711320289606"    }  },
  "error":{
    "trace":"java.util.ConcurrentModificationException
at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1597)
at java.base/java.util.HashMap$ValueIterator.next(HashMap.java:1625)
at org.apache.solr.handler.component.HttpShardHandler.cancelAll(HttpShardHandler.java:257)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:577)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:238)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2886)
at org.apache.solr.servlet.HttpSolrCall.executeCoreRequest(HttpSolrCall.java:876)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:560)
at org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:254)
at org.apache.solr.servlet.SolrDispatchFilter.lambda$doFilter$0(SolrDispatchFilter.java:215)
at org.apache.solr.servlet.ServletUtils.traceHttpRequestExecution2(ServletUtils.java:247)
at org.apache.solr.servlet.ServletUtils.rateLimitRequest(ServletUtils.java:211)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635)

This happens because the code is iterating over an unsynchronized HashMap when the request thread signals a cancel operation. I chose to also enclose pending.incrementAndGet(), pending.decrementAndGet(), and responses.take() in the synchronization. Even though each is probably safe by itself, there's a loop on pending > 0 and these three operations are logically linked, so it seemed a bit dangerous to leave them independent.
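A minimal sketch of the failure mode and the locking fix described above, not the actual HttpShardHandler code. The class and method names (GuardedCancellables, register, complete, cancelAll) are hypothetical, and it assumes the map is a plain HashMap that worker threads mutate while the request thread iterates it to cancel.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the handler's cancel bookkeeping; not Solr's class. */
class GuardedCancellables {
  private final Object lock = new Object();
  // Plain HashMap: safe only because every access below holds `lock`.
  private final Map<String, Runnable> cancellables = new HashMap<>();

  /** Called when a shard request is submitted. */
  void register(String shard, Runnable cancelAction) {
    synchronized (lock) {
      cancellables.put(shard, cancelAction);
    }
  }

  /** Called, possibly from a worker thread, when a shard response arrives. */
  void complete(String shard) {
    synchronized (lock) {
      cancellables.remove(shard);
    }
  }

  /**
   * Called from the request thread. Without the lock, a concurrent put/remove
   * during this iteration can raise ConcurrentModificationException.
   * Cancel actions must not call back into register/complete, since that
   * would modify the map mid-iteration even on a single thread.
   */
  int cancelAll() {
    synchronized (lock) {
      int cancelled = 0;
      for (Runnable cancelAction : cancellables.values()) {
        cancelAction.run();
        cancelled++;
      }
      cancellables.clear();
      return cancelled;
    }
  }

  int size() {
    synchronized (lock) {
      return cancellables.size();
    }
  }
}
```

The lock makes the whole iteration in cancelAll mutually exclusive with every mutation, which is the property a bare HashMap iterator cannot give.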

Also, I noticed that ShardResponse.code is being set, but the IDE flags it as "assigned but never read", and, contrary to the recommendation in the pre-existing comments, it's not volatile...

sigram · 2024-04-02T09:06:04Z

solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java

@@ -228,6 +228,7 @@ public interface CommonParams {
          METRICS_PATH);
  String APISPEC_LOCATION = "apispec/";
  String INTROSPECT = "/_introspect";
+  String ALLOW_PARTIAL_RESULTS = "allowPartialResults";


This should be replaced with (or used instead of) CommonParams.PARTIAL_RESULTS, and the associated docs in the ref guide need to be changed accordingly.

These two parameters carry exactly the same meaning and serve the same purpose, i.e. stating whether returning partial results is acceptable. For QueryLimits, if it's not acceptable (partialResults=false, whichever way this value was set) and the query limits are exceeded, then partial results are discarded by throwing an exception inside components (implemented in QueryLimits.maybeExitWithPartialResults). Exactly the same behavior is expected here, consistent with CommonParams.PARTIAL_RESULTS.
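The behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the body of QueryLimits.maybeExitWithPartialResults: the class PartialResultsPolicy, the exception type, and the method name shouldStopWithPartialResults are all hypothetical.

```java
/** Hypothetical illustration of the described policy; not Solr's QueryLimits. */
class PartialResultsPolicy {

  /** Hypothetical exception type used only for this sketch. */
  static class LimitExceededException extends RuntimeException {
    LimitExceededException(String message) {
      super(message);
    }
  }

  /**
   * Returns true when the caller should stop and return what it has so far.
   * Throws when limits are exceeded but partial results are not acceptable,
   * so that the partial results are discarded.
   */
  static boolean shouldStopWithPartialResults(
      boolean limitExceeded, boolean partialResultsAllowed) {
    if (!limitExceeded) {
      return false; // keep processing normally
    }
    if (!partialResultsAllowed) {
      throw new LimitExceededException(
          "Query limits exceeded and partial results are not allowed");
    }
    return true; // stop early and mark the response as partial
  }
}
```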

Yup, will check/fix

solr/solrj/src/java/org/apache/solr/common/params/ShardParams.java

solr/solrj/src/java/org/apache/solr/client/solrj/request/RequestParamsSupplier.java

sigram · 2024-04-02T09:34:47Z

solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java

-      return;
-    }
+      // all variables that set inside this listener must be at least volatile
+      responseCancellableMap.put(


Maybe if we used a ConcurrentHashMap we could avoid locking here altogether?

I didn't feel comfortable with that solution because this map is updated while looping on the atomic integer pending, and it is also updated immediately after responses.take(); all of these appear to need to happen as a unit. The prior code wasn't touching the map from the child threads, but since I'm changing that, I want to be sure that all of these operations happen as a unit, to avoid potentially weird cases where, for example, the loop fails to exit but the map is already empty.
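A sketch of the "as a unit" concern, under assumptions: the names below (ShardBookkeeping, submit, onResponse, pollCompleted) are hypothetical, and the sketch uses a non-blocking poll() inside the lock rather than the blocking take() the PR discusses, purely to keep the example simple. A ConcurrentHashMap would make each map call atomic, but not the combination of counter, queue, and map updates.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical bookkeeping that keeps counter, queue, and map in step. */
class ShardBookkeeping {
  private final Object lock = new Object();
  private final Map<String, String> inFlight = new HashMap<>(); // guarded by lock
  private final BlockingQueue<String> responses = new LinkedBlockingQueue<>();
  private int pending; // guarded by lock

  /** Counter and map change together, so no reader sees one without the other. */
  void submit(String shard) {
    synchronized (lock) {
      pending++;
      inFlight.put(shard, "cancel-handle-" + shard);
    }
  }

  /** Worker threads only enqueue; the queue itself is thread-safe. */
  void onResponse(String shard) {
    responses.add(shard);
  }

  /** Returns a finished shard, or null if none is ready or nothing is pending. */
  String pollCompleted() {
    synchronized (lock) {
      if (pending == 0) {
        return null;
      }
      String shard = responses.poll();
      if (shard == null) {
        return null;
      }
      pending--;
      inFlight.remove(shard);
      return shard;
    }
  }

  int pendingCount() {
    synchronized (lock) {
      return pending;
    }
  }

  /** The invariant the single lock protects. */
  boolean consistent() {
    synchronized (lock) {
      return pending == inFlight.size();
    }
  }
}
```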

solr/core/src/java/org/apache/solr/request/json/RequestUtil.java

solr/core/src/java/org/apache/solr/search/grouping/CommandHandler.java

solr/core/src/test/org/apache/solr/search/TestCpuAllowedLimit.java

solr/solrj/src/java/org/apache/solr/client/solrj/util/AsyncListener.java

Still need to tweak the result attribute again, as this version is too complicated. I did write upgrade notes, but those will change (and hopefully get simpler) after said tweak, and the main docs still need to be updated.

dsmiley · 2024-04-11T22:22:47Z

solr/core/src/java/org/apache/solr/core/SolrXmlConfig.java

+              case "allowPartialResultsDefault":
+                builder.setAllowPartialResultsDefault(it.boolVal(true));
+                break;


I would prefer you not bother with all the machinery in solr.xml/NodeConfig and instead have a simple EnvUtils (env or sys prop) to enable. It's just a boolean; doesn't have any interesting config to it. Yes there are other booleans here but I think we Solr maintainers should consider how NodeConfig might systematically / dynamically understand primitive values (boolean, integer, ...) and apply to EnvUtils automatically without having to touch the NodeConfig class (which is kind of a pain; that builder!). For example imagine "solr.search.partialResults" being settable as a system property, or also settable in solr.xml automatically via "searchPartialResults".

CC @janhoy
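As a rough illustration of the "env or sys prop" idea, assuming nothing about the actual EnvUtils API: the helper below (BooleanSetting.resolve) is hypothetical, as is the environment variable name in the usage note. Only the property name "solr.search.partialResults" comes from the comment above, where it is itself offered as an imagined example.

```java
/** Hypothetical helper; does not reproduce Solr's EnvUtils. */
class BooleanSetting {

  /**
   * Resolves a boolean from a system property first, then from an
   * environment variable, and finally falls back to the default.
   */
  static boolean resolve(String sysProp, String envVar, boolean defaultValue) {
    String value = System.getProperty(sysProp);
    if (value == null) {
      value = System.getenv(envVar);
    }
    return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
  }
}
```

A call such as BooleanSetting.resolve("solr.search.partialResults", "SOLR_SEARCH_PARTIALRESULTS", true) would then pick up -Dsolr.search.partialResults=false without any builder changes.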

Another take on this: shouldn't the user simply go to their solrconfig.xml and set a default partialResults, like we all do for miscellaneous search settings?

I agree, IMHO having a per-collection setting instead of a global sysprop is more flexible (and you can still change it globally using property substitution).

github-actions · 2024-06-19T00:00:27Z

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!

dsmiley

Wow, this PR touches 33 files, which is super surprising compared to #2493
Can this be simplified or split up?

dsmiley · 2024-07-31T15:46:55Z

solr/core/src/java/org/apache/solr/request/SolrQueryRequest.java

+    Boolean userParamAllowPartial = params.getBool(CommonParams.ALLOW_PARTIAL_RESULTS);
+    if (userParamAllowPartial != null) {
+      return !userParamAllowPartial;
+    } else {
+      return ALLOW_PARTIAL_RESULTS_DEFAULT;
+    }


Can we use the overloaded version of getBool that provides a default? We'd then have a one-liner here.
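A sketch of what that one-liner could look like. MiniParams is a hypothetical stand-in with the two getBool overloads (nullable result, and result with a default), so the example runs without Solr on the classpath; the default is passed in as an argument for the same reason. One detail worth noting: the explicit branch in the diff negates the user's value but returns the default un-negated, so a one-liner that preserves that behavior has to pass the negated default. Whether that asymmetry is intended is not clear from the snippet.

```java
import java.util.Map;

/** Stand-in with two getBool overloads; not Solr's SolrParams. */
class MiniParams {
  private final Map<String, String> values;

  MiniParams(Map<String, String> values) {
    this.values = values;
  }

  /** Returns null when the parameter is absent. */
  Boolean getBool(String name) {
    String v = values.get(name);
    return v == null ? null : Boolean.parseBoolean(v);
  }

  /** Returns def when the parameter is absent. */
  boolean getBool(String name, boolean def) {
    String v = values.get(name);
    return v == null ? def : Boolean.parseBoolean(v);
  }
}

class PartialResultsFlag {
  static final String ALLOW_PARTIAL_RESULTS = "allowPartialResults";

  /** The explicit form from the diff, with the default passed in. */
  static boolean explicit(MiniParams params, boolean dflt) {
    Boolean userParamAllowPartial = params.getBool(ALLOW_PARTIAL_RESULTS);
    if (userParamAllowPartial != null) {
      return !userParamAllowPartial;
    } else {
      return dflt;
    }
  }

  /** One-liner with the same results: the default is negated going in. */
  static boolean oneLiner(MiniParams params, boolean dflt) {
    return !params.getBool(ALLOW_PARTIAL_RESULTS, !dflt);
  }
}
```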

gus-asf · 2024-08-16T14:51:33Z

Wow, this PR touches 33 files, which is super surprising compared to #2493 Can this be simplified or split up?

Ok finally back to this. Sorry about the delay, the merge of a multi-threading implementation that broke (or prevented use of) features I had just released co-opted most of my available Solr time...

gus-asf · 2024-08-26T00:48:48Z

Reworked version compatible with current main, and touching fewer files here: #2666

github-actions bot added jetty-server client:solrj tests cat:search labels Mar 31, 2024

gus-asf changed the title ~~SOLR-17158 Initial impl~~ SOLR-17158 Terminate distributed processing quickly when query limit is reached - Initial impl Mar 31, 2024

gus-asf requested a review from sigram April 1, 2024 03:44

gus-asf commented Apr 1, 2024

sigram requested changes Apr 2, 2024

gus-asf added 2 commits April 10, 2024 19:38

SOLR-17158 Initial impl, still needs tests, and it is still incorrect…

1fe6593

…ly caching empty results. Also, a few TODOs remain.

SOLR-17158 - review feedback.

f5473c3

Still need to tweak result attribute again, as this version is too complicated, and though I did write upgrade notes those will change (and hopefully get simpler) after said tweak and the main docs still need updated.

gus-asf force-pushed the SOLR-17158 branch from 35ee119 to f5473c3 Compare April 11, 2024 00:35

github-actions bot added documentation Improvements or additions to documentation configs module:ltr labels Apr 11, 2024

dsmiley reviewed Apr 11, 2024

github-actions bot added the stale PR not updated in 60 days label Jun 19, 2024

gus-asf mentioned this pull request Jul 31, 2024

SOLR-17320: Added support for timeAllowed time out in HttpShardHandler #2493

Closed

7 tasks

dsmiley reviewed Jul 31, 2024

github-actions bot removed the stale PR not updated in 60 days label Aug 2, 2024

gus-asf mentioned this pull request Aug 26, 2024

SOLR-17158 Terminate distributed processing quickly when query limit is reached #2666

Merged

gus-asf closed this Aug 26, 2024
