Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17149: Fix backup/restore for large collections. #2243

Merged
merged 3 commits into from
Feb 6, 2024

Conversation

psalagnac
Copy link
Contributor

https://issues.apache.org/jira/browse/SOLR-17149

Description

A regression was introduced with SOLR-16879 that makes backup/restore failing when there are more than 5 shards per node.

Solution

Replace the bounded queue for async core admin tasks by an unbounded one.

Tests

No unit tests added.
Manually checked we don't have any tasks rejected by the executor.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@gerlowskija
Copy link
Contributor

Hi @psalagnac - this LGTM, generally.

The one thing missing IMO is some modifications to AbstractIncrementalBackupTest to cover the many-shards case. I took a stab at these locally and have code ready to share, put it looks like your branch isn't configured to allow others to push to it:

$ git push psalagnac SOLR-17149:SOLR-17149
Enumerating objects: 84, done.
Counting objects: 100% (84/84), done.
Delta compression using up to 28 threads
Compressing objects: 100% (32/32), done.
Writing objects: 100% (51/51), 13.84 KiB | 4.61 MiB/s, done.
Total 51 (delta 22), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (22/22), completed with 19 local objects.
To github.com:psalagnac/solr.git
 ! [remote rejected]         SOLR-17149 -> SOLR-17149 (**permission denied**)
error: failed to push some refs to 'github.com:psalagnac/solr.git'

If you're willing to accept my test changes, could you take a look at your branch settings and let me know when they're updated? Or alternatively, if you'd rather make the changes yourself, that's fine too.

@psalagnac
Copy link
Contributor Author

@gerlowskija Not sure why you got denied. I already checked the "Allow edits and access to secrets by maintainers" box.

Can this be because of branch name case? (SOLR-17149 vs solr-17149)

@gerlowskija
Copy link
Contributor

Gah - that must've been it! Thanks for catching my mistake. I've pushed my test-changes, and run 'test'+'check' locally. So I'll merge and backport. Thanks for the quick fix @psalagnac !

@psalagnac
Copy link
Contributor Author

Thanks for the test. Looks good to me.

@gerlowskija gerlowskija merged commit c98a8f6 into apache:main Feb 6, 2024
3 of 4 checks passed
gerlowskija added a commit that referenced this pull request Feb 6, 2024
9.4 introduced a new executor used for "expensive" operations, that
relied on a bounded queue.  This caused backup operations to fail on
collections large enough to hit this queue capacity.

This commit fixes this problem by making the executor queue unbounded.

---------

Co-authored-by: Pierre Salagnac <[email protected]>
Co-authored-by: Jason Gerlowski <[email protected]>
gerlowskija added a commit that referenced this pull request Feb 6, 2024
9.4 introduced a new executor used for "expensive" operations, that
relied on a bounded queue.  This caused backup operations to fail on
collections large enough to hit this queue capacity.

This commit fixes this problem by making the executor queue unbounded.

---------

Co-authored-by: Pierre Salagnac <[email protected]>
Co-authored-by: Jason Gerlowski <[email protected]>
freedev added a commit to freedev/solr that referenced this pull request Feb 9, 2024
* main:
  Add bugfix version 8.11.3
  Add 8.11.3 release to DOAP RDF file
  SOLR-16858: KnnQParser's "Pre-Filtering" behavior is now controlable via local params (closes apache#2157)
  SOLR-17066: Switch HttpSolrClient away from coreURLs, pt 3 (apache#2240)
  dev-docs + help: try-and-tweak 'Solr X Lucene' docs (apache#2223)
  SOLR-17152: Better alignment of Admin UI graph (apache#2249)
  SOLR-15960: Find unexported variables with compgen (apache#2250)
  fix a few typos in the Indexing Guide (apache#2245)
  SOLR-17149: Fix backup/restore for large collections. (apache#2243)
  SOLR-17146: Add DelegatingBackupRepository and alternative checksum verification (apache#2239)
  Revert "Revert "SOLR-17066: Switch HttpSolrClient away from coreURLs, pt 2 (apache#2231)""
  SOLR-17038: /admin/segments handler: Expose the term count (apache#2233)
@psalagnac psalagnac deleted the solr-17149 branch December 5, 2024 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants