SOLR-8393: Component for resource usage planning #1638

Draft
wants to merge 18 commits into
base: main

igiguere
Contributor

@igiguere igiguere commented May 10, 2023

https://issues.apache.org/jira/browse/SOLR-8393

DRAFT :

  • V2 implementation?
  • Question (from @gerlowskija) : Are the calculations based on size-estimator-lucene-solr.xls accurate enough to use?
  • Suggestion (from @dsmiley) : Has the Metrics API been explored as a solution to the problem/need?

Description

New feature that attempts to extrapolate resources needed in the future by looking at resources currently used.

Original idea by Steve Molloy, with additional parameter based on comment from Shawn Heisey.

Documentation copied from the Jira ticket.

Solution

V1 API:
New component: SizeComponent.java. Component can be set on the /select handler to provide sizing for a single core.

New collection operation: ClusterSizing.java. Action 'clustersizing' is added to CollectionsHandler. Class ClusterSizing calls the size component for each core.

Old-style V2 API:
Adds a method in ClusterApi.java. It calls the V1 implementation, so the question of accuracy remains.

Tests

The size component is tested in SizeComponentTest.java

Cluster sizing response is tested in ClusterSizingTest.java

Full test on a running instance of Solr.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

igiguere and others added 5 commits May 10, 2023 14:39
New component that attempts to extrapolate resources needed in the
future by looking at resources currently used.

Original idea by Steve Molloy, with additional parameter based on
comment from Shawn Heisey.

Documentation copied from the Jira ticket.
Revert to HttpSolrClient.

With Http2SolrClient, "clustersizing" requests on a secured Solr
(security.json) fail with HTTP 401 when trying to get the info of each
core.

Quick fix.  A better solution should be available.
@github-actions github-actions bot added the documentation and client:solrj labels Jan 23, 2024
igiguere and others added 4 commits April 2, 2024 14:49
Setup PKI Authentication to request size per replica.

Add unit test for ClusterSizing
…f [email protected]:igiguere/solr.git into SOLR-8393-Component-for-Solr-resource-usage-planning
@epugh
Contributor

epugh commented Apr 9, 2024

I'd prefer to see a V2 API added instead of a V1 API. Adding more V1 APIs just adds to the backlog of work on our V2 migration, so I'd love to see a V2 API added instead!

@epugh
Contributor

epugh commented Apr 9, 2024

I think the hyphenated (kebab-case) total-disk-size pattern should be changed to camelCase totalDiskSize, since that is the pattern we use in the rest of our JSON output.
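
To illustrate the renaming suggested here, a minimal sketch that converts a kebab-case key to camelCase. The class `KeyNames` and the method `toCamelCase` are hypothetical names made up for this example; they are not part of Solr or of this PR, and renaming the keys directly where the response is built may well be the simpler fix.

```java
// Hypothetical helper, for illustration only: not an existing Solr class.
final class KeyNames {
    private KeyNames() {}

    /** Converts a kebab-case key such as "total-disk-size" to camelCase. */
    static String toCamelCase(String kebab) {
        StringBuilder out = new StringBuilder(kebab.length());
        boolean upperNext = false;
        for (int i = 0; i < kebab.length(); i++) {
            char c = kebab.charAt(i);
            if (c == '-') {
                // Drop the hyphen and capitalize the character that follows it.
                upperNext = true;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, `KeyNames.toCamelCase("total-disk-size")` yields `totalDiskSize`, and keys without hyphens pass through unchanged.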

@epugh
Contributor

epugh commented Apr 9, 2024

This looks very helpful, though I can't speak to whether it's accurate or not... I'd love to see something replace the old Excel spreadsheet that we recently removed, as it was no longer useful or accurate. Maybe @janhoy you have some thoughts on this...

@igiguere
Contributor Author

I'd prefer to see a V2 API added instead of a V1 API. Adding more V1 APIs just adds to the backlog of work on our V2 migration, so I'd love to see a V2 API added instead!

Agreed, but, as mentioned, this is from a pre-existing patch.
I come back to Solr only about once a year, and usually to apply some old patch on a more recent version. That means I have a limited understanding of what ties into what and why. So, implementing everything needed for clustersizing v2 would be a long and difficult process for me.

Participation is welcomed!

igiguere and others added 2 commits April 10, 2024 16:37
@igiguere igiguere marked this pull request as draft April 10, 2024 20:48
Add partial implementation for v2 api.

fix unit tests.
@gerlowskija
Contributor

About to take a look at the code and see if I can help with the v2 side of things, but before I dive into that I figured it was worth asking:

Does size-estimator-lucene-solr.xls actually work for folks? Do you use it regularly @igiguere ? Have you found it to be pretty accurate? Any other folks have experience with it?

I'm happy to be wrong if we have several groups of folks out there in the wild that are using it, but my initial reaction is to be a little skeptical that it's reliable enough to incorporate into Solr.

Primarily because, well, modeling resource-usage is a really really tough problem. There's a reason that the community's only response to sizing questions has always been pretty much "You'll have to Guess-and-Check".

And secondarily, because the spreadsheet this is all based off of was added in 2011 and hasn't really seen much iteration in the decade since. There's an absolute ton that's changed in both Lucene and Solr since then.

@igiguere
Contributor Author

igiguere commented Apr 11, 2024

@gerlowskija

Does size-estimator-lucene-solr.xls actually work for folks? Do you use it regularly @igiguere ? Have you found it to be pretty accurate? Any other folks have experience with it?

Me, personally, no, I don't use it ;). I'll try to find out from client-facing people in the company. I doubt anyone has compiled usage vs success statistics.

UPDATE: I couldn't find anyone who really used size-estimator-lucene-solr.xls or the clustersizing feature (v1). So of course, nobody has any clue about accuracy.

... the community's only response to sizing questions has always been pretty much "You'll have to Guess-and-Check".

The cluster sizing feature is documented to estimate (i.e., guess) resource usage. We could make the documentation clearer that it's not a foolproof measure. But at least it beats holding a finger to the wind. And it's a bit less complicated than the xls and a calculator.

And secondarily, because the spreadsheet this is all based off of was added in 2011 and hasn't really seen much iteration in the decade since. There's an absolute ton that's changed in both Lucene and Solr since then.

We're only calculating RAM, disk size, and document size. Whatever has changed in Solr and Lucene, if it has an effect on RAM, disk space, or doc size, then it should be reflected in the results... No?

Note that this feature is meant to be used on a current "staging" deployment, to evaluate the eventual size of a "production" environment, for the same version of Solr. No one is expected to draw conclusions from a previous version, so changes from one version to another are not a concern in that way.
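
To make the staging-to-production idea concrete, here is a rough sketch of a simple linear extrapolation. This is not the calculation used by SizeComponent or the spreadsheet; the class `SizeExtrapolation`, the method `estimateBytes`, and the assumption that disk usage scales linearly with document count are all illustrative assumptions.

```java
// Illustrative only: assumes size grows linearly with document count,
// which is an assumption of this sketch, not the PR's actual formula.
final class SizeExtrapolation {
    private SizeExtrapolation() {}

    /**
     * Estimates the bytes needed for targetDocs documents, given the
     * bytes currently used by currentDocs documents on a staging core.
     */
    static long estimateBytes(long currentBytes, long currentDocs, long targetDocs) {
        if (currentDocs <= 0) {
            throw new IllegalArgumentException("currentDocs must be positive");
        }
        // Average size per document observed on the staging deployment.
        double bytesPerDoc = (double) currentBytes / currentDocs;
        return Math.round(bytesPerDoc * targetDocs);
    }
}
```

For example, a staging core holding 1,000,000 documents in 2,000,000,000 bytes would extrapolate to 20,000,000,000 bytes for 10,000,000 documents under this linear assumption.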

As a more general note, I should add that I'm a linguist converted to Java dev. Not a mathematician ;) If there's an error in the math, I will never see it.

igiguere and others added 5 commits April 14, 2024 16:40
Old-style V2 API, in ClusterAPI
Add ClusterSizingRequestBody

Start implementation based on metrics... a lot more to be done, just to
sift through the metrics!
…f [email protected]:igiguere/solr.git into SOLR-8393-Component-for-Solr-resource-usage-planning