Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17284: Remove deprecated BlobRepository #2447

Merged
merged 7 commits into from
May 23, 2024

Conversation

epugh
Copy link
Contributor

@epugh epugh commented May 8, 2024

https://issues.apache.org/jira/browse/SOLR-17284

Description

This approach to managing shared jar files was deprecated in the 8.x line and tagged for removal in Solr 9. Removing deprecated code.

Solution

Removed the code, and tested the Schema designer, which relies on .system existing, still works.

Tests

Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@epugh epugh changed the title Looking at BlobRepository.java SOLR-17294: Remove deprecated BlobRepository May 8, 2024
@epugh epugh requested a review from noblepaul May 8, 2024 13:03
@epugh
Copy link
Contributor Author

epugh commented May 8, 2024

This started with me spelunking around the code base and chasing some small typos, and then realized that this code was deprecated and we should remove it!

@dsmiley dsmiley changed the title SOLR-17294: Remove deprecated BlobRepository SOLR-17284: Remove deprecated BlobRepository May 9, 2024
@dsmiley
Copy link
Contributor

dsmiley commented May 9, 2024

I see BlobHandler and supporting infrastructure still. It's not explicitly deprecated.

Should the Schema Designer API use the "File Store" instead?

Any way, I suppose this PR, scoped to what it is, is low hanging fruit to remove.

CC @noblepaul

@gus-asf
Copy link
Contributor

gus-asf commented May 10, 2024

I'm not so keen on the loss of https://issues.apache.org/jira/browse/SOLR-8349 along with this. It was only tied to the blob repository because that seemed to be how folks wanted me to load resources at the time.

@epugh
Copy link
Contributor Author

epugh commented May 10, 2024

I see BlobHandler and supporting infrastructure still. It's not explicitly deprecated.

Should the Schema Designer API use the "File Store" instead?

Any way, I suppose this PR, scoped to what it is, is low hanging fruit to remove.

CC @noblepaul

I was actually sort of wondering how SchemaDesigner was working after i made my changes, I tested it manually and it all worked. Somehow I never stumbled over BlobHandler while spelunking in IntelliJ! I will poke around some more, and maybe it's own bit of work... (I also note that .system doesn't show up in admin ui or have much docs.. likewise blobhandler doesn't appear in ref guide as far as I can tell).

@epugh
Copy link
Contributor Author

epugh commented May 10, 2024

I'm not so keen on the loss of https://issues.apache.org/jira/browse/SOLR-8349 along with this. It was only tied to the blob repository because that seemed to be how folks wanted me to load resources at the time.

Can you share more about how this is being used? I took a quick looksee at SOLR-8349 and it points at SOLR-3443 which appears to never had been completed.

I do think thinking about sharing memory might be something we need to have a good handle on for our future ML powered world.. As we add more models to our query/ingest steps, then maybe we would want to share memory? (Dunno how that would all work!).

If we move forward with this commit, we can always restore the code from Github right? And use our FilestoreAPI or other newer APIs?

@dsmiley
Copy link
Contributor

dsmiley commented May 10, 2024

The "File Store" is the answer to saving & loading big resources. It's better than the Blob thing in that you have direct file access, which can be a requirement for some tools/plugins (that I see at work too).

@gus-asf
Copy link
Contributor

gus-asf commented May 10, 2024

I'm not so keen on the loss of https://issues.apache.org/jira/browse/SOLR-8349 along with this. It was only tied to the blob repository because that seemed to be how folks wanted me to load resources at the time.

Can you share more about how this is being used? I took a quick looksee at SOLR-8349 and it points at SOLR-3443 which appears to never had been completed.

I do think thinking about sharing memory might be something we need to have a good handle on for our future ML powered world.. As we add more models to our query/ingest steps, then maybe we would want to share memory? (Dunno how that would all work!).

If we move forward with this commit, we can always restore the code from Github right? And use our FilestoreAPI or other newer APIs?

Yeah I just noticed that I never circled back to 3443, but when I made the contribution the original patch (not using blobs) was used by a client because they had 40 core server that wanted to load a 1 GB post-code to lat/long dictionary (US & UK) in a custom component that was tagging documents with lat/long based on post codes. Loading it once rather than 40 times was pretty important. At a later date I think they converted this to the blob version to upgrade, but I was elsewhere by that time. (Edit: Memory foggy been years... Actually I think they were adding radius filter around the lat long to queries based on post code rather than tagging documents)

@gus-asf
Copy link
Contributor

gus-asf commented May 10, 2024

So basically it's a facility for folks writing custom components... which should have been also dogfooded back into 3443

@gus-asf
Copy link
Contributor

gus-asf commented May 10, 2024

The "File Store" is the answer to saving & loading big resources. It's better than the Blob thing in that you have direct file access, which can be a requirement for some tools/plugins (that I see at work too).

Yeah I've never cared what the location for the bytes to be fetched was, the only part I cared about wast that it was possible to fetch the bytes just once (from wherever). But I'm seeing the test class and tests for this removed (ResourceSharingTestComponent etc).

@epugh
Copy link
Contributor Author

epugh commented May 10, 2024

Okay, I dug around and start to see the FileStoreAPI. It also looks like we never really documented it in the Ref Guide.. Another "magic API".... So, the value of FileStoreAPI is you distribute things around the cluster, cool... I think the value of BlobStore is you have only 1 copy of the item in the cluster, reachable from anywhere.. So the SchemaDesigner uses it to store a single copy of some data to be analyzed, and it needs that single copy to be reachable from whereever, but doesn't need replciation or robustness..... Which is kind of cool.

@epugh
Copy link
Contributor Author

epugh commented May 12, 2024

I'd like to get this clean up merged soon. If, as it is currently structured with strenghts and weaknesses isn't good, then please do speak up with what we should do. Otherwise I'd like to get this code that we agreed to deprecate back in 8.x line completed!

@gus-asf
Copy link
Contributor

gus-asf commented May 13, 2024

I don't see any restoration of the resource sharing tests with file store?

@gus-asf gus-asf self-requested a review May 13, 2024 12:51
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ResourceSharingTestComponent extends SearchComponent implements SolrCoreAware {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should not be going away. It should be using whatever the appropriate file store is (and probably should have been less closely tied to blob store at the start, but back then I was just happy to have a committer working with me on it).

}

@Test
public void test() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test demonstrates that the object shared remains visible across cores. We need to not loose this. It's really about the memory sharing more than the blobstore.

@epugh
Copy link
Contributor Author

epugh commented May 13, 2024

I'm not familiar with Resource sharing aspect. Not to put too fine a point on it, but would you be willing to port it over to the right api? I suspect you can migrate it much faster than I can! And then slap some docs into ref guide maybe so that others know this functionality exists? Right now it just feels like orphaned code that doesn't have any love. Heck, if you update the code, I'd be happy to take on the Ref Guide work...

@gus-asf
Copy link
Contributor

gus-asf commented May 14, 2024

I'm willing in principal, but I don't really have a ton of time for it right now. I might be faster in coding cycles, but probably not in wall clock.

@gus-asf
Copy link
Contributor

gus-asf commented May 14, 2024

The ideal thing would be to use it to solve https://issues.apache.org/jira/browse/SOLR-3443 with it at the same time since that's essentially the same problem my client faced. Not sure where in the ref guide it would go, since it's not an end-user feature directly but rather a component writer's aid, but if the fix for 3443 added a means to opt in to memory sharing... that would want to be documented.

@epugh
Copy link
Contributor Author

epugh commented May 14, 2024

I'd like to get this in for 9.7, so that isn't in a hurry. Why don't I leave this open till then, and then let's see what happens. If it get's fixed, great. if not, and someone really wants the feature, we can always resuscitate it. I suspect that there is going to be MORE work around rationalizing our various file storage apis... FileStoreAPI, BlobHandler, and of course BlobRepository. I'm interested in getting FileStoreAPI and BlobHandler better documented and rationalized (cause I think they are different enough to both have future value), and then if we are in a good place, and if this get's merged, then you or someone who wants this fancy component can resuscitate it. I am very NOT comfortable with anything involving shared memory or other "smart java" things which is why I don't want to update the Resource Sharing stuff. I suspect what would happen is I would get it to pass the tests, but anyone who USED it for real would immediately discovered I goofed and it doesn't actually work. ;-)

@epugh
Copy link
Contributor Author

epugh commented May 22, 2024

If this update passes all the tests, then I'm planning on merging it tomorrow, May 23rd....

@github-actions github-actions bot added the documentation Improvements or additions to documentation label May 23, 2024
@epugh
Copy link
Contributor Author

epugh commented May 23, 2024

I am going to merge this for 10 only.. That way no one get's a nasty surprise in 9.7 ;-)

@epugh epugh merged commit f5b36c8 into apache:main May 23, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants