-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17284: Remove deprecated BlobRepository #2447
Conversation
We have newer approaches such as the FileStoreAPI and the PackageManager.
This started with me spelunking around the code base and chasing some small typos, and then realized that this code was deprecated and we should remove it! |
I see BlobHandler and supporting infrastructure still. It's not explicitly deprecated. Should the Schema Designer API use the "File Store" instead? Any way, I suppose this PR, scoped to what it is, is low hanging fruit to remove. CC @noblepaul |
I'm not so keen on the loss of https://issues.apache.org/jira/browse/SOLR-8349 along with this. It was only tied to the blob repository because that seemed to be how folks wanted me to load resources at the time. |
I was actually sort of wondering how SchemaDesigner was working after i made my changes, I tested it manually and it all worked. Somehow I never stumbled over |
Can you share more about how this is being used? I took a quick looksee at SOLR-8349 and it points at SOLR-3443 which appears to never had been completed. I do think thinking about sharing memory might be something we need to have a good handle on for our future ML powered world.. As we add more models to our query/ingest steps, then maybe we would want to share memory? (Dunno how that would all work!). If we move forward with this commit, we can always restore the code from Github right? And use our FilestoreAPI or other newer APIs? |
The "File Store" is the answer to saving & loading big resources. It's better than the Blob thing in that you have direct file access, which can be a requirement for some tools/plugins (that I see at work too). |
Yeah I just noticed that I never circled back to 3443, but when I made the contribution the original patch (not using blobs) was used by a client because they had 40 core server that wanted to load a 1 GB post-code to lat/long dictionary (US & UK) in a custom component that was tagging documents with lat/long based on post codes. Loading it once rather than 40 times was pretty important. At a later date I think they converted this to the blob version to upgrade, but I was elsewhere by that time. (Edit: Memory foggy been years... Actually I think they were adding radius filter around the lat long to queries based on post code rather than tagging documents) |
So basically it's a facility for folks writing custom components... which should have been also dogfooded back into 3443 |
Yeah I've never cared what the location for the bytes to be fetched was, the only part I cared about wast that it was possible to fetch the bytes just once (from wherever). But I'm seeing the test class and tests for this removed (ResourceSharingTestComponent etc). |
Okay, I dug around and start to see the FileStoreAPI. It also looks like we never really documented it in the Ref Guide.. Another "magic API".... So, the value of FileStoreAPI is you distribute things around the cluster, cool... I think the value of BlobStore is you have only 1 copy of the item in the cluster, reachable from anywhere.. So the SchemaDesigner uses it to store a single copy of some data to be analyzed, and it needs that single copy to be reachable from whereever, but doesn't need replciation or robustness..... Which is kind of cool. |
I'd like to get this clean up merged soon. If, as it is currently structured with strenghts and weaknesses isn't good, then please do speak up with what we should do. Otherwise I'd like to get this code that we agreed to deprecate back in 8.x line completed! |
I don't see any restoration of the resource sharing tests with file store? |
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
public class ResourceSharingTestComponent extends SearchComponent implements SolrCoreAware { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should not be going away. It should be using whatever the appropriate file store is (and probably should have been less closely tied to blob store at the start, but back then I was just happy to have a committer working with me on it).
} | ||
|
||
@Test | ||
public void test() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test demonstrates that the object shared remains visible across cores. We need to not loose this. It's really about the memory sharing more than the blobstore.
I'm not familiar with Resource sharing aspect. Not to put too fine a point on it, but would you be willing to port it over to the right api? I suspect you can migrate it much faster than I can! And then slap some docs into ref guide maybe so that others know this functionality exists? Right now it just feels like orphaned code that doesn't have any love. Heck, if you update the code, I'd be happy to take on the Ref Guide work... |
I'm willing in principal, but I don't really have a ton of time for it right now. I might be faster in coding cycles, but probably not in wall clock. |
The ideal thing would be to use it to solve https://issues.apache.org/jira/browse/SOLR-3443 with it at the same time since that's essentially the same problem my client faced. Not sure where in the ref guide it would go, since it's not an end-user feature directly but rather a component writer's aid, but if the fix for 3443 added a means to opt in to memory sharing... that would want to be documented. |
I'd like to get this in for 9.7, so that isn't in a hurry. Why don't I leave this open till then, and then let's see what happens. If it get's fixed, great. if not, and someone really wants the feature, we can always resuscitate it. I suspect that there is going to be MORE work around rationalizing our various file storage apis... FileStoreAPI, BlobHandler, and of course BlobRepository. I'm interested in getting FileStoreAPI and BlobHandler better documented and rationalized (cause I think they are different enough to both have future value), and then if we are in a good place, and if this get's merged, then you or someone who wants this fancy component can resuscitate it. I am very NOT comfortable with anything involving shared memory or other "smart java" things which is why I don't want to update the Resource Sharing stuff. I suspect what would happen is I would get it to pass the tests, but anyone who USED it for real would immediately discovered I goofed and it doesn't actually work. ;-) |
If this update passes all the tests, then I'm planning on merging it tomorrow, May 23rd.... |
I am going to merge this for 10 only.. That way no one get's a nasty surprise in 9.7 ;-) |
https://issues.apache.org/jira/browse/SOLR-17284
Description
This approach to managing shared jar files was deprecated in the 8.x line and tagged for removal in Solr 9. Removing deprecated code.
Solution
Removed the code, and tested the Schema designer, which relies on .system existing, still works.
Tests
Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.