Support searching only using the first N symbols of copy to increase search performance #845
Comments
I think the more reasonable approach here is to simply make searching faster. For example by migrating it to operate on an in-memory SearchKit index https://developer.apple.com/documentation/coreservices/search_kit |
Maccy already searches only the first 5000 characters of each item. Can you provide the exact texts where the search is slow? |
I'm on 2.1.0 and the slow search problem is still present. Will return here after reproducing on a clean install. It seems that the total searchable text is up to 5000*999 characters long, but copies as large as 5k chars are the minority in my case, so maybe the problem is me |
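The upper bound mentioned above is plain arithmetic. A minimal sketch, assuming the 5000-character cap and a history of 999 items as stated in this thread (the constant and function names are hypothetical, not Maccy settings):

```python
# Assumed values taken from this thread, not read from Maccy's source.
SEARCH_CAP_CHARS = 5000  # characters searched per item
HISTORY_SIZE = 999       # items kept in history


def worst_case_searchable_chars(cap: int = SEARCH_CAP_CHARS,
                                items: int = HISTORY_SIZE) -> int:
    """Upper bound on characters scanned by one query if every item hits the cap."""
    return cap * items


print(worst_case_searchable_chars())
```

That is just under 5 million characters in the worst case, which only holds if the cap is actually applied to every item.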
Here is a reproducible recording: maccy_bug_report.mov. This was filmed on macOS Sonoma 14.5, Maccy 2.2.0 (updated today from 2.1.0). The script is here: https://gist.github.com/750/e8635bd7f54d0b26de3129c60e0e1185 |
The video above features 5 copies of sizes ranging from N = 1 million to N = 10 million characters, total is 17 million characters. Each copy consists of a character repeated N times. After reading code related to search, I believe that searching is performed across all those characters, not just the first 5000 (I'm on "exact", not "fuzzy"). But this shouldn't be a problem |
@p0deje swift code
import Foundation

// Time a case-insensitive String.range(of:) on inputs that grow 10x per step.
var len = 100
while len < 2_000_000_001 {
    let hello = String(repeating: "1", count: len) + "zu"
    let ts1 = Date().timeIntervalSince1970
    // The result is discarded on purpose; only the elapsed time matters here.
    _ = hello.range(of: "zu", options: .caseInsensitive)
    let ts2 = Date().timeIntervalSince1970
    print("size:", len, "total seconds:", ts2 - ts1)
    len *= 10
}
try the same for python
import time
t = time.time()
a = "1"*500_000_000+"zu"
print("to create initial string:", time.time()-t)
t = time.time()
a.lower().find("zu".lower())
print("to find in lowercase of that string:", time.time()-t)
try the same for js
const str = "1".repeat(500000000)+"zu";
const lower_str = str.toLowerCase();
const startTime = performance.now()
lower_str.indexOf("zu");
const endTime = performance.now()
console.log(`took ${endTime - startTime} milliseconds`)
Methodology is not perfect, but I think the results speak for themselves: finding a substring of a string in Swift is significantly slower than in Python, while JS does it even faster. Maybe this is related to Swift's rich string implementation. I believe fixing this will solve this issue and also make swiftlang/swift#861 and 8f6bd14 unnecessary |
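One imperfection in the snippets above: the Python version times `lower()` together with `find()`, while the JS version lowercases before starting the timer, so the two numbers are not directly comparable. A minimal sketch that times each phase separately (the function name and the smaller input size are assumptions for illustration):

```python
import time


def time_case_insensitive_find(haystack: str, needle: str):
    """Return (lowercase_seconds, find_seconds, index), timing each phase separately."""
    t0 = time.perf_counter()
    lowered = haystack.lower()
    t1 = time.perf_counter()
    index = lowered.find(needle.lower())
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, index


if __name__ == "__main__":
    n = 10_000_000  # assumed size, smaller than the 500 million used above
    lower_s, find_s, index = time_case_insensitive_find("1" * n + "zu", "zu")
    print(f"lowercase: {lower_s:.4f}s, find: {find_s:.4f}s, index: {index}")
```

`time.perf_counter()` is used instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring short intervals.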
I am currently experimenting with using SearchKit as an alternative. Will report my findings. |
I am not sure it's a good fit, based on the following https://developer.apple.com/documentation/coreservices/search_kit
|
Apparently NSString.range is about 20 times faster than String.range:
benchmark
import Foundation

// Compare case-insensitive substring search on String and NSString
// for string sizes 2^10, 2^12, ..., 2^28.
for power in stride(from: 10, to: 29, by: 2) {
    let len = 1 << power
    let test_string = String(repeating: "1", count: len) + "zu"
    let test_string_ns = NSString(string: test_string)
    print("string size: 2^\(power) (\(len))")
    // Conversion from String to NSString is included in this measurement.
    var ts = Date().timeIntervalSince1970
    _ = NSString(string: test_string).range(of: "zu", options: NSString.CompareOptions.caseInsensitive)
    print(" NSString(String)", "total seconds:", Date().timeIntervalSince1970 - ts)
    // NSString created ahead of time, so only the search is measured.
    ts = Date().timeIntervalSince1970
    _ = test_string_ns.range(of: "zu", options: NSString.CompareOptions.caseInsensitive)
    print(" NSString ", "total seconds:", Date().timeIntervalSince1970 - ts)
    // Native Swift String search.
    ts = Date().timeIntervalSince1970
    _ = test_string.range(of: "zu", options: NSString.CompareOptions.caseInsensitive)
    print(" String ", "total seconds:", Date().timeIntervalSince1970 - ts)
}
created a bug report for swift swiftlang/swift-foundation#1068
If everything is correct - doesn't this mean that we can speed up search by about x20 with a single line change? Line 108 in 8f6bd14
|
@weisJ I'm curious, are you experimenting with SearchKit just for performance or for some other particular reason?
SearchKit is aimed at words rather than symbols (judging by the docs). Another idea is to maintain a reverse index for characters so that we don't have to search all items on every query update. |
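A rough sketch of the reverse index idea, written in Python for brevity rather than in Maccy's Swift. The class and method names are hypothetical, and this is only one possible reading of "reverse index for characters": each character maps to the set of items that contain it, so a query only runs the substring check on items that contain all of its characters.

```python
from collections import defaultdict


class CharIndex:
    """Maps each lowercase character to the ids of items that contain it."""

    def __init__(self):
        self.items = []                  # item texts, addressed by id
        self.by_char = defaultdict(set)  # character -> set of item ids

    def add(self, text: str) -> int:
        item_id = len(self.items)
        self.items.append(text)
        for ch in set(text.lower()):
            self.by_char[ch].add(item_id)
        return item_id

    def search(self, query: str) -> list[int]:
        q = query.lower()
        if not q:
            return list(range(len(self.items)))
        # Candidates must contain every character of the query.
        candidates = None
        for ch in set(q):
            ids = self.by_char.get(ch, set())
            candidates = ids if candidates is None else candidates & ids
            if not candidates:
                return []
        # The index only filters; a real substring check confirms each hit.
        return sorted(i for i in candidates if q in self.items[i].lower())


index = CharIndex()
index.add("1" * 1_000_000)         # large item with a single distinct character
index.add("order id 4815162342")
print(index.search("48"))          # ids of the matching items
```

This skips large items that lack one of the query's characters, as with the repeated-character test data in this thread. It does not help when a large item happens to contain every character of the query, because that item still needs a full scan.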
I simply wanted to try out SearchKit to see what it has to offer. |
(disregard this, i fucked up)
~~here is an excerpt~~
|
Status update:
The next step is to make a PR. That PR should also include a fix for 4 (or at least a hint that those toggles require a restart) |
@750 Thanks for looking into this.
|
Before Submitting Your Feature Request
Problem
So my workflow is like this:
That workflow isn't well supported in Maccy: big items make search much slower, which is quite unpleasant when I just need to find a 10-digit ID I copied two hours ago of which I remember the first two symbols
Here is a test I performed:
I then tried searching, and it was very slow. During search, Maccy becomes unresponsive (it eventually finishes the search after a few seconds, but still)
I understand that this is probably not the most popular scenario, but I use it very often as a daytime software developer for debugging purposes
And of course, thank you for your great free open source app, it's been a lifesaver
Solution
Maccy could support a configurable "only search in the first N symbols of item" setting, with N=0 as the default (unlimited). That should make the search function virtually instant for all cases, independent of what was copied in Maccy previously
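To make the proposal concrete, here is a minimal sketch of the requested behaviour in Python (not Maccy's actual code; the function name and the `limit` parameter are hypothetical). Each item is cut to its first N characters before the case-insensitive match, and N=0 means no limit:

```python
def search(items, query, limit=0):
    """Case-insensitive substring search over the first `limit` characters of each item.

    limit == 0 means unlimited, matching the proposed default.
    """
    q = query.lower()
    results = []
    for text in items:
        # Slice before lowercasing so large items cost at most `limit` characters.
        searchable = text if limit == 0 else text[:limit]
        if q in searchable.lower():
            results.append(text)
    return results


history = ["x" * 100 + "needle", "needle at start"]
print(len(search(history, "needle")))            # unlimited search
print(len(search(history, "needle", limit=10)))  # only the first 10 characters
```

With a limit, the cost per query is bounded by N times the number of items, regardless of how large individual copies are. The trade-off is that matches past the first N characters are no longer found.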