Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support searching only using the first N symbols of copy to increase search performance #845

Open
2 tasks done
750 opened this issue Aug 20, 2024 · 14 comments
Open
2 tasks done
Labels
enhancement New feature or request

Comments

@750
Copy link

750 commented Aug 20, 2024

Before Submitting Your Feature Request

  • Check that there isn't already a similar feature request to avoid creating a duplicate.
  • I have seen the FAQ.

Problem

So my workflow is like this:

  • I copy a lot of text every day, and that sometimes includes many big jsons, like a 1MB one (or 20MB, that happens too).
  • When I copy big jsons, I don't really need it to be searchable. To paste it with Maccy, I just select them with arrow keys or mouse
  • At the same time I do need smaller copies to be searchable

That workflow isn't quite well supported in Maccy: big items make search much slower which is quite unpleasant when I just need to find a 10 digit ID I copied two hours ago of which I remember the first two symbols

Here is a test i performed:

  • Clean maccy storage
  • Copy 20 string of sizes between 5 symbols and and 5 million symbols (the total size is 55MB)

I then tried searching, it was is very slow. During search, Maccy becomes unresponsive (eventually it will finish the search in a few seconds, but still)

I understand that this is probably not the most popular scenario, but I use it very often as a daytime software developer for debugging purposes

And of course, thank you for your great free open source app, it's been a lifesaver

Solution

Maccy could support a configurable "only search in the first N symbols of item" setting, with N=0 as the default (unlimited). That should make the search function virtually instant for all cases, independent of what was copied in Maccy previously

@750 750 added the enhancement New feature or request label Aug 20, 2024
@weisJ
Copy link
Contributor

weisJ commented Aug 20, 2024

I think the more reasonable approach here is to simply make searching faster. For example by migrating it to operate on an in-memory SearchKit index https://developer.apple.com/documentation/coreservices/search_kit

@p0deje
Copy link
Owner

p0deje commented Oct 2, 2024

Maccy already searches for the first 5000 characters, can you provide exact texts where the search is slow?

@750
Copy link
Author

750 commented Nov 21, 2024

I'm on 2.1.0 and the slow search problem is still present. Will return here after reproducing on a clean install

It seems that the total searchable text is up to 5000*999 characters long, but copies as large as 5k chars are the minority in my case - so maybe the problem is me

@750
Copy link
Author

750 commented Nov 23, 2024

here is a reproducible recording:

maccy_bug_report.mov

this was filmed on macos sonoma 14.5, maccy 2.2.0 (updated today from 2.1.0)

the script is here https://gist.github.com/750/e8635bd7f54d0b26de3129c60e0e1185

@750
Copy link
Author

750 commented Nov 23, 2024

The video above features 5 copies of sizes ranging from N = 1 million to N = 10 million characters, total is 17 million characters. Each copy consists of a character repeated N times.

After reading code related to search, I believe that searching is performed across all those characters, not just the first 5000 (I'm on "exact", not "fuzzy"). But this shouldn't be a problem

@750
Copy link
Author

750 commented Nov 24, 2024

@p0deje
So here is something funny: "exact" uses String.range function to find a match. Apparently String.range is not very fast.

swift code

import Foundation

var hello: String? = nil;
var len = 100;
while len < 2_000_000_001 {
      hello = String(repeating: "1", count: len) + "zu";

      let ts1 = Date().timeIntervalSince1970;

      hello!.range(of: "zu", options: .caseInsensitive)

      let ts2 = Date().timeIntervalSince1970;
      print("size:", len, "total seconds:", ts2-ts1);     

      len = Int(Double(len) * 10);
}
size: 100 total seconds: 9.918212890625e-05
size: 1000 total seconds: 0.0003609657287597656
size: 10000 total seconds: 0.003032684326171875
size: 100000 total seconds: 0.02391982078552246
size: 1000000 total seconds: 0.11755490303039551
size: 10000000 total seconds: 1.1412279605865479
size: 100000000 total seconds: 11.311367988586426

try the same for python

import time
t = time.time()
a = "1"*500_000_000+"zu"
print("to create initial string:", time.time()-t)
t = time.time()
a.lower().find("zu".lower())
print("to find in lowercase of that string:", time.time()-t)
to create initial string: 0.2323617935180664
to find in lowercase of that string: 0.5006411075592041

try the same for js

const str = "1".repeat(500000000)+"zu";

const lower_str = str.toLowerCase();


const startTime = performance.now()

lower_str.indexOf("zu");

const endTime = performance.now()

console.log(`took ${endTime - startTime} milliseconds`)
took 23.399999998509884 milliseconds

Methodology is not perfect, but I think the results speak for themselves: finding a substring of a string in Swift is significantly slower than in python, while js does that even faster

Maybe this is related to Swift's rich string implementation.

I believe fixing this will solve this issue and also make swiftlang/swift#861 and 8f6bd14 not needed

@weisJ
Copy link
Contributor

weisJ commented Nov 24, 2024

I am currently experimenting with using SearchKit as an alternative. Will report my findings.

@p0deje
Copy link
Owner

p0deje commented Nov 24, 2024

I am not sure it's a good fit, based on the following https://developer.apple.com/documentation/coreservices/search_kit

The Search Kit API is appropriate when you want your application to have full control over indexing and searching, and when your focus is file content.

@750
Copy link
Author

750 commented Nov 24, 2024

Apparently NSString.range is about 20 times faster than String.range:

benchmark
import Foundation

var test_string: String? = nil;
var test_string_ns: NSString? = nil;
var len = 1024;
for power in stride(from: 10, to: 29, by: 2) {
      len = Int(pow(Double(2), Double(power)));

      test_string = String(repeating: "1", count: len) + "zu";
      test_string_ns = NSString(string: test_string!);

      print("string size: 2^\(power) (\(len))")

      var ts = Date().timeIntervalSince1970;
      NSString(string: test_string!).range(of: "zu", options: NSString.CompareOptions.caseInsensitive)
      print("  NSString(String)", "total seconds:", Date().timeIntervalSince1970-ts)

      ts = Date().timeIntervalSince1970;
      test_string_ns!.range(of: "zu", options: NSString.CompareOptions.caseInsensitive)
      print("  NSString        ", "total seconds:", Date().timeIntervalSince1970-ts)

      ts = Date().timeIntervalSince1970;
      test_string!.range(of: "zu", options: NSString.CompareOptions.caseInsensitive)
      print("  String          ", "total seconds:", Date().timeIntervalSince1970-ts)
}
string size: 2^10 (1024)
  NSString(String) total seconds: 3.3855438232421875e-05
  NSString         total seconds: 2.002716064453125e-05
  String           total seconds: 0.0007479190826416016
string size: 2^12 (4096)
  NSString(String) total seconds: 5.3882598876953125e-05
  NSString         total seconds: 4.1961669921875e-05
  String           total seconds: 0.001093149185180664
string size: 2^14 (16384)
  NSString(String) total seconds: 0.00011920928955078125
  NSString         total seconds: 8.20159912109375e-05
  String           total seconds: 0.0018987655639648438
string size: 2^16 (65536)
  NSString(String) total seconds: 0.0004029273986816406
  NSString         total seconds: 0.0003039836883544922
  String           total seconds: 0.00872802734375
string size: 2^18 (262144)
  NSString(String) total seconds: 0.0017659664154052734
  NSString         total seconds: 0.0013327598571777344
  String           total seconds: 0.03779292106628418
string size: 2^20 (1048576)
  NSString(String) total seconds: 0.006550312042236328
  NSString         total seconds: 0.004827022552490234
  String           total seconds: 0.11996912956237793
string size: 2^22 (4194304)
  NSString(String) total seconds: 0.025386810302734375
  NSString         total seconds: 0.019916057586669922
  String           total seconds: 0.4834160804748535
string size: 2^24 (16777216)
  NSString(String) total seconds: 0.10060501098632812
  NSString         total seconds: 0.07497215270996094
  String           total seconds: 2.0408670902252197
string size: 2^26 (67108864)
  NSString(String) total seconds: 0.4020881652832031
  NSString         total seconds: 0.3013949394226074
  String           total seconds: 7.639375686645508
string size: 2^28 (268435456)
  NSString(String) total seconds: 1.6302740573883057
  NSString         total seconds: 1.2082538604736328
  String           total seconds: 31.18701410293579

created a bug report for swift swiftlang/swift-foundation#1068

  1. Now, obviously i will need to use more than 1 test case in the benchmark 🤭
  2. Also need to check for ram usage
  3. Also need to research whether this is affected by some utf8-bytes-utf16 conflicts, idk. On paper it should be fine, byt maybe you guys know something I don't? @p0deje @weisJ

If everything is correct - doesn't this mean that we can speed up search by about x20 with a single line change?

if let range = searchString.range(of: string, options: options, range: nil, locale: nil) {

@750
Copy link
Author

750 commented Nov 24, 2024

@weisJ I'm curious, are you experimenting with searchKit just for performance or for a particular reason

  • like if a user has very big items and a very big history?
  • like if Maccy weakens the 999 items restriction?

SearchKit is aimed at words rather than symbols (judging by the docs). Another idea is to maintain a reverse index for characters so that we don't have to search all items on every query update.

@weisJ
Copy link
Contributor

weisJ commented Nov 24, 2024

I simply wanted to try out SearchKit to see what it has to offer.

750 added a commit to 750/Maccy that referenced this issue Nov 26, 2024
@750
Copy link
Author

750 commented Nov 26, 2024

(disregard this, i fucked up)
Okay so in a fork I:

  • reverted searchable text limit (8f6bd14)
  • implemented both searches and enabled them both (so the performance is worse then in release Maccy, it's performing both searches for every query)
  • Set up some logging

I'm gonna run this for my use case for a day or longer, will return with results. So far NSString is abot 10-15 times faster which is enough to

~~here is an excerpt~~

default	03:07:22.653424+0300	Maccy	Maccy845 new search, query char count: 1
default	03:07:22.653486+0300	Maccy	Maccy845 totalSize: 22419905 totalCount: 1002
default	03:07:22.656686+0300	Maccy	      Maccy845 StringDuration: 0.003 StringFound: 388
default	03:07:22.657718+0300	Maccy	      Maccy845 NSStringDuration: 0.001  NSStringFound: 388
default	03:08:17.498230+0300	Maccy	Maccy845 new search, query char count: 16
default	03:08:17.498347+0300	Maccy	Maccy845 totalSize: 22419921 totalCount: 1003
default	03:08:20.723622+0300	Maccy	      Maccy845 StringDuration: 3.225 StringFound: 0
default	03:08:20.847937+0300	Maccy	      Maccy845 NSStringDuration: 0.124  NSStringFound: 0
default	03:08:39.169125+0300	Maccy	Maccy845 new search, query char count: 15
default	03:08:39.169172+0300	Maccy	Maccy845 totalSize: 22419921 totalCount: 1003
default	03:08:42.415117+0300	Maccy	      Maccy845 StringDuration: 3.246 StringFound: 0
default	03:08:42.539819+0300	Maccy	      Maccy845 NSStringDuration: 0.125  NSStringFound: 0
default	03:08:43.607654+0300	Maccy	Maccy845 new search, query char count: 14
default	03:08:43.607789+0300	Maccy	Maccy845 totalSize: 22419921 totalCount: 1003
default	03:08:46.845654+0300	Maccy	      Maccy845 StringDuration: 3.238 StringFound: 0
default	03:08:46.970321+0300	Maccy	      Maccy845 NSStringDuration: 0.125  NSStringFound: 0
default	03:08:59.019176+0300	Maccy	Maccy845 new search, query char count: 9
default	03:08:59.019298+0300	Maccy	Maccy845 totalSize: 22419921 totalCount: 1003
default	03:09:00.707207+0300	Maccy	      Maccy845 StringDuration: 1.688 StringFound: 0
default	03:09:00.845962+0300	Maccy	      Maccy845 NSStringDuration: 0.139  NSStringFound: 0
default	03:09:07.524553+0300	Maccy	Maccy845 new search, query char count: 1
default	03:09:07.524669+0300	Maccy	Maccy845 totalSize: 22419921 totalCount: 1003
default	03:09:09.092750+0300	Maccy	      Maccy845 StringDuration: 1.568 StringFound: 446
default	03:09:09.153953+0300	Maccy	      Maccy845 NSStringDuration: 0.061  NSStringFound: 446
default	03:09:22.155061+0300	Maccy	Maccy845 new search, query char count: 5
default	03:09:22.155169+0300	Maccy	Maccy845 totalSize: 22419921 totalCount: 1003
default	03:09:23.825563+0300	Maccy	      Maccy845 StringDuration: 1.670 StringFound: 4
default	03:09:23.937309+0300	Maccy	      Maccy845 NSStringDuration: 0.112  NSStringFound: 4

@750
Copy link
Author

750 commented Dec 1, 2024

Status update:

  1. NSString.range is much faster: takes <1s for several MB strings while String.range is absolutely unusable taking dozens of seconds. Im thinking of adding a "Optimize search for big copies" toggle with a remark about it being slower for trivial cases (like searching for something that is at the very beginning of all large items) for 2. This seems to be a macos-swift bug String.range is much slower than NSString.range [macOS m1] swiftlang/swift-foundation#1068 ?
  2. creating NSString with String adds some overhead, but it is still <1s for copy history with large items
  3. there is a bug with "Show special symbols" switch not actually changing displayed items on toggle despite having code to literally do that if im not mistaken - https://github.com/p0deje/Maccy/blob/master/Maccy/Observables/History.swift#L94
  4. Testing with external tooling is difficult: there is a limit to how low the clipboard check interval can be set (~0.15s i think). That means testing full history cases takes a lot of time to create that history
  5. i'm gonna create a CONTIBUTING.md for xcode newbies like me if the maintainers don't mind
  6. The code related to (a) item text vs (b) visible item title while not searching vs (c) visible item title while searching vs (d) what we are actually searching in is somewhat convoluted, i need to make sense of it first. This was less of a problem before 8f6bd14 - (a) and (d) were the same thing

The next step is to make a PR, that PR should also include fix for 4 (or at least a hint that those toggles require a restart)

@p0deje
Copy link
Owner

p0deje commented Dec 5, 2024

@750 Thanks for looking into this.

  1. Good to know.
  2. That should be fine.
  3. Hmm, can you create a separate issue for this?
  4. You should be able to adjust using defaults write clipboardCheckInterval 0.01.
  5. Looking forward to the PR.
  6. The reason for the change was to avoid creating large text views in SwiftUI, we'll have to keep that. Still, if the search is fast, we can search within the whole text rather than stripped title.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants