Why can the second search be much faster than the first? #2687

fang13 · 2023-12-14T08:37:00Z

I have not found any index file being generated, and I just want to know why.

BurntSushi · 2023-12-14T14:28:03Z

Maintainer

It somewhat depends on the operating system you're using, but generally speaking, all modern operating systems have a cache of the contents of recently read files in main memory. This means that subsequent reads of that file will not actually read from your disk, but from your RAM instead. Reading from RAM is typically much faster (by orders of magnitude, depending on how slow your disk is). Since these days folks tend to have a lot of RAM, even if you're searching a large file or a large code repository, it's plausible that it will fit into your RAM and subsequent searches will be a lot faster.
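To see this effect directly, here is a minimal sketch (not anything ripgrep itself does) that times back-to-back reads of the same file. The helper names `time_read`, `compare_reads` and `make_sample_file` are made up for illustration. Note that a file you have just written is usually already in the cache, so to observe a genuinely cold first read you would need a file that has not been touched recently, or you would need to drop the cache first (on Linux, root can do that by writing `3` to `/proc/sys/vm/drop_caches`).

```python
import os
import tempfile
import time


def time_read(path, chunk_size=1 << 20):
    """Read the whole file once and return (seconds, bytes_read)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return time.perf_counter() - start, total


def compare_reads(path, repeats=3):
    """Time `repeats` back-to-back reads of the same file."""
    return [time_read(path) for _ in range(repeats)]


def make_sample_file(size_bytes=8 * 1024 * 1024):
    """Hypothetical helper: write a throwaway file to read back."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path
```

Calling `compare_reads` on a large file that has not been read recently should typically show the first read taking noticeably longer than the rest, although the exact ratio depends on your disk and how much free RAM the operating system has for caching.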

This is also why you don't usually need an index to speed up searches for even sizeable code repositories. Even behemoth code repositories like Chromium fit into memory, and you don't even need a ton of RAM for that to happen. Of course, once you get into proprietary code land, repositories tend to get quite a bit bigger than even Chromium. So even ripgrep might fall over at that point.

This is also a common flaw in ad hoc benchmarking. Plenty of folks will run grep on a file, notice it's slow and then run rg on the same file and see an enormous speed-up. While ripgrep may still be faster, it's also possible that the majority of the speed-up can be explained by simple caching.

When benchmarking greps, it's typical to focus on the cached case for a few reasons:

- It's probably the common case. Usually one is running multiple searches against the same haystack. And a lot can fit into RAM.
- The uncached case is typically just going to be blocked on waiting for disk to return data. There are strategies to speed this up (for example, focusing on reading data sequentially off the disk if it's an actual hard disk), but generally speaking, there isn't a whole lot you can do otherwise. This is perhaps slowly starting to change though, with the existence of extremely fast disks.

So the cached case represents what I believe is the common case and the case where you (as in, the author of a grep) have the most room to impact the final search time.
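One way to avoid the ad hoc benchmarking flaw described above is to run each command at least once untimed, so the file contents are already cached, and only then take measurements. The following is a rough sketch of that idea; `time_command` is a hypothetical helper, not part of ripgrep or any benchmarking tool.

```python
import statistics
import subprocess
import time


def time_command(cmd, warmups=1, runs=5):
    """Run cmd `warmups` times untimed, then return the median of `runs` timed runs."""
    # Warm-up runs pull the searched files into the OS cache.
    for _ in range(warmups):
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    # The median is less sensitive to a single slow outlier than the mean.
    return statistics.median(samples)
```

For example, comparing `time_command(["grep", "-c", "foo", "big.txt"])` with `time_command(["rg", "-c", "foo", "big.txt"])` (where `foo` and `big.txt` are placeholders) measures both tools in the cached case, so neither one gets an unfair advantage from running second.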
