Shave an instruction off len()
#357
Comments
It would probably affect as_str, as_bytes, reserve and as_mut_bytes, which is where the static str repr is special-cased.
Thanks for coming back so swiftly. Do you think this is worth further investigation, or does it sound like it's not going to be viable for that reason?
I think it's worth giving it a shot.
OK cool. I'll take a closer look when I get some time.
This is great! In your variant the
    pub fn len_new_new(c: &C) -> usize {
        let last_byte = c.5 as usize;
        let mut len = last_byte.wrapping_sub(LENGTH_MASK as usize);
        let is_heap = len == HEAP_MASK_AFTER_SUB;
        len = len.min(MAX_SIZE);
        if is_heap {
            len = c.1;
        }
        len
    }
    example::len_new:
        movq 8(%rdi), %rcx
        movzbl 23(%rdi), %edx
        addq $-192, %rdx
        cmpq $24, %rdx
        movl $24, %eax
        cmovbq %rdx, %rax
        cmoveq %rcx, %rax
        retq
    example::len_new_new:
        movzbl 23(%rdi), %ecx
        addq $-192, %rcx
        cmpq $24, %rcx
        movl $24, %eax
        cmovbq %rcx, %rax
        cmoveq 8(%rdi), %rax
        retq
Without the
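For anyone who wants to run the len_new_new variant locally, here is a minimal self-contained sketch. The struct C layout and the constant values are assumptions inferred from the assembly above (the length is loaded from offset 8, the last byte from offset 23, and the constants -192 and 24 appear as immediates); they are not copied from the library's source.

```rust
// Hypothetical 24-byte stand-in for the string repr on a 64-bit target.
// Field 1 (offset 8) holds the heap length, field 5 (offset 23) is the last byte.
#[repr(C)]
pub struct C(pub usize, pub usize, pub u32, pub u16, pub u8, pub u8);

// Assumed values, matching the immediates in the assembly listing.
pub const MAX_SIZE: usize = 24;
pub const LENGTH_MASK: u8 = 192;
pub const HEAP_MASK: u8 = 216;
pub const HEAP_MASK_AFTER_SUB: usize = (HEAP_MASK - LENGTH_MASK) as usize; // 24

pub fn len_new_new(c: &C) -> usize {
    let last_byte = c.5 as usize;
    // Inline lengths 0..=23 are stored as LENGTH_MASK + len, so subtracting
    // the mask recovers them. Smaller bytes wrap around to a huge value.
    let mut len = last_byte.wrapping_sub(LENGTH_MASK as usize);
    // Heap discriminant: exactly HEAP_MASK, which becomes 24 after the subtraction.
    let is_heap = len == HEAP_MASK_AFTER_SUB;
    // A wrapped (huge) value means the last byte is string data, so the
    // inline buffer is full and the length is MAX_SIZE.
    len = len.min(MAX_SIZE);
    if is_heap {
        len = c.1;
    }
    len
}
```

With these assumed values, a last byte of 197 gives a length of 5, an ordinary text byte such as b'a' gives 24 (full inline buffer), and a last byte of 216 returns whatever is stored in field 1.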
Bingo! That's 2 fewer instructions, and 2 fewer registers. I think I'd misunderstood the purpose of Is that not the point of it at all? And, if I may, another basic question: aside from reducing instructions, is minimising the number of registers a function uses a useful thing to do? (NB I'm new to all this reading-the-runes-of-ASM business, so really feeling my way here. Any pointers much appreciated.)
It depends, though: if it uses a callee-saved register, then I think it could be useful. Looking at
Also, if any call to the function is inlined, those saved registers can potentially have a positive impact on the caller.
@overlookmotel this is great, thank you!
Yes! This is an idea I've had bouncing around for a bit, I think there are some optimizations we can make if we modify the discriminant a bit. I'll whip up a PR this weekend, thanks again for the idea!
Thanks for taking it up. Glad you like it. I've learned a lot from the bit-fiddling tricks this library uses, so glad to contribute an idea (even if not the implementation!)
I believe it's possible to shave an instruction off len() (and probably as_slice() too) if a static string had the same discriminant as a heap string.
A static string could instead be represented as a heap string with capacity 0, which I assume isn't otherwise a possible state. Then:
On 64-bit machines, MAX_SIZE and HEAP_MASK_AFTER_SUB are the same (24). So this shaves off a cmp instruction, as len == HEAP_MASK_AFTER_SUB and .min(MAX_SIZE) both work off the result of a single cmp, rather than requiring 2. And (if I'm reading the ASM right) it also uses one fewer register.
https://godbolt.org/z/e7Kc4PMTE
Quite possibly handling the special case of zero-capacity heap strings (which are actually static strings) would introduce costs elsewhere, but I'm not familiar enough with the code to say. Can anyone advise if that's likely to negate the gain?
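To make the discriminant arithmetic concrete, here is a rough sketch of how a single subtraction classifies the last byte. The constant values (192, 216, 24) are assumed from the assembly in the godbolt link, and Kind, classify and is_static are hypothetical names used only for illustration, not the library's API. The capacity argument is likewise a simplification of however capacity is really stored.

```rust
pub const MAX_SIZE: usize = 24;
pub const LENGTH_MASK: u8 = 192; // assumed value
pub const HEAP_MASK: u8 = 216; // assumed value

// The whole idea rests on these two being equal on 64-bit targets.
const _: () = assert!((HEAP_MASK - LENGTH_MASK) as usize == MAX_SIZE);

#[derive(Debug, PartialEq)]
pub enum Kind {
    /// Inline string of the given length (0..=24).
    Inline(usize),
    /// Shared discriminant under the proposal: heap, or static if capacity is 0.
    HeapOrStatic,
}

/// Hypothetical helper: classify a repr by its last byte alone.
pub fn classify(last_byte: u8) -> Kind {
    let n = (last_byte as usize).wrapping_sub(LENGTH_MASK as usize);
    if n == MAX_SIZE {
        // last_byte == HEAP_MASK
        Kind::HeapOrStatic
    } else {
        // 192..=215 map to lengths 0..=23. The last byte of valid UTF-8 is
        // always below 192, so it wraps to a huge value and clamps to 24.
        Kind::Inline(n.min(MAX_SIZE))
    }
}

/// Hypothetical check for the proposed encoding: a "heap" repr whose
/// capacity is 0 would actually be a static string.
pub fn is_static(last_byte: u8, capacity: usize) -> bool {
    classify(last_byte) == Kind::HeapOrStatic && capacity == 0
}
```

The extra capacity check in is_static is the kind of cost that would move into the slower paths, which is the trade-off the question above is asking about.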