Performance table row for looking up existing strings #78

Hawk777 · 2024-12-30T08:07:34Z

I like the concise data about performance and features in the table in the README. However, it seems to be missing a row for the performance of looking up strings in the interner, which seems like it would be a very common operation. If I understand correctly, “fill” means adding new strings (i.e. calling get_or_intern on a string that is not already present) while “resolve” means converting symbols back to strings (i.e. calling resolve), but there doesn’t seem to be a row for calling get_or_intern on a string that is already present. For some applications, that’s going to be the vast majority of operations; for example, consider an application where you have a modest vocabulary that will all get interned quite soon after startup, but a very long sequence of words from that vocabulary that need to be converted into symbols as they’re received.

Hawk777 · 2024-12-30T08:27:51Z

… and apparently the answer is that all the backends will perform exactly the same in this case because the frontend has a HashMap in it that’s used for converting strings to symbols. But AFAICT that’s not mentioned anywhere in the documentation; I only discovered this by digging through the source code as I was curious how the different backends worked, and realized that they weren’t responsible for looking up existing strings!

… except that they won’t actually perform the same, because the HashMap lookup involves symbol resolution. Which is quite unintuitive, considering that I’m interested in the performance of the conversion in the opposite direction (string→symbol, not symbol→string).
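To illustrate the point, here is a minimal sketch of why a string→symbol lookup can depend on the backend's resolve speed. It assumes a frontend whose map stores only symbols, so that comparing a candidate against the incoming string requires resolving it through the backend. `ToyBackend`, `ToyInterner`, `hash_str`, and the `resolves_during_lookup` counter are hypothetical names made up for this example; they are not the crate's actual types, and the real frontend is organized differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a backend: it stores strings and resolves symbols.
struct ToyBackend {
    strings: Vec<String>,
}

impl ToyBackend {
    /// Stores a new string and returns its symbol (here, just an index).
    fn intern(&mut self, s: &str) -> usize {
        self.strings.push(s.to_owned());
        self.strings.len() - 1
    }

    /// Converts a symbol back into its string.
    fn resolve(&self, symbol: usize) -> Option<&str> {
        self.strings.get(symbol).map(String::as_str)
    }
}

/// Hypothetical stand-in for the frontend. The map holds symbols only
/// (grouped by string hash), never the strings themselves.
struct ToyInterner {
    dedup: HashMap<u64, Vec<usize>>,
    backend: ToyBackend,
    /// Counts how often a lookup had to call into the backend.
    resolves_during_lookup: usize,
}

fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

impl ToyInterner {
    fn new() -> Self {
        ToyInterner {
            dedup: HashMap::new(),
            backend: ToyBackend { strings: Vec::new() },
            resolves_during_lookup: 0,
        }
    }

    fn get_or_intern(&mut self, s: &str) -> usize {
        let hash = hash_str(s);
        if let Some(candidates) = self.dedup.get(&hash) {
            for &symbol in candidates {
                // The map only knows symbols, so each candidate must be
                // resolved by the backend before it can be compared.
                self.resolves_during_lookup += 1;
                if self.backend.resolve(symbol) == Some(s) {
                    return symbol;
                }
            }
        }
        // Not present yet: this is the "fill" path.
        let symbol = self.backend.intern(s);
        self.dedup.entry(hash).or_default().push(symbol);
        symbol
    }
}
```

In this sketch, calling `get_or_intern` on a string that is already present performs at least one `resolve` call, which is the only step that would differ from one backend to another.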

Robbepop · 2024-12-30T10:54:26Z

Hi @Hawk777, that is a very valuable observation. Would you be up for filing a PR to improve the docs in this area? No problem if not; then it will be done at a later point in time. It certainly needs clarification.

Hawk777 · 2024-12-31T03:01:48Z

Maybe I could. What would you suggest? Is there data to gather, or should I just reword things so that it’s clear that the resolve row applies to both symbol→string and string→symbol performance, since the resolve step of string→symbol is the only part that differs from backend to backend?

Robbepop · 2025-02-11T14:55:27Z

@Hawk777 sorry I forgot to answer you.

I have implemented this in #84.

Closed.

Hawk777 · 2025-02-11T16:42:19Z

Looks good, thanks!

Robbepop closed this as completed Feb 11, 2025
