Performance table row for looking up existing strings #78

Closed
Hawk777 opened this issue Dec 30, 2024 · 5 comments

Comments

@Hawk777

Hawk777 commented Dec 30, 2024

I like the concise data about performance and features in the table in the README. However, it seems to be missing a row for the performance of looking up strings in the interner, which seems like it would be a very common operation. If I understand correctly, “fill” means adding new strings (i.e. calling get_or_intern on a string that is not already present) while “resolve” means converting symbols back to strings (i.e. calling resolve), but there doesn’t seem to be a row for calling get_or_intern on a string that is already present. For some applications, that’s going to be the vast majority of operations; for example, consider an application where you have a modest vocabulary that will all get interned quite soon after startup, but a very long sequence of words from that vocabulary that need to be converted into symbols as they’re received.
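To make the three operations concrete, here is a minimal sketch using a toy interner built only on the standard library. `ToyInterner` and `count_operations` are hypothetical names invented for this illustration and are not the crate's implementation; only the method names `get_or_intern` and `resolve` come from the discussion above. It counts how many calls in a word stream add a new string ("fill") versus how many hit a string that is already present.

```rust
use std::collections::HashMap;

/// Toy interner for illustration only; NOT how the crate is implemented.
#[derive(Default)]
struct ToyInterner {
    map: HashMap<String, usize>, // string -> symbol
    strings: Vec<String>,        // symbol -> string
}

impl ToyInterner {
    /// string -> symbol; inserts the string if it is not present yet.
    fn get_or_intern(&mut self, s: &str) -> usize {
        if let Some(&symbol) = self.map.get(s) {
            return symbol; // the "existing string" path the issue asks about
        }
        // The "fill" path: the string is new and gets stored.
        let symbol = self.strings.len();
        self.strings.push(s.to_owned());
        self.map.insert(s.to_owned(), symbol);
        symbol
    }

    /// symbol -> string, the "resolve" direction.
    fn resolve(&self, symbol: usize) -> Option<&str> {
        self.strings.get(symbol).map(String::as_str)
    }
}

/// Interns every word and returns (fills, lookups of existing strings).
fn count_operations(interner: &mut ToyInterner, words: &[&str]) -> (usize, usize) {
    let mut fills = 0;
    let mut lookups = 0;
    for &word in words {
        let before = interner.strings.len();
        interner.get_or_intern(word);
        if interner.strings.len() > before {
            fills += 1;
        } else {
            lookups += 1;
        }
    }
    (fills, lookups)
}
```

With a small vocabulary and a long stream, the second count dominates, which is the workload described above.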

@Hawk777
Author

Hawk777 commented Dec 30, 2024

… and apparently the answer is that all the backends will perform exactly the same in this case because the frontend has a HashMap in it that’s used for converting strings to symbols. But AFAICT that’s not mentioned anywhere in the documentation; I only discovered this by digging through the source code as I was curious how the different backends worked, and realized that they weren’t responsible for looking up existing strings!

… except that they won’t actually perform the same, because the HashMap lookup involves symbol resolution. Which is quite unintuitive, considering that I’m interested in the performance of the conversion in the opposite direction (string→symbol, not symbol→string).
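A rough sketch of why this happens, assuming the deduplication table stores only symbols and has to ask the backend for the string whenever it compares keys. Everything here is hypothetical and standard-library only: `VecBackend`, `Frontend`, `hash_str`, and the `resolve_calls` counter are made up for illustration, and the bucket map is a stand-in for whatever hash table the crate actually uses.

```rust
use std::cell::Cell;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical backend: owns the strings and maps symbol -> string.
struct VecBackend {
    strings: Vec<String>,
    resolve_calls: Cell<usize>, // instrumentation for this sketch only
}

impl VecBackend {
    fn intern(&mut self, s: &str) -> usize {
        self.strings.push(s.to_owned());
        self.strings.len() - 1
    }

    fn resolve(&self, symbol: usize) -> &str {
        self.resolve_calls.set(self.resolve_calls.get() + 1);
        &self.strings[symbol]
    }
}

/// Hypothetical frontend: the table holds symbols, never the strings.
struct Frontend {
    buckets: HashMap<u64, Vec<usize>>, // string hash -> candidate symbols
    backend: VecBackend,
}

fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

impl Frontend {
    fn new() -> Self {
        Frontend {
            buckets: HashMap::new(),
            backend: VecBackend { strings: Vec::new(), resolve_calls: Cell::new(0) },
        }
    }

    /// string -> symbol. Comparing against a candidate requires resolving
    /// that candidate through the backend, so backend speed matters here too.
    fn get_or_intern(&mut self, s: &str) -> usize {
        let hash = hash_str(s);
        if let Some(candidates) = self.buckets.get(&hash) {
            for &symbol in candidates {
                if self.backend.resolve(symbol) == s {
                    return symbol;
                }
            }
        }
        let symbol = self.backend.intern(s);
        self.buckets.entry(hash).or_default().push(symbol);
        symbol
    }
}
```

In this sketch, interning a new string into an empty bucket never calls `resolve`, while looking up a string that is already present calls it at least once, so the cost of the backend's symbol→string path shows up in string→symbol lookups.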

@Robbepop
Owner

Hi @Hawk777, that is a very valuable observation. Would you be up for filing a PR to improve the docs in this area? No problem if not; it will then be done at a later point in time. This certainly needs clarification.

@Hawk777
Author

Hawk777 commented Dec 31, 2024

Maybe I could. What would you suggest? Is there data to gather, or should I just reword things so that it’s clear that the resolve row applies to both symbol→string and string→symbol performance, since the resolve step of string→symbol is the only part that differs from backend to backend?

@Robbepop
Owner

@Hawk777 sorry, I forgot to answer you.

I have implemented this in #84.

Closed.

@Hawk777
Author

Hawk777 commented Feb 11, 2025

Looks good, thanks!
