Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement the RBS Constant Pool in pure C #2221

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

amomchilov
Copy link
Contributor

@amomchilov amomchilov commented Jan 6, 2025

This PR replaces the Ruby-based implementation of the constant pool added in #2175, and uses a pure C implementation from Prism.

@amomchilov amomchilov mentioned this pull request Jan 6, 2025
7 tasks
@amomchilov amomchilov force-pushed the pure-C-constant-pool branch 3 times, most recently from 5acc00e to f4c3823 Compare January 9, 2025 16:00
@@ -181,7 +184,7 @@ static VALUE location_add_required_child(VALUE self, VALUE name, VALUE start, VA
rg.start = rbs_loc_position(FIX2INT(start));
rg.end = rbs_loc_position(FIX2INT(end));

rbs_loc_add_required_child(loc, rbs_find_constant_id_from_ruby_symbol(name), rg);
Copy link
Contributor Author

@amomchilov amomchilov Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an oversight in the current Ruby-backed implementation. The Ruby symbols were already interned by the fact they were used in Ruby code, so the lookup here always succeeded. It didn't matter then, but it matters now that this is its own constant pool, not effected by the Ruby source code.

add_required_child/add_optional_child need this to have an "insert" operation here, not a "find", to ensure that if a child is added with a new name, that its name is present in the constant pool.

location_aref can keep using find semantics, so there's a new rbs_constant_pool_find_ruby_symbol() function for that, below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's technically a fix, can this and other similar changes be upstreamed separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't an issue in the previous implementation, and it only matters starting in this PR. It's small enough that I don't think it's worth shipping separately.

static VALUE location_aref(VALUE self, VALUE name) {
rbs_loc *loc = rbs_check_location(self);

rbs_constant_id_t id = rbs_find_constant_id_from_ruby_symbol(name);
rbs_constant_id_t id = rbs_constant_pool_find_ruby_symbol(name);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a rename, but keeps the find semantics. So doing loc[name] with a bunch of names will not permenantly leak a bunch of constants into the global pool.

// | sort -u \
// | wc -l
// ```
const size_t num_uniquely_interned_strings = 26;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, the rbs_constant_pool_init implementation will round this up to the next power of 2 (32), so there's some wiggle room to insert a few extra constants without triggering any reallocations.

// constant pool at all. The initial capacity needs to be a power of 2. Picking 2 means that we won't need to realloc
// in 85% of cases.
//
// TODO: recalculate these statistics based on a real world codebase, rather than the test suite.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 2 is probably too low for real world use, but it's not a big deal. The constant pool will resize itself just like a Ruby Hash.

We'll come back and hand-tune it when we have better real-world data, to minimize those reallocations.

VALUE name = rb_sym2str(symbol);

return rbs_constant_pool_find(RBS_GLOBAL_CONSTANT_POOL, (const uint8_t *) RSTRING_PTR(name), RSTRING_LEN(name));
// Constants inserted here will never be freed, but that's acceptable because:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be documented above the function itself too?

@@ -224,10 +227,16 @@ static VALUE rbs_new_location_from_loc_range(VALUE buffer, rbs_loc_range rg) {
return obj;
}

static rbs_constant_id_t rbs_constant_pool_find_ruby_symbol(VALUE symbol) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's just rbs_find_constant_id_from_ruby_symbol renamed and moved to a different place?

If that's the case, can we not move the location of so it's easier to review?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff was pretty messed up anyway, because every single line of it changed. So I just moved it closer to its point of use (there's only 1 caller)

@amomchilov amomchilov marked this pull request as ready for review January 9, 2025 17:10
@amomchilov amomchilov force-pushed the pure-C-constant-pool branch 2 times, most recently from 16fd1e6 to 309f775 Compare January 10, 2025 23:39
@amomchilov
Copy link
Contributor Author

Hi @soutaro, this PR is ready to review, but I think there's some issue with CI.

Two of the test runners report that the constants.c and ruby_objs.c are stale. They're not (this PR doesn't even touch those templates). Perhaps there's a stale copy of the repo on the runners, for some reason?

@soutaro
Copy link
Member

soutaro commented Jan 15, 2025

@amomchilov I'm sorry for the late reply.

The CI issue should be fixed now by #2232.

Two of the test runners report that the constants.c and ruby_objs.c are stale.

They just checks if the files are update by timestamps. It's in fact a false alert.

@soutaro
Copy link
Member

soutaro commented Jan 15, 2025

The check can be safely disabled during CI because we have confirm_lexer and confirm_templates tasks to make sure they are updated.

@amomchilov amomchilov force-pushed the pure-C-constant-pool branch from 309f775 to 87714ba Compare January 16, 2025 21:39
@amomchilov
Copy link
Contributor Author

amomchilov commented Jan 16, 2025

Great @soutaro, I rebased this and it's green now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants