-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplement the RBS Constant Pool in pure C #2221
base: master
Are you sure you want to change the base?
Conversation
5acc00e
to
f4c3823
Compare
@@ -181,7 +184,7 @@ static VALUE location_add_required_child(VALUE self, VALUE name, VALUE start, VA | |||
rg.start = rbs_loc_position(FIX2INT(start)); | |||
rg.end = rbs_loc_position(FIX2INT(end)); | |||
|
|||
rbs_loc_add_required_child(loc, rbs_find_constant_id_from_ruby_symbol(name), rg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an oversight in the current Ruby-backed implementation. The Ruby symbols were already interned by the fact they were used in Ruby code, so the lookup here always succeeded. It didn't matter then, but it matters now that this is its own constant pool, not effected by the Ruby source code.
add_required_child
/add_optional_child
need this to have an "insert" operation here, not a "find", to ensure that if a child is added with a new name, that its name is present in the constant pool.
location_aref
can keep using find
semantics, so there's a new rbs_constant_pool_find_ruby_symbol()
function for that, below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's technically a fix, can this and other similar changes be upstreamed separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't an issue in the previous implementation, and it only matters starting in this PR. It's small enough that I don't think it's worth shipping separately.
static VALUE location_aref(VALUE self, VALUE name) { | ||
rbs_loc *loc = rbs_check_location(self); | ||
|
||
rbs_constant_id_t id = rbs_find_constant_id_from_ruby_symbol(name); | ||
rbs_constant_id_t id = rbs_constant_pool_find_ruby_symbol(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a rename, but keeps the find
semantics. So doing loc[name]
with a bunch of names will not permenantly leak a bunch of constants into the global pool.
// | sort -u \ | ||
// | wc -l | ||
// ``` | ||
const size_t num_uniquely_interned_strings = 26; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, the rbs_constant_pool_init
implementation will round this up to the next power of 2 (32), so there's some wiggle room to insert a few extra constants without triggering any reallocations.
// constant pool at all. The initial capacity needs to be a power of 2. Picking 2 means that we won't need to realloc | ||
// in 85% of cases. | ||
// | ||
// TODO: recalculate these statistics based on a real world codebase, rather than the test suite. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 2
is probably too low for real world use, but it's not a big deal. The constant pool will resize itself just like a Ruby Hash
.
We'll come back and hand-tune it when we have better real-world data, to minimize those reallocations.
VALUE name = rb_sym2str(symbol); | ||
|
||
return rbs_constant_pool_find(RBS_GLOBAL_CONSTANT_POOL, (const uint8_t *) RSTRING_PTR(name), RSTRING_LEN(name)); | ||
// Constants inserted here will never be freed, but that's acceptable because: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be documented above the function itself too?
@@ -224,10 +227,16 @@ static VALUE rbs_new_location_from_loc_range(VALUE buffer, rbs_loc_range rg) { | |||
return obj; | |||
} | |||
|
|||
static rbs_constant_id_t rbs_constant_pool_find_ruby_symbol(VALUE symbol) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's just rbs_find_constant_id_from_ruby_symbol
renamed and moved to a different place?
If that's the case, can we not move the location of so it's easier to review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff was pretty messed up anyway, because every single line of it changed. So I just moved it closer to its point of use (there's only 1 caller)
16fd1e6
to
309f775
Compare
Hi @soutaro, this PR is ready to review, but I think there's some issue with CI. Two of the test runners report that the |
@amomchilov I'm sorry for the late reply. The CI issue should be fixed now by #2232.
They just checks if the files are update by timestamps. It's in fact a false alert. |
The check can be safely disabled during CI because we have |
309f775
to
87714ba
Compare
Great @soutaro, I rebased this and it's green now. |
This PR replaces the Ruby-based implementation of the constant pool added in #2175, and uses a pure C implementation from Prism.