Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

types: use slab allocator for type bounds #231

Merged
merged 10 commits into from
Jul 4, 2024
Merged

Conversation

apoelstra
Copy link
Collaborator

Our existing type inference engine assumes a "global" set of type bounds, which has two bad consequences: one is that if you are constructing multiple programs, there is no way to "firewall" their type bounds so that you cannot accidentally combine type variables from one program with type variables from another. You just need to be careful. The other consequence is that if you construct infinitely sized types, which are represented as a reference cycle, the existing inference engine will leak memory.

To fix this, we need to stop allocating type bounds using untethered Arcs and instead use a slab allocator, which allows all bounds to be dropped at once, regardless of their circularity. This should also improve memory locality and our speed, as well as reducing the total amount of locking and potential mutex contention if type inference is done in a multithreaded context.

This is a 2000-line diff but the vast majority of the changes are "API-only" stuff where I was moving types around and threading new parameters through dozens or hundreds of call sites. I did my best to break everything up into commits such that the big-diff commits don't do much of anything and the real changes happen in the small-diff ones to make review easier.

By itself, this PR does not fix the issue of reference cycles, because it includes an Arc<Context> inside the recursive Type type itself. Future PRs will:

  • Take a single mutex lock during calls to the top-level bind and unify calls, so that these all happen atomically, including all recursive calls.
  • Add another intermediate type under Type which eliminates the Arc<Context> and its potential for circular references. Along the way, make the Bound type private, which is not really used outside of the types module anyway.
  • Do "checkpointing" during type inference that makes node construction atomic; this is Failed type inference should not pollute type bounds #226 which is not fixed by this PR.
  • (Maybe) move node allocation into the type inference context so that nodes can be slab-allocated as well, which will address Drop on Node should not be recursive #229 "for free" without us figuring out a non-recursive Drop impl for Arc<Node<N>>.

apoelstra added 7 commits July 3, 2024 20:01
This looks like a purely noise-increasing change, but when we introduce
type inference contexts, we will need the auxiliary type to hold an
inference context (which will be unused except for sanity-checking that
users are being consistent with their inference contexts).
This is a super noisy commit, but all it does is introduce the
src/types/context.rs module with a dummy Context type, and make all the
node construction APIs take this and make copies of handles of it.

The next commit will actually _use_ the context in type inference.

This one can therefore be mostly skimmed. The maybe-interesting parts
are the conversion methods, which can be found by searching the diff for
calls to `from_inner`.
This is again an API-only change.
This is again an "API only" change.

Also fixes a couple code-cleanliness things in the benches/ crate.
The sum and product constructors don't obviously need to be passed a
type context -- after all, the two child types already have a type
context embedded in them.

However, it turns out in practice we have a context available
every single time we call these methods, and it's a little bit
difficult to pull the context out of Types, since Types only
contain a weak pointer to the context object, and in theory
the context might have been dropped, leading to a panic.

Since the user already has a context handle, just make them pass it
in. Then we know it exists.
Our union-bound algorithm uses some features of Arc (notably: cheap
clones and pointer equality) and therefore holds an Arc<T>. But in a
later commit we will want to replace the Arcs with a different reference
type, and to do so, we need to make this code more generic.
…ion-bound

This introduces the BoundRef type, an opaque reference type which
currently just holds an Arc<BoundMutex>, but which in a later commit
will be tied to the type inference context and refer to data within the
context.

It currently preserves the get() and set() methods on BoundRef, which
come from BoundMutex, but these will need to be replaced in a later
commit, since eventually BoundRef by itself will not contain enough
information to update its data.

This is essentially an API-only change though it does move some data
structures around.
Comment on lines +38 to +44
// Having `Clone` makes it easier to derive Clone on structures
// that contain Arrow, even though it is potentially confusing
// to use `.clone` to mean a shallow clone.
impl Clone for Arrow {
fn clone(&self) -> Self {
self.shallow_clone()
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8eeab8f: Would it make sense to introduce a ShallowClone trait with Clone as a subtrait? ShallowClone would be almost like a marker trait and it would provide the method shallow_clone which may call clone by default. If people see that both ShallowClone and Clone is implemented for a type, then they can infer that the type is cheaply cloneable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Promoted to #232. I like this idea but it's separate from this PR really.

cc https://smallcultfollowing.com/babysteps/blog/2024/06/21/claim-auto-and-otherwise/

src/types/context.rs Outdated Show resolved Hide resolved
src/types/context.rs Outdated Show resolved Hide resolved
uncomputable
uncomputable previously approved these changes Jul 4, 2024
Copy link
Collaborator

@uncomputable uncomputable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 3275c5b This PR was easier to review than expected. One change flowed into the other and most of the changes were "API only". I left some nits.

@uncomputable
Copy link
Collaborator

To summarize:

  • Allocating all types of a given program in a single slab allows us to differentiate them from types of a different program. The slab can be dropped when the program is dropped / finalized.
  • Doing the unification in LockedContext ensures we keep a write lock on the slab for the entire time of the operation. No other threads can write to the slab until we are finished.
    Is that correct?

@apoelstra
Copy link
Collaborator Author

Is that correct?

Correct on both counts.

apoelstra added 3 commits July 4, 2024 13:40
Once our BoundRefs start requiring an inference context to access their
data, we won't be able to call .set() and .get() on them individually.
Remove these methods, and instead add them on Context.

Doing this means that everywhere we currently call .get and .set, we
need a context available. To achieve this, we add the context to Type,
and swap the fmt::Debug/fmt::Display impls for Type and Bound so that
Type is the primary one (since it has the context object available).

The change to use BoundRef in the finalization code means we can change
our occurs-check from directly using Arc::<Bound>::as_ptr to using a
more "principled" OccursCheckId object yielded from the BoundRef. This
in turn means that we no longer need to use Arc<Bound> anywhere, and
can instead directly use Bound (which is cheap to clone and doesn't need
to be wrapped in an Arc, except when we are using Arc to obtain a
pointer-id for use in the occurs check).

Converting Arc<Bound> to Bound in turn lets us remove a bunch of
Arc::new and Arc::clone calls throughout.

Again, "API only", but there's a lot going on here.
Pulls the unify and bind methods out of Type and BoundMutex and
implement them on a new private LockedContext struct. This locks the
entire inference context for the duration of a bind or unify operation,
and because it only locks inside of non-recursive methods, it is
impossible to deadlock.

This is "API-only" in the sense that the actual type bounds continue to
be represented by free-floating Arcs, but it has a semantic change in
that binds and unifications now happen atomically (due to the
continuously held lock on the context) which fixes a likely class of
bugs wherein if you try to unify related variables from multiple threads
at once, the old code probably would due weird things, due to the very
local locking and total lack of other synchronization.

The next commit will finally delete BoundMutex, move the bounds into the
actual context object, and you will see the point of all these massive
code lifts :).
…xt slab

This completes the transition to using type contexts to keep track of
(and allocate/mass-deallocate) type bounds :). There are three major
improvements in this changeset:

* We no longer leak memory when infinite type bounds are constructed.
* It is no longer possible to create distinct programs where the
  variables are mixed up. (Ok, you can do this still, but you have
  to explicitly use the same type context for both programs, which
  is an obvious bug.)
* Unification and binding happen atomically, so if you are doing type
  inference across multiple threads, crosstalk won't happen between
  them.
@apoelstra
Copy link
Collaborator Author

Fixed both nits. Will leave the ShallowClone stuff to a later PR.

Copy link
Collaborator

@uncomputable uncomputable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c22dbaa

@uncomputable uncomputable merged commit 8c17b94 into master Jul 4, 2024
27 checks passed
@apoelstra apoelstra deleted the 2024-06--types-2 branch July 4, 2024 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants