types: use slab allocator for type bounds #231

apoelstra · 2024-07-03T21:10:48Z

Our existing type inference engine assumes a "global" set of type bounds, which has two bad consequences: one is that if you are constructing multiple programs, there is no way to "firewall" their type bounds so that you cannot accidentally combine type variables from one program with type variables from another. You just need to be careful. The other consequence is that if you construct infinitely sized types, which are represented as a reference cycle, the existing inference engine will leak memory.

To fix this, we need to stop allocating type bounds using untethered Arcs and instead use a slab allocator, which allows all bounds to be dropped at once, regardless of their circularity. This should also improve memory locality and our speed, as well as reducing the total amount of locking and potential mutex contention if type inference is done in a multithreaded context.

This is a 2000-line diff but the vast majority of the changes are "API-only" stuff where I was moving types around and threading new parameters through dozens or hundreds of call sites. I did my best to break everything up into commits such that the big-diff commits don't do much of anything and the real changes happen in the small-diff ones to make review easier.

By itself, this PR does not fix the issue of reference cycles, because it includes an Arc<Context> inside the recursive Type type itself. Future PRs will:

Take a single mutex lock during calls to the top-level bind and unify calls, so that these all happen atomically, including all recursive calls.
Add another intermediate type under Type which eliminates the Arc<Context> and its potential for circular references. Along the way, make the Bound type private, which is not really used outside of the types module anyway.
Do "checkpointing" during type inference that makes node construction atomic; this is Failed type inference should not pollute type bounds #226 which is not fixed by this PR.
(Maybe) move node allocation into the type inference context so that nodes can be slab-allocated as well, which will address Drop on Node should not be recursive #229 "for free" without us figuring out a non-recursive Drop impl for Arc<Node<N>>.

This looks like a purely noise-increasing change, but when we introduce type inference contexts, we will need the auxiliary type to hold an inference context (which will be unused except for sanity-checking that users are being consistent with their inference contexts).

This is a super noisy commit, but all it does is introduce the src/types/context.rs module with a dummy Context type, and make all the node construction APIs take this and make copies of handles of it. The next commit will actually _use_ the context in type inference. This one can therefore be mostly skimmed. The maybe-interesting parts are the conversion methods, which can be found by searching the diff for calls to `from_inner`.

This is again an API-only change.

This is again an "API only" change. Also fixes a couple code-cleanliness things in the benches/ crate.

The sum and product constructors don't obviously need to be passed a type context -- after all, the two child types already have a type context embedded in them. However, it turns out in practice we have a context available every single time we call these methods, and it's a little bit difficult to pull the context out of Types, since Types only contain a weak pointer to the context object, and in theory the context might have been dropped, leading to a panic. Since the user already has a context handle, just make them pass it in. Then we know it exists.

Our union-bound algorithm uses some features of Arc (notably: cheap clones and pointer equality) and therefore holds an Arc<T>. But in a later commit we will want to replace the Arcs with a different reference type, and to do so, we need to make this code more generic.

…ion-bound This introduces the BoundRef type, an opaque reference type which currently just holds an Arc<BoundMutex>, but which in a later commit will be tied to the type inference context and refer to data within the context. It currently preserves the get() and set() methods on BoundRef, which come from BoundMutex, but these will need to be replaced in a later commit, since eventually BoundRef by itself will not contain enough information to update its data. This is essentially an API-only change though it does move some data structures around.

uncomputable · 2024-07-04T08:14:52Z

src/types/arrow.rs

+// Having `Clone` makes it easier to derive Clone on structures
+// that contain Arrow, even though it is potentially confusing
+// to use `.clone` to mean a shallow clone.
+impl Clone for Arrow {
+    fn clone(&self) -> Self {
+        self.shallow_clone()
+    }


8eeab8f: Would it make sense to introduce a ShallowClone trait with Clone as a subtrait? ShallowClone would be almost like a marker trait and it would provide the method shallow_clone which may call clone by default. If people see that both ShallowClone and Clone is implemented for a type, then they can infer that the type is cheaply cloneable

Promoted to #232. I like this idea but it's separate from this PR really.

cc https://smallcultfollowing.com/babysteps/blog/2024/06/21/claim-auto-and-otherwise/

src/types/context.rs

uncomputable

ACK 3275c5b This PR was easier to review than expected. One change flowed into the other and most of the changes were "API only". I left some nits.

uncomputable · 2024-07-04T09:10:19Z

To summarize:

Allocating all types of a given program in a single slab allows us to differentiate them from types of a different program. The slab can be dropped when the program is dropped / finalized.
Doing the unification in LockedContext ensures we keep a write lock on the slab for the entire time of the operation. No other threads can write to the slab until we are finished.
Is that correct?

apoelstra · 2024-07-04T13:39:34Z

Is that correct?

Correct on both counts.

Once our BoundRefs start requiring an inference context to access their data, we won't be able to call .set() and .get() on them individually. Remove these methods, and instead add them on Context. Doing this means that everywhere we currently call .get and .set, we need a context available. To achieve this, we add the context to Type, and swap the fmt::Debug/fmt::Display impls for Type and Bound so that Type is the primary one (since it has the context object available). The change to use BoundRef in the finalization code means we can change our occurs-check from directly using Arc::<Bound>::as_ptr to using a more "principled" OccursCheckId object yielded from the BoundRef. This in turn means that we no longer need to use Arc<Bound> anywhere, and can instead directly use Bound (which is cheap to clone and doesn't need to be wrapped in an Arc, except when we are using Arc to obtain a pointer-id for use in the occurs check). Converting Arc<Bound> to Bound in turn lets us remove a bunch of Arc::new and Arc::clone calls throughout. Again, "API only", but there's a lot going on here.

Pulls the unify and bind methods out of Type and BoundMutex and implement them on a new private LockedContext struct. This locks the entire inference context for the duration of a bind or unify operation, and because it only locks inside of non-recursive methods, it is impossible to deadlock. This is "API-only" in the sense that the actual type bounds continue to be represented by free-floating Arcs, but it has a semantic change in that binds and unifications now happen atomically (due to the continuously held lock on the context) which fixes a likely class of bugs wherein if you try to unify related variables from multiple threads at once, the old code probably would due weird things, due to the very local locking and total lack of other synchronization. The next commit will finally delete BoundMutex, move the bounds into the actual context object, and you will see the point of all these massive code lifts :).

…xt slab This completes the transition to using type contexts to keep track of (and allocate/mass-deallocate) type bounds :). There are three major improvements in this changeset: * We no longer leak memory when infinite type bounds are constructed. * It is no longer possible to create distinct programs where the variables are mixed up. (Ok, you can do this still, but you have to explicitly use the same type context for both programs, which is an obvious bug.) * Unification and binding happen atomically, so if you are doing type inference across multiple threads, crosstalk won't happen between them.

apoelstra · 2024-07-04T13:40:52Z

Fixed both nits. Will leave the ShallowClone stuff to a later PR.

uncomputable

ACK c22dbaa

apoelstra added 7 commits July 3, 2024 20:01

types: make bind and unify go through Context

8e08900

This is again an API-only change.

types: add &Context to type constructors

65b35a9

This is again an "API only" change. Also fixes a couple code-cleanliness things in the benches/ crate.

uncomputable reviewed Jul 4, 2024

View reviewed changes

src/types/context.rs Outdated Show resolved Hide resolved

uncomputable reviewed Jul 4, 2024

View reviewed changes

src/types/context.rs Outdated Show resolved Hide resolved

uncomputable previously approved these changes Jul 4, 2024

View reviewed changes

apoelstra added 3 commits July 4, 2024 13:40

apoelstra dismissed uncomputable’s stale review via c22dbaa July 4, 2024 13:40

apoelstra force-pushed the 2024-06--types-2 branch from 3275c5b to c22dbaa Compare July 4, 2024 13:40

uncomputable approved these changes Jul 4, 2024

View reviewed changes

uncomputable merged commit 8c17b94 into master Jul 4, 2024
27 checks passed

apoelstra deleted the 2024-06--types-2 branch July 4, 2024 13:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

types: use slab allocator for type bounds #231

types: use slab allocator for type bounds #231

apoelstra commented Jul 3, 2024

uncomputable Jul 4, 2024

apoelstra Jul 4, 2024

uncomputable left a comment •

edited

Loading

uncomputable commented Jul 4, 2024

apoelstra commented Jul 4, 2024

apoelstra commented Jul 4, 2024

uncomputable left a comment

types: use slab allocator for type bounds #231

types: use slab allocator for type bounds #231

Conversation

apoelstra commented Jul 3, 2024

uncomputable Jul 4, 2024

Choose a reason for hiding this comment

apoelstra Jul 4, 2024

Choose a reason for hiding this comment

uncomputable left a comment • edited Loading

Choose a reason for hiding this comment

uncomputable commented Jul 4, 2024

apoelstra commented Jul 4, 2024

apoelstra commented Jul 4, 2024

uncomputable left a comment

Choose a reason for hiding this comment

uncomputable left a comment •

edited

Loading