-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
types: use slab allocator for type bounds #231
Conversation
This looks like a purely noise-increasing change, but when we introduce type inference contexts, we will need the auxiliary type to hold an inference context (which will be unused except for sanity-checking that users are being consistent with their inference contexts).
This is a super noisy commit, but all it does is introduce the src/types/context.rs module with a dummy Context type, and make all the node construction APIs take this and make copies of handles of it. The next commit will actually _use_ the context in type inference. This one can therefore be mostly skimmed. The maybe-interesting parts are the conversion methods, which can be found by searching the diff for calls to `from_inner`.
This is again an API-only change.
This is again an "API only" change. Also fixes a couple code-cleanliness things in the benches/ crate.
The sum and product constructors don't obviously need to be passed a type context -- after all, the two child types already have a type context embedded in them. However, it turns out in practice we have a context available every single time we call these methods, and it's a little bit difficult to pull the context out of Types, since Types only contain a weak pointer to the context object, and in theory the context might have been dropped, leading to a panic. Since the user already has a context handle, just make them pass it in. Then we know it exists.
Our union-bound algorithm uses some features of Arc (notably: cheap clones and pointer equality) and therefore holds an Arc<T>. But in a later commit we will want to replace the Arcs with a different reference type, and to do so, we need to make this code more generic.
…ion-bound This introduces the BoundRef type, an opaque reference type which currently just holds an Arc<BoundMutex>, but which in a later commit will be tied to the type inference context and refer to data within the context. It currently preserves the get() and set() methods on BoundRef, which come from BoundMutex, but these will need to be replaced in a later commit, since eventually BoundRef by itself will not contain enough information to update its data. This is essentially an API-only change though it does move some data structures around.
// Having `Clone` makes it easier to derive Clone on structures | ||
// that contain Arrow, even though it is potentially confusing | ||
// to use `.clone` to mean a shallow clone. | ||
impl Clone for Arrow { | ||
fn clone(&self) -> Self { | ||
self.shallow_clone() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8eeab8f: Would it make sense to introduce a ShallowClone
trait with Clone
as a subtrait? ShallowClone
would be almost like a marker trait and it would provide the method shallow_clone
which may call clone
by default. If people see that both ShallowClone
and Clone
is implemented for a type, then they can infer that the type is cheaply cloneable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Promoted to #232. I like this idea but it's separate from this PR really.
cc https://smallcultfollowing.com/babysteps/blog/2024/06/21/claim-auto-and-otherwise/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 3275c5b This PR was easier to review than expected. One change flowed into the other and most of the changes were "API only". I left some nits.
To summarize:
|
Correct on both counts. |
Once our BoundRefs start requiring an inference context to access their data, we won't be able to call .set() and .get() on them individually. Remove these methods, and instead add them on Context. Doing this means that everywhere we currently call .get and .set, we need a context available. To achieve this, we add the context to Type, and swap the fmt::Debug/fmt::Display impls for Type and Bound so that Type is the primary one (since it has the context object available). The change to use BoundRef in the finalization code means we can change our occurs-check from directly using Arc::<Bound>::as_ptr to using a more "principled" OccursCheckId object yielded from the BoundRef. This in turn means that we no longer need to use Arc<Bound> anywhere, and can instead directly use Bound (which is cheap to clone and doesn't need to be wrapped in an Arc, except when we are using Arc to obtain a pointer-id for use in the occurs check). Converting Arc<Bound> to Bound in turn lets us remove a bunch of Arc::new and Arc::clone calls throughout. Again, "API only", but there's a lot going on here.
Pulls the unify and bind methods out of Type and BoundMutex and implement them on a new private LockedContext struct. This locks the entire inference context for the duration of a bind or unify operation, and because it only locks inside of non-recursive methods, it is impossible to deadlock. This is "API-only" in the sense that the actual type bounds continue to be represented by free-floating Arcs, but it has a semantic change in that binds and unifications now happen atomically (due to the continuously held lock on the context) which fixes a likely class of bugs wherein if you try to unify related variables from multiple threads at once, the old code probably would due weird things, due to the very local locking and total lack of other synchronization. The next commit will finally delete BoundMutex, move the bounds into the actual context object, and you will see the point of all these massive code lifts :).
…xt slab This completes the transition to using type contexts to keep track of (and allocate/mass-deallocate) type bounds :). There are three major improvements in this changeset: * We no longer leak memory when infinite type bounds are constructed. * It is no longer possible to create distinct programs where the variables are mixed up. (Ok, you can do this still, but you have to explicitly use the same type context for both programs, which is an obvious bug.) * Unification and binding happen atomically, so if you are doing type inference across multiple threads, crosstalk won't happen between them.
3275c5b
to
c22dbaa
Compare
Fixed both nits. Will leave the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK c22dbaa
Our existing type inference engine assumes a "global" set of type bounds, which has two bad consequences: one is that if you are constructing multiple programs, there is no way to "firewall" their type bounds so that you cannot accidentally combine type variables from one program with type variables from another. You just need to be careful. The other consequence is that if you construct infinitely sized types, which are represented as a reference cycle, the existing inference engine will leak memory.
To fix this, we need to stop allocating type bounds using untethered
Arc
s and instead use a slab allocator, which allows all bounds to be dropped at once, regardless of their circularity. This should also improve memory locality and our speed, as well as reducing the total amount of locking and potential mutex contention if type inference is done in a multithreaded context.This is a 2000-line diff but the vast majority of the changes are "API-only" stuff where I was moving types around and threading new parameters through dozens or hundreds of call sites. I did my best to break everything up into commits such that the big-diff commits don't do much of anything and the real changes happen in the small-diff ones to make review easier.
By itself, this PR does not fix the issue of reference cycles, because it includes an
Arc<Context>
inside the recursiveType
type itself. Future PRs will:bind
andunify
calls, so that these all happen atomically, including all recursive calls.Type
which eliminates theArc<Context>
and its potential for circular references. Along the way, make theBound
type private, which is not really used outside of the types module anyway.Node
should not be recursive #229 "for free" without us figuring out a non-recursiveDrop
impl forArc<Node<N>>
.