Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial stab at documenting the kind system #3559

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
335 changes: 335 additions & 0 deletions jane/doc/extensions/kinds.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,335 @@
# The kind system

Oxcaml includes a system of _kinds_, which classify types. Roughly, the kind
system adds an extra level to the type system: terms are classified by types,
and types are classified by kinds. Kinds capture properties of types along
several different dimensions. For example, kinds can be used to identify which
types have values that are passed in floating point registers, or are safely
ignored by the garbage collector.

The kind of a type has three components: the _layout_, the _modal bounds_, and
the _non-modal bounds_. The layout describes the shape of the data at runtime,
and is used to support unboxed types. The modal bounds describe how different
types interact with our mode system. In particular, some types don't have
interesting interactions with some modes, so values of these types can safely
ignore those modal axes. The non-modal bounds capture a grab-bag of other
properties.

This page describes the kind system at a high level, and contains complete
details for the non-modal bounds. It does not exhaustively describe the possible
layouts (which are documented on the [unboxed types
page](unboxed-types/index.md)) or the modal axes (which are documented on the
[modes page]()), but does explain how those components appear in kinds.

CR ccasinghino: add links to modes documentation after moving it here.

# The basic structure of kinds

Basic kinds have the form:

```
<layout> mod <bounds>
```

For example, the type `(int -> int) option` has the kind `value mod
contended`. Here, the layout `value` indicates that members of this type have
the shape of normal OCaml data, and the bound `contended` indicates that they
can safely ignore the "contention" access (it is safe to treat them as if they
were created by the current thread, even if they were not).

The "bounds" portion of the kind often has multiple components, and includes
both modal and non-modal bounds. For example, the type `int -> int` has a kind
with two modal bounds, `value mod aliased contended`. The type `int` has all
the bounds:

```
value mod external non_null global contended portable aliased many
```

This kind indicates that `int` mode crosses on all five of our modal axes. The
non-modal bounds `external` and `non_null` capture two other properties of
`int`s: `external` describes types that can safely be ignored by the OCaml
garbage collector, and `non_null` describes types that do not have `NULL` as a
valid member. These non-modal axes are described in more detail below.

# The meaning of kinds

In additional to `value`, Oxcaml supports layouts like `float64` (unboxed
floating point numbers that are passed in int registers), `bits64` and `bits32`
(for types represented by unboxed/untagged integers) and product layouts like
`float64 & bits32` (an unboxed pair that is passed in two registers). More
detail on layouts and the unboxed types language feature can be found
[here](unboxed-types/index.md).

Modal bounds all correspond to modal axes, which are described in more detail in
the [modes documentation](). The logic for which types can cross on which axes is
specific to each axis, often involving both the semantic meaning of the mode and
details of the implementation of related features in the OCaml runtime. See the
documentation for each mode to understand which types cross on its axis.

Formally, these are called modal _bounds_ because the represent upper or lower
bounds on the appropriate modal axes. For _future_ modal axes (like portability
and linearity), the kind records an upper bound on the mode of values of this
type. For example, `int` is `mod portable` because if you have an `int` that is
`non-portable`, it's safe to treat it as `portable`. For _past_ modal axes
(like contention and uniqueness), the kind records a lower bound on
_expectations_. For example, `int` is `mod contended` because in a place where
an `uncontended` value is expected, it's still safe to use a `contended` int.

Why do past and future modal axes get different treatment in kinds? This is
covered in the "Advanced Topics" section below, but isn't essential to
understand for day-to-day use of the system.

# Subkinding

There is a partial order on kinds, which we'll write `k1 <= k2`. The
relationship `k1 <= k2` holds when `k2` tells us less about a type than `k1`.
Thus, it is always safe to use a type of kind `k1` where a type of kind `k2` is
expected. There is a maximum mode, written `any` (which, somewhat confusingly,
is also the name of the maximum layout).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But layouts don't really have names any more. Only kinds do.


As an example, `value mod portable <= value`. This means that if we know a type
is represented as a normal OCaml value and mode crosses on the portability axis,
it's safe to use the type where we just need a regular OCaml type but don't care
how it interacts with portability. This relation forms a partial order because
some kinds are unrelated, like `float64 mod contended` and `bits64 mod
contended`, or `value mod portable` and `value mod aliased`.

If you want to get nerdy about it, each individual piece of a kind (the layout
and each possible axis of bounds) is a join semilattice, and the order we're
talking about here is the one corresponding to the product join semilattice that
combines all these components. The order for each component is described in the
documentation for that component.

# Kinds in the syntax

You can write kind annotations in several places. In type declarations, they can appear
both on the type being declared itself, and on the type parameters. E.g.,:
```ocaml
type ('a : <kind1>) t : <kind2>
```
Kind annotations can also appear anywhere a type variable or underscore is legal
in a type expression, as in:
```ocaml
let foo (x : ('a : value mod portable)) (y : (_ : bits32 & float64)) = ...
```
And on locally abstract types:
```ocaml
let foo (type (a : value mod portable) (b : bits32 & float64)) (x : a) (y : b)
= ...
```

It's legal to omit the `mod ...` portion of the kind when you only want to
specify the layout (as we did in a couple of these examples). The bounds of
unspecified axes are the maximum on those axes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is subtle. When considered as a mode, aliased is the maximum on the uniqueness axis. But when considered as a modal bound, unique is the maximum: it mode-crosses the least. This means that value mod aliased <= value mod unique (because the latter is identically value), even though unique <= aliased.

This is sad and confusing, yet true of all past axes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a paragraph explaining what "maximum" means here - let me know if you think that covers it.


Note that the meaning of a kind annotation on a type declaration differs
depending on whether the type declaration is abstract. For an abstract type declaration
like `type t : value mod portable`, the kind annotation will be used as the
exact kind of the type `t`.

For a type declaration that has a RHS, the kind annotation is just a check. For
example, consider this type declaration:
```ocaml
type t : value = (int -> int) option
```
This declaration is allowed because `(int -> int) option` does have the kind
value (due to subkinding, see the previous section). But the type system treats
`t` and `(int -> int) option` as complete equal types, and therefore still knows
the more precise kind `value mod contended` for `t` despite the annotation.

On the other hand, the following declaration is rejected because the kind of
`(int -> int) option` is not less than or equal to `value mod portable`:
```
type t : value mod portable = (int -> int) option
```

## Kind abbreviations

Kinds can get long, so we have built-in abbreviations for several common kinds,
and there is a mechanism to define aliases of your own. The builtin
abbreviations include kinds like "immediate", which is secretly shorthand for
the kind we showed for `int` above. Here is a list of the built-in abbreviations:

| Alias | Meaning |
|------------------|------------------------------------------------------------------------|
| `immediate` | `value mod external non_null global contended portable aliased many` |
| `immediate64` | `value mod external64 non_null global contended portable aliased many` |
| `immutable_data` | `value mod contended portable many non_null` |
| `mutable_data` | `value mod portable many non_null` |

It's allowed to extend an alias with an additional bounds. For example,
`mutable_data mod contended` is equivalent `immutable_data`.

User-defined kind abbreviations may be added with `kind_abbrev_`, which is legal
in structures and signatures. For example, we could have this alias:
```ocaml
kind_abbrev_ portable_value_pair = (value & value) mod portable
```

# Inclusion and variance

This is accepted:

```ocaml
module M1 : sig
type t : value
end = struct
type t = int
end
```

This makes sense because the kind of `int` is `immediate`, which is a subkind
of `value`. Even though users of `M1.t` will be expecting a `value`, the `immediate`
they get works great. Thus, the kinds of type declarations are *covariant* in the module
inclusion check: a module type `S1` is included in `S2` when the kind of a type `t`
in `S1` is included in the kind of `t` in `S2`.

Similarly, this is accepted:

```ocaml
module M2 : sig
type ('a : immediate) t
end = struct
type ('a : value) t
end
```

This makes sense because users of `M2.t` are required to supply an `immediate`; even
though the definition of `M2.t` expects a `value`, the `immediate` it gets works great.
Thus, the kinds of type declaration arguments are *contravariant* in the module
inclusion check: a module type `S1` is included in `S2` when the kind of the argument
to a type `t` in `S2` is included in the kind of that argument in `S1`.

Contravariance in type arguments allows us to have

```ocaml
module Array : sig
type ('a : any) t = 'a array
(* ... *)
end
```

and still pass `Array` to functors expecting a `type 'a t`, which assumes `('a :
value)`.

CR ccasinghino: I just stole the above from the layouts docs, verbatim, except
changing the word layout to kind. Probably we should delete it from there and
point here?

Relatedly, a `with type` constraint on a signature can fill in a type with one
that has a more specific kind. E.g., this is legal:
```ocaml
module type S_any = sig
type t : any
end

module type S_imm = S_any with type t = int
```

This can be particularly useful for common signatures that might be implemented
by types with any kind (e.g., `Sexpable`).

# With kinds

Sometimes the kind of a type constructor depends on the kinds of the types that
instantiate its parameters. For example, the type `'a list` can mode cross on
the portability axis if `'a` does.

We could have a `list` whose kind is restricted to work on types that more cross
on the portability axis, to record this fact, as in:
```ocaml
type ('a : value mod portable) portable_list : value mod portable
```
But it would be annoying to have many different list types, and we certainly
don't want to restrict the normal `list` type to work only on a subset of
`value`s.

The solution to this problem is "with kinds": kinds that record dependencies on
types. The actual kind of list in our.
```ocaml
type 'a list : value mod non_null (contended portable many with 'a)
```

CR ccasinghino: I'm giving up on explaining with-kinds here for now. I think
there is no way to write the kind of list in the current system? (The above is
not legal syntax)

# Non-modal kind axes

## Externality

The externality axis records whether all a type's values may safely be ignored
by the GC. This may be because they are OCaml "immediates" (values represented
by a tagged integer), because they are unboxed types like `float#` or `int32#`,
or because they are allocated elsewhere.

The axis has two possible values, with `external < internal`. Here, `external`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually has 3 possible values: external_ < external64 < internal, so that immediate64 can be mod external64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️

I added a note about external64 and jsoo/wasm - you may want to check its accuracy.

means that all values of the type are safely ignored by the GC, and internal
means values of the type may need to be scanned.

The compiler uses the externality axis for certain runtime optimizations. In
particular, updating a mutable reference to a type that is `external` can skip
the write barrier (i.e., it does not need a call to `caml_modify`).

In the future, we plan to make externality a mode, rather than just a property
of types.

## Nullability

The nullability axis records whether `NULL` (the machine word 0) is a possible
value of a type, and is used to support the non-allocating option `'a or_null`
type. The axis has two possible values, with `non_null < maybe_null`.

A type may be `non_null` if none of its value values are `NULL`. Such types are
compatible with `or_null`, whose definition is:
```ocaml
type ('a : value mod non_null) or_null : value mod maybe_null =
| Null
| This of 'a
```

# Advanced topics

## Why are past and future modal bounds different?

Historically, we thought of modal bounds in kinds as being upper bounds on the
mode of values of the type. This turns out to be the correct meaning only for
the future axes. For past axes, modal bounds in kinds are instead _lower bounds_
on _expected modes_.

This distinction only matters for modal axes with at least three values.
Consider the contention axis, where `uncontended < shared < contended`. What
does a kind like `value mod shared` mean?

Our answer is that it's the kind of a type like:
```ocaml
type 'a t = { shared : 'a @@ shared } [@@unboxed]
```
That is, the meaning of `value mod shared` is related to types using `shared` as
a modality.

The type `t` is a "box" containing something we know is shared. How do you use
this type? In short, this is a box that knows that its contents are `shared`,
even when the box itself is `uncontended`. If we have a `t @ uncontended`, the
mode of `t.shared` is `shared`. But if we have a `t @ contended`, the mode of
`t.shared` must still be `contended` - this `t` may have come from another
thread, and for thread safety its contents must now be treated as contended
regardless of their mode when they went into the box.

So, for this type, we don't care about the difference between `uncontended` and
`shared`, but we do care about the difference between `shared` and `contended`.
If we thought of `mod shared` as representing a upper bound on the mode, that
would suggest we can treat `contended` things as `shared`, which we've just seen
is wrong.

Instead, for past axes, a modal bound is _lower bound_ on expectations: `mod
shared` means that even if you're expecting to get an `uncontended` thing of
this type, a `shared` thing is just as good.

## Kind inference

CR ccasinghino: I'm imagining the "with kinds" section above to be a practical
introduction to why we have with kinds and how to write them. It probably should
not cover the details of how we compute the kind of type. Is it worth explaining
that down here?
Comment on lines +358 to +361
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in great detail. But I do think there should be some commentary mentioning at least these highlights:

  • The default kind for type t is value
  • The kind for an extensible type is value
  • The kind for a boxed variant with no fields is immediate
  • The kind for a boxed record or variant with no mutable fields (but at least one field) is immutable_data with <<all the fields>>
  • The kind for a boxed record or variant with at least one mutable field is mutable_data with <<all the fields>>
  • The kind for an unboxed record is <<product of the fields' layouts>> mod <<everything>> with <<all the fields>>
  • The kind for an [@@unboxed] record or variant is <<layout of the payload>> mod <<everything>> with <<the field>> (by further special decree, this with does take nullability into account!)
  • The kind for an unboxed tuple is <<layout of the components>> mod <<everything>> with <<the components>>
  • The kind for a closed polymorphic variant with no fields is immediate
  • The kind of a function is value mod aliased contended
  • The kind for other types is value
  • Kinds for other types may improve over time. For example, we expect to change the kind of a boxed tuple to be immutable_data with <<the components>>. Similarly, we expect to improve on the kinds of polymorphic variants and first-class modules.
  • The kind of a private abbreviation is the kind of its right-hand side. We expect to allow users to give a private abbreviation a superkind of its right-hand side (thus improving abstraction) soon.

A full specification of how with-bounds interact with subsumption is probably beyond the scope of this document. I can produce some hand-wavy text about how we try our best here if you like.

Loading