Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop a more principled type inference algorithm #3601

Closed
Scott-Guest opened this issue Aug 24, 2023 · 8 comments
Closed

Develop a more principled type inference algorithm #3601

Scott-Guest opened this issue Aug 24, 2023 · 8 comments
Assignees

Comments

@Scott-Guest
Copy link
Contributor

In order to improve performance and eventually add support for more powerful types, we should develop a robust type inference algorithm and only rely on the current Z3 implementation as a fallback.

@Scott-Guest Scott-Guest self-assigned this Aug 24, 2023
@Scott-Guest
Copy link
Contributor Author

Scott-Guest commented Aug 24, 2023

Ideally, we want to support:

  • subtyping
  • parametric sorts
  • parametric rules

@Scott-Guest
Copy link
Contributor Author

Scott-Guest commented Aug 24, 2023

The SimpleSub algorithm provides some nice inspiration:

In summary, this algorithm is quite similar to the usual HM-style Algorithm W, but tracking subtyping constraints rather than performing unification. The results are then encoded using union and intersection types, roughly,

  • encoding "any type which is a subtype of X" as α ∧ X
  • encoding "any type which is a supertype of X" as α ∨ X

followed by a few simplifications passes to produce compact types.

We can use this to infer a general solution for the sorts, then can find all the maximal models (in the sense of the current type inference algorithm) by enumerating the monomorphizations and interpreting ∧/∨ as meet/join in the sort poset.

@Scott-Guest
Copy link
Contributor Author

Scott-Guest commented Sep 11, 2023

Translation from Sort Inference to SimpleSub

Disregarding ambiguities, we can apply the SimpleSub algorithm to our problem as follows:

  1. Create an initial typing context Γ by letting P : ∀α1. ... ∀αN. S1 -> ... -> SK -> S for each Production with parameters α1, ..., αN, sort S, and non-terminals of sorts S1, ..., SK. Here, P is just some fresh identifier we associate to the Production.
  2. For T a given Term, construct a corresponding SimpleSub-like term λx1. ... λxN. T, where x1, ..., xN are the variables occurring in the rule by treatingProductionReferences as function applications of the corresponding identifier we mention above.
  3. Infer the type of this SimpleSub-like term in the context Γ. The resulting type will be of the form ∀α1. ... ∀αK. S1 -> ... -> SN -> S. In some sense, this tells us that x1 : S1, ..., xN : SN and the rule itself has sort S.
  • More explicitly, if K = 0 and the type is fully monomorphic, this gives use the exact sorts of the variables. Otherwise, this give us the general form of all solutions, and we can find the all maximal solutions by enumerating the monomorphizations.

This translation is just conceptual - the actual implementation can work with the existing Term data structures as-is.

To extend this to handle ambiguities, we begin by performing type inference on each unambiguous section of the parse-forest, but parametric over any contained ambiguities. Specifically,

  1. Split the provided Term into numerous sub-trees by cutting it at each Ambiguity. That is, each sub-tree will have a root which is either the top-most Term or some sub-Term whose parent was originally an Ambiguity node, and the branches of this sub-tree only extend until the first encountered child Ambiguity node.
  2. For each of these sub-trees, construct a corresponding SimpleSub-term as described before, but replace each leaf Ambiguity node with a term amb x1 ... xN where amb is a fresh-variable and x1, ..., xN are again all the variables we are attempting to infer. Abstract this SimpleSub-term over amb, i.e., prepend it with λamb. ...
  3. For each sub-tree, the inferred type will now be of the form ∀α1. ... ∀αK. (T1 -> ... -> TN -> T) -> .... -> S1 -> ... -> SN -> S, with an inferred argument type T1 -> .... TN -> T for each ambiguity. Here, T will tell us the expected type of the term in the ambiguity

Once we have this local information for each unambiguous section, it in effect shows us all the typing-constraints that a particular section of the parse-forest places on the variables. We can then use this to prune the parse-forest by subsituting in the inferred types of ambiguities, forming a set of all possibilities at each ambiguity point.

@Scott-Guest
Copy link
Contributor Author

Parametric Sorts with Non-Structural Subtyping

Consider types as finite trees built up from a finite number of type constructors. There are two types of subtyping orders you can have over such types:

  • structural subtyping, where subtyping relationships can only occur between types with the same constructor and arity.
  • non-structural subtyping, where subtyping relationships can occur even when the types may not have the same constructor or arity.

On first glance, I believed this algorithm would support parametric sorts quite easily. However, it in fact only does so in a limited context, namely

  • structural subtyping where each parameter is explicitly specified as either covariant, contravariant, or invariant.
  • a limited form of non-structural subtyping where constructors much match but arity may not (e.g. extensible records or variants)

K, on the other hand, supports completely arbitrary subsorting regardless of constructor or arity. As far as I'm aware, there is no standard algorithm which can actually handle this in full generality.

As a fallback then, if we eventually want to fully support parametric sorts, we'll likely still need to rely on a constraint solver when we run into these sort of constraints. I'll try to design things though so that we can still handle most cases directly with the inference algorithm and only pass off some small required formulas to Z3.

@radumereuta
Copy link
Contributor

When this is concluded we should close this issue: #3551

@radumereuta
Copy link
Contributor

Performance issues with the C-semantics
run with --profile-rule-parsing <file>

@Scott-Guest
Copy link
Contributor Author

#3673's description includes a more flushed out explanation of the proposal above

@Baltoli
Copy link
Contributor

Baltoli commented Oct 26, 2023

It seems like the "fall back to Z3" strategy we discussed previously will be viable; @Scott-Guest will look into this once there is an end-to-end implementation for the partial deterministic inferencer.

@radumereuta mentions that we should keep consistency in mind as well as correctness - @Scott-Guest has a script to make sure that the inference results are the same as in Z3; if results are different it needs to be checked why that's the case. Z3 might not be correct!

rv-jenkins pushed a commit that referenced this issue Nov 15, 2023
…ms (#3814)

Previously, when parsing programs, the subsort declarations `syntax
KItem ::= Sort` were only added to the parsing module. However, sort
inference is run in the disambiguation module, so they should be
included there as well.

This is necessary for #3601.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants