Add string literals as primitives to improve string compilation speed #112

ollef · 2018-06-14T18:26:53Z

On my machine

x = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

takes almost 8 seconds to compile. This obviously won't do!

ollef · 2018-06-20T18:44:37Z

The problem here is that strings desugar to vectors in which every element carries an implicit type argument whose size is on the order of the length of the string. With one such argument per element, the term's size is quadratic in the length of the string.
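To make the quadratic growth concrete, here is a toy size model in Python. It is only a sketch: the names `nat_term_size` and `desugared_size` are hypothetical, and it assumes each length argument is written out as a chain of successor nodes, which is not necessarily how Sixten represents them.

```python
def nat_term_size(k):
    """Nodes in the numeral k written as succ(succ(...zero)): k succ nodes plus zero."""
    return k + 1


def desugared_size(n):
    """Total nodes in a toy desugaring of an n-character string.

    Assumption: each cons cell contributes one node for itself, one for its
    character, and a full copy of the numeral giving the length of its tail.
    """
    total = 1  # the empty vector at the end
    for tail_length in range(n):
        total += 2 + nat_term_size(tail_length)
    return total


if __name__ == "__main__":
    # Doubling the string length roughly quadruples the term size.
    for n in (50, 100, 200):
        print(n, desugared_size(n))
```

In this model the total is `1 + 3n + n(n-1)/2` nodes, so the copies of the length arguments dominate as soon as the string gets longer than a handful of characters.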

Commit 2be98a0 brings the example down to about half a second. It allows the big type arguments to be constant folded after type-checking, which in turn means that LLVM is no longer fed thousands of lines of pointless type calculations that it's good, but slow, at optimising.
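For readers unfamiliar with the term, constant folding means evaluating the parts of an expression that only involve literals ahead of time. The Python sketch below shows the general technique on a tiny expression language. It is not the code from commit 2be98a0; the tuple encoding and the name `fold` are assumptions made for illustration.

```python
def fold(expr):
    """Constant-fold an expression tree.

    Expressions are ints (literals), strs (free variables), or tuples
    (op, lhs, rhs) where op is "add" or "mul".
    """
    if isinstance(expr, (int, str)):
        return expr  # literal or free variable: nothing to fold
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int):
        # Both operands are known, so compute the result now.
        return lhs + rhs if op == "add" else lhs * rhs
    return (op, lhs, rhs)  # keep the node, with folded children


if __name__ == "__main__":
    # A hypothetical size calculation for a 3-element vector of 1-byte elements.
    size_expr = ("mul", ("add", 1, ("add", 1, ("add", 1, 0))), 1)
    print(fold(size_expr))
    # A subexpression with a free variable is only partially folded.
    print(fold(("add", "n", ("mul", 2, 4))))
```

When every leaf is a literal, as in the size calculations for a string literal, the whole tree collapses to a single integer, and the backend never sees the arithmetic.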

We still have quadratic terms during typechecking though, so maybe we should just add strings as a primitive literal instead of desugaring them to arrays/vectors.

ollef · 2018-06-29T15:35:25Z

We might be able to use sharing to fix the quadratic size of the terms by maximally sharing the length parameters, which would take them down to linear size again.
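One way to picture maximal sharing is hash-consing: structurally equal subterms are stored once and referred to by id. The Python sketch below is a toy model under the same assumed successor-chain encoding of lengths; `Interner` and `build_vector` are hypothetical names, not part of the Sixten compiler.

```python
class Interner:
    """Stores each structurally distinct node once and hands out integer ids."""

    def __init__(self):
        self.table = {}

    def node(self, tag, *children):
        key = (tag,) + children
        if key not in self.table:
            self.table[key] = len(self.table)
        return self.table[key]


def build_vector(interner, chars):
    """Build a toy vector term, back to front, sharing the length numerals."""
    length = interner.node("zero")
    vec = interner.node("nil")
    for ch in reversed(chars):
        elem = interner.node("char", ch)
        # The cons cell refers to the numeral for its tail's length by id,
        # so the numeral is not copied.
        vec = interner.node("cons", length, elem, vec)
        # The next numeral reuses the previous one as its only child.
        length = interner.node("succ", length)
    return vec


if __name__ == "__main__":
    for n in (50, 100, 200):
        interner = Interner()
        build_vector(interner, "a" * n)
        print(n, len(interner.table))
```

Because each numeral is one new node on top of the previous numeral, the number of distinct nodes grows linearly with the string length instead of quadratically.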

AnthonyJacob · 2018-12-01T08:49:30Z

You might be interested in Quantitative type theory.

McBride [25] has recently proposed a resolution to this conflict by combining the work on erasability and quantitative types. His insight is to use the 0 of the semiring to represent information that is erased at runtime, but is still available for use in types (i.e., extensionally). ... In this paper, we fix and extend McBride’s system, and present semantic interpretations that fully exploit the usage information.

ollef · 2018-12-01T12:08:59Z

Thanks, that's indeed generally relevant research to Sixten.

To clarify, the runtime representation isn't the problem in this issue, and I don't think erasure would help. Strings are compiled to flat arrays of characters, as you'd want.

IIRC what's currently taking the most time is that we compute the type of every subterm during desugaring. This is done so that the code generator has easy access to the runtime representation (size in memory) of any subterm. This is quadratic for size-indexed structures like vectors, even though it's simplified away to constant integers in the end. These types are not generally erasable since Sixten uses them at runtime, though I think we could be cleverer about not evaluating them as often as we currently do. Perhaps we could e.g. find a scheme that only computes the size of the whole vector and not that of every subterm as we do now.

ollef added Type: Bug Priority: High labels Jun 14, 2018

ollef added Type: Feature and removed Priority: High Type: Bug labels Jun 20, 2018

ollef changed the title ~~Long(ish) strings are too slow to compile~~ Add string literals as primitives to improve string compilation speed Jun 20, 2018
