Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add string literals as primitives to improve string compilation speed #112

Open
ollef opened this issue Jun 14, 2018 · 4 comments
Open

Add string literals as primitives to improve string compilation speed #112

ollef opened this issue Jun 14, 2018 · 4 comments

Comments

@ollef
Copy link
Owner

ollef commented Jun 14, 2018

On my machine

x = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

takes almost 8 seconds to compile. This obviously won't do!

@ollef
Copy link
Owner Author

ollef commented Jun 20, 2018

The problem here is that the vectors that strings desugar to contain implicit type arguments of size on the order of the length of the string for each element in the vector, meaning that the term's size is quadratic in the length of the string.

The commit 2be98a0 improves the example down to about half a second, because it allows the big type arguments to be constant folded after type-checking, which in turn means that LLVM is no longer fed thousands of lines of pointless type calculations that it's slow (but good) at optimising.

We still have quadratic terms during typechecking though, so maybe we should just add strings as a primitive literal instead of desugaring them to arrays/vectors.

@ollef ollef changed the title Long(ish) strings are too slow to compile Add string literals as primitives to improve string compilation speed Jun 20, 2018
@ollef
Copy link
Owner Author

ollef commented Jun 29, 2018

We might be able to use sharing to fix the quadratic size of the terms by maximally sharing the length parameters, which would take them down to linear size again.

@AnthonyJacob
Copy link

You might be interested in Quantitative type theory.

McBride [ 25 ] has recently proposed a resolution to this conflict by combining the work on erasability and quantitative types. His insight is to use the 0 of the semiring to represent information that is erased at runtime, but is still available for use in types (i.e., extensionally). ... In this paper, we fix and extend McBride’s system, and present semantic interpretations that fully exploit the usage information.

@ollef
Copy link
Owner Author

ollef commented Dec 1, 2018

Thanks, that's indeed generally relevant research to Sixten.

To clarify, the runtime representation isn't the problem in this issue, and I don't think erasure would help. Strings are compiled to flat arrays of characters, as you'd want.

IIRC what's currently taking the most time is that we compute the type of every subterm during desugaring. This is done so that the code generator has easy access to the runtime representation (size in memory) of any subterm. This is quadratic for size-indexed structures like vectors, even though it's simplified away to constant integers in the end. These types are not generally erasable since Sixten uses them at runtime, though I think we could be more clever about trying to not evaluate them as often as we sometimes do. Perhaps we could e.g. find a scheme that only computes the size of the whole vector and not every subterm as we do now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants