IR, bytecode->IR compiler, and optimizations #15

Merged
merged 68 commits into from
Jun 28, 2024
68 commits
e86615f
add `ir.type` and start CFG.
Jakobeha Feb 13, 2024
35e8694
add `ir.cfg` and PIR instructions
Jakobeha Mar 26, 2024
7a0c027
update to Java 22, update dependencies
Jakobeha Apr 16, 2024
a58c0dd
add `parseprint`
Jakobeha Apr 16, 2024
af99d63
add PIR CFG parsing and printing w/tests
Jakobeha Apr 18, 2024
af89a0d
fix various IntelliJ/spotless warnings
Jakobeha May 9, 2024
5e6981b
bugfixes, mainly CFGEdit and parsing/printing
Jakobeha May 9, 2024
d16b4d0
merge RValue and Env + fix/disable remaining testPirIsParseableAndPri…
Jakobeha May 10, 2024
c0a2105
add compiler package
Jakobeha May 13, 2024
5884ca9
change `bc` package-info
Jakobeha May 13, 2024
ef06197
fixed `bc-compiler` rebase issues
Jakobeha May 23, 2024
865b49e
refactor + grow CFG API, fix verification and phis
Jakobeha May 13, 2024
b03161c
re-enable `CompilerTest`s which were previously failing
Jakobeha May 25, 2024
7590d25
add CFG tests to git
Jakobeha May 25, 2024
308e86a
skip some CFG tests when `FAST_TESTS` is set
Jakobeha May 25, 2024
e390a3a
fix commit hook staging entire partial commit
Jakobeha May 25, 2024
4d19dd4
update Java and R versions on GitHub actions
Jakobeha May 25, 2024
c6b4d8b
fix rebase
Jakobeha May 27, 2024
6589ab7
starting bc2ir
Jakobeha May 27, 2024
a02d0da
improve and fix tests, particularly wrt. GNU-R version
Jakobeha Jun 6, 2024
9dce9ca
finish draft closure compiler
Jakobeha Jun 7, 2024
55cf8a9
improve errors when starting GNU-R
Jakobeha Jun 13, 2024
287440d
parse and print closures and SEXPs
Jakobeha Jun 14, 2024
fdb2298
add raw SEXPs
Jakobeha Jun 14, 2024
d3ca1a5
fix IntelliJ warnings
Jakobeha Jun 14, 2024
0935e8b
bugfixes in parsing and printing and BC->IR
Jakobeha Jun 15, 2024
5301615
refactor `parseprint`
Jakobeha Jun 19, 2024
18dc586
test re-parsing and re-printing closures
Jakobeha Jun 19, 2024
bbc7e0d
fix IR, BC->IR, parsing and printing, and more
Jakobeha Jun 19, 2024
77cee88
refactor BB and node IDs *again*
Jakobeha Jun 22, 2024
e5f78c0
fixed printing NA vector elements
Jakobeha Jun 22, 2024
a8b2a81
fixed parsing and printing null record elements
Jakobeha Jun 22, 2024
01f3fd1
fixed compiling instructions after returns
Jakobeha Jun 22, 2024
8aa0f2a
don't report unsupported bytecode tests as failures
Jakobeha Jun 22, 2024
8165a6b
fix for-loop, fix/improve cleanup, and more
Jakobeha Jun 22, 2024
77c9b58
various fixes
Jakobeha Jun 24, 2024
cf8671f
compile for loop fixes
Jakobeha Jun 24, 2024
1d36318
fix parsing escaped unicode
Jakobeha Jun 24, 2024
7c4fb55
fix printing escaped unicode in names (R symbols)
Jakobeha Jun 24, 2024
8ab6311
fix `replaceInArgs`
Jakobeha Jun 24, 2024
6bec490
don't print base env parent, because it's always empty
Jakobeha Jun 24, 2024
84549c4
fix small typo
Jakobeha Jun 24, 2024
86fa5a3
fix parsing and printing `null`
Jakobeha Jun 24, 2024
94a1db5
fix parsing environments referenced in ancestors
Jakobeha Jun 24, 2024
0a2df2a
implement `switch`
Jakobeha Jun 24, 2024
92c0bb6
fix more bugs
Jakobeha Jun 24, 2024
34bba8c
fail verification on unset phi inputs
Jakobeha Jun 25, 2024
98cbe2d
fix branch->goto cleanup unsetting phi inputs
Jakobeha Jun 25, 2024
e0e6eb3
fix `CFGVerify` incorrect use-before-def on auxiliary node
Jakobeha Jun 25, 2024
508f239
add missing IR, fix builtin calls, `TryDispatchBuiltin`
Jakobeha Jun 25, 2024
1c52c70
add `ir2c` package-info
Jakobeha Jun 25, 2024
b61a270
fix `CFGPirSerialize` for changed `LdVar`
Jakobeha Jun 25, 2024
4cc15e6
dedup bc-compiler and closure-IR-compiler tests
Jakobeha Jun 25, 2024
77e5c00
resolve all `pmd` violations
Jakobeha Jun 25, 2024
ce4d572
update pmd
Jakobeha Jun 25, 2024
6c8c808
suppress unnecessary CPD violation
Jakobeha Jun 25, 2024
90de942
fix verify + add some methods
Jakobeha Jun 26, 2024
b62d314
skeleton + initial optimizations
Jakobeha Jun 27, 2024
714cb88
fix javadoc and add javadoc testing to `mvn verify`
Jakobeha Jun 27, 2024
f00eb06
filter stdlib closure tests when `FAST_TESTS` is set
Jakobeha Jun 27, 2024
410f325
draft LICM + `BB#move` + bugfixes + improvements
Jakobeha Jun 28, 2024
aef6668
extend BB move functionality + improve CFG edits
Jakobeha Jun 28, 2024
6445a08
fix `CFG` not invalidating analyses caches
Jakobeha Jun 28, 2024
f8f4335
slightly update `docs` documentation
Jakobeha Jun 28, 2024
c39e5c7
comment that `isCoherent_Lattice` and `isCoherent_RType` occasionally…
Jakobeha Jun 28, 2024
f300707
simplify `BaseGuard`
Jakobeha Jun 28, 2024
e5a0de6
add names to compiled loads
Jakobeha Jun 28, 2024
5e2fb28
fix parsing name and quoted bugs
Jakobeha Jun 28, 2024
1 change: 0 additions & 1 deletion .githooks/pre-commit.sh
@@ -13,7 +13,6 @@ if [ -n "$no_partial_stages" ]; then
echo "Reformatted a partially-staged file. Re-interactively-stage and commit again."
exit 1
}
git add -u
elif [ -n "$everything_staged" ]; then
# Format only staged changes. We must re-add them because the formats aren't committed.
mvn spotless:apply "$format_only_staged"
16 changes: 8 additions & 8 deletions .github/workflows/maven.yml
@@ -20,10 +20,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up JDK 21
- name: Set up JDK 22
uses: actions/setup-java@v3
with:
java-version: "21"
java-version: "22"
distribution: "temurin"
cache: maven
- name: Set up GNU-R
@@ -47,10 +47,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up JDK 21
- name: Set up JDK 22
uses: actions/setup-java@v3
with:
java-version: "21"
java-version: "22"
distribution: "temurin"
cache: maven
- name: Set up GNU-R
@@ -65,10 +65,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up JDK 21
- name: Set up JDK 22
uses: actions/setup-java@v3
with:
java-version: "21"
java-version: "22"
distribution: "temurin"
cache: maven
- name: Set up GNU-R
@@ -83,10 +83,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up JDK 21
- name: Set up JDK 22
uses: actions/setup-java@v3
with:
java-version: "21"
java-version: "22"
distribution: "temurin"
cache: maven
- name: Set up GNU-R
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
**/src/test/snapshots/**/*.fail*
!src/test/resources/**/*.log

### Build ###
target/
2 changes: 1 addition & 1 deletion .idea/codeStyles/codeStyleConfig.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

191 changes: 191 additions & 0 deletions .idea/inspectionProfiles/Extra_Strict.xml

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions .idea/inspectionProfiles/Project_Default.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

26 changes: 26 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 9 additions & 2 deletions .idea/misc.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Binary file added .jqwik-database
Binary file not shown.
9 changes: 6 additions & 3 deletions .pmd-rules.xml
@@ -20,7 +20,7 @@ under the License.
<ruleset name="Default Maven PMD Plugin Ruleset"
xmlns="http://pmd.sourceforge.net/ruleset/2.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://pmd.sourceforge.net/ruleset/2.0.0 http://pmd.sourceforge.net/ruleset_2_0_0.xsd">
xsi:schemaLocation="http://pmd.sourceforge.net/ruleset/2.0.0 https://pmd.sourceforge.io/ruleset_2_0_0.xsd">

<description>
This is the default pmd ruleset with the following changes:
@@ -40,12 +40,15 @@ under the License.
<rule ref="category/java/bestpractices.xml/UnusedFormalParameter" />
<rule ref="category/java/bestpractices.xml/UnusedLocalVariable" />
<rule ref="category/java/bestpractices.xml/UnusedPrivateField" />
<rule ref="category/java/bestpractices.xml/UnusedPrivateMethod" />
<rule ref="category/java/bestpractices.xml/UnusedPrivateMethod">
<properties>
<property name="ignoredAnnotations" value="org.prlprg.parseprint.ParseMethod,org.prlprg.parseprint.PrintMethod" />
</properties>
</rule>

<rule ref="category/java/codestyle.xml/EmptyControlStatement" />
<rule ref="category/java/codestyle.xml/ExtendsObject" />
<rule ref="category/java/codestyle.xml/ForLoopShouldBeWhileLoop" />
<rule ref="category/java/codestyle.xml/TooManyStaticImports" />
<rule ref="category/java/codestyle.xml/UnnecessaryFullyQualifiedName" />
<rule ref="category/java/codestyle.xml/UnnecessaryModifier" />
<rule ref="category/java/codestyle.xml/UnnecessaryReturn" />
3 changes: 3 additions & 0 deletions doc/README.md
@@ -0,0 +1,3 @@
# Misc documentation

See the Javadoc for better documentation. It can be viewed in IntelliJ or generated with `mvn javadoc:javadoc` (in `site/apidocs/index.html`). In the future, this documentation will probably be moved into Javadoc.
39 changes: 39 additions & 0 deletions doc/ir.md
@@ -0,0 +1,39 @@
# Intermediate Representation

Some properties:

- SSA form.
- Instructions can return multiple values.

## Mutability

`CFG`, `BB`, and `InstrOrPhi`s are mutable by the user. `Instr`s store their content in `InstrData`, an immutable record, for easy construction and pattern matching. The instruction itself must be mutable so that its data can be replaced without affecting other instructions' data; otherwise every instruction that refers to it would have to be rebuilt, and so on recursively, which would make constructing a cyclic dependency impossible, besides having ridiculous time complexity.
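
As a rough illustration of this split (a minimal sketch, not the project's code: `InstrSketch`, `InstrDataSketch`, `ConstSketch`, and `AddSketch` are hypothetical stand-ins for `Instr` and `InstrData`), the instruction can be seen as a mutable cell with a stable identity, and its payload as an immutable record. Replacing one instruction's payload leaves the records that point at it untouched:

```java
// Hypothetical stand-ins for the real classes; names and fields are assumptions.
sealed interface InstrDataSketch permits ConstSketch, AddSketch {}

// Immutable payloads: records make construction and pattern matching easy.
record ConstSketch(int value) implements InstrDataSketch {}

record AddSketch(InstrSketch lhs, InstrSketch rhs) implements InstrDataSketch {}

// The instruction is a mutable cell with a stable identity.
final class InstrSketch {
  private InstrDataSketch data;

  InstrSketch(InstrDataSketch data) {
    this.data = data;
  }

  InstrDataSketch data() {
    return data;
  }

  // Swapping the payload does not touch any record that refers to this instruction.
  void setData(InstrDataSketch newData) {
    this.data = newData;
  }
}

final class MutabilityDemo {
  // Evaluates an instruction by pattern matching on its immutable data.
  static int eval(InstrSketch instr) {
    return switch (instr.data()) {
      case ConstSketch c -> c.value();
      case AddSketch a -> eval(a.lhs()) + eval(a.rhs());
    };
  }

  public static void main(String[] args) {
    InstrSketch one = new InstrSketch(new ConstSketch(1));
    InstrSketch sum = new InstrSketch(new AddSketch(one, one));
    System.out.println(eval(sum)); // prints 2
    one.setData(new ConstSketch(5)); // the AddSketch record inside `sum` is not rebuilt
    System.out.println(eval(sum)); // prints 10
  }
}
```

Because `AddSketch` holds references to instructions rather than to their data, a cycle (for example a phi that refers to a later instruction) could be closed by creating the instruction first and setting its data afterwards.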

## Package structure

`BB` has to access internal `CFG` functions to update exits, track/untrack nodes, and split/merge without recording the constituent operations (because we record the split or merge as a whole, we don't want to also record adding or removing the BB which is part of it). `BB` also has to access an internal function in `InstrData` to create instructions, so unfortunately we also must put `Instr` in the same package. From there, we put `Node` in the same package too: instructions comprise the majority of nodes, and other nodes are simply auxiliary, so their constructor must be internal and visible to the instruction.

A lot of other classes are in the `cfg` package.

`CFG` and `BB` contain all compound operations, because something like `bb.inline(...)` is a lot nicer to use than `BBUtils.inline(BB, ...)` or `BBInliner.inline(BB, ...)`. However, implementing all of these methods inline would create a very large file, which is bad for the IDE, so they are instead implemented in package-private interfaces (e.g. `CFGCleanup`, `BBInline`) that `CFG` and `BB` implement. This also ensures that the more complex operations don't use internal `BB` methods or other methods that they shouldn't need, enforcing looser coupling.
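
The pattern can be sketched as follows. This is only an assumed shape: `BBInlineSketch`, `BBSketch`, and the string-based statements are hypothetical, and the real `BBInline` signature is not shown in this document. The compound operation is a `default` method that can only call the primitives the interface declares:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the compound operation lives in its own package-private interface.
interface BBInlineSketch {
  // The only primitives the compound operation is allowed to see.
  List<String> stmts();

  void insertAt(int index, String stmt);

  // Compound operation, written only against the primitives above.
  default void inline(int index, List<String> body) {
    for (int i = 0; i < body.size(); i++) {
      insertAt(index + i, body.get(i));
    }
  }
}

// The main class stays small: it implements the primitives and inherits `inline`.
final class BBSketch implements BBInlineSketch {
  private final List<String> stmts = new ArrayList<>();

  @Override
  public List<String> stmts() {
    return List.copyOf(stmts);
  }

  @Override
  public void insertAt(int index, String stmt) {
    stmts.add(index, stmt);
  }
}
```

Callers still write `bb.inline(...)`, while the body of `inline` cannot reach the private `stmts` field of `BBSketch`.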

## Terminology

**Replace vs subst**: if an instruction is "replaced", its occurrences aren't replaced (other instructions that refer to it aren't updated to refer to the instruction that replaced it). If an instruction is "substituted", they are.
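
A small sketch of the distinction, using hypothetical `NodeSketch` and `BlockSketch` classes rather than the real `BB` API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node: a name plus the nodes it uses as arguments.
final class NodeSketch {
  final String name;
  final List<NodeSketch> args = new ArrayList<>();

  NodeSketch(String name, NodeSketch... args) {
    this.name = name;
    this.args.addAll(List.of(args));
  }
}

final class BlockSketch {
  final List<NodeSketch> instrs = new ArrayList<>();

  // "Replace": swap the instruction in place; users still point at `old`.
  // Assumes `old` is in this block.
  void replace(NodeSketch old, NodeSketch replacement) {
    instrs.set(instrs.indexOf(old), replacement);
  }

  // "Substitute": swap the instruction and retarget every use of `old`.
  void subst(NodeSketch old, NodeSketch replacement) {
    replace(old, replacement);
    for (NodeSketch instr : instrs) {
      instr.args.replaceAll(arg -> arg == old ? replacement : arg);
    }
  }
}
```

After `replace(x, y)`, an instruction that used `x` still has `x` in its arguments; after `subst(x, y)`, it has `y`.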

## Time complexity

Various ways to improve efficiency:

- When possible, substitution is implemented by mutating the instruction (changing its internal data), so we don't have to find and mutate the instructions that reference it, which is expensive. However, this isn't possible when substituting an instruction with an existing instruction, a new instruction which has a different number of return values, or a new instruction whose superclass is a different one of `Phi | Statement | Jump`. Since these cases are expensive, we delay and batch them in `BatchSubst`. The time complexity of batching many of these substitutions is `O(#instructions)`, as opposed to `O(#instructions * #toSubst)` if we substituted immediately.

- We also batch inserting or removing multiple instructions when possible, unless we're inserting or removing from the end of a basic block. Basic blocks store instructions in an array, so inserting or removing instructions from a basic block with a large list of them may be expensive.
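
To make the complexity claim for batched substitution concrete, here is a minimal sketch. `BatchSubstSketch`, `UseSketch`, `stage`, and `commit` are hypothetical names and do not reflect the real `BatchSubst` API. Substitutions are queued in an identity map and applied in a single pass, so each argument costs one map lookup regardless of how many substitutions are pending:

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical instruction: a label plus the instructions it uses.
final class UseSketch {
  final String label;
  final List<UseSketch> args = new ArrayList<>();

  UseSketch(String label, UseSketch... args) {
    this.label = label;
    this.args.addAll(List.of(args));
  }
}

// Queues substitutions, then retargets all uses in one pass over the instructions.
final class BatchSubstSketch {
  private final Map<UseSketch, UseSketch> pending = new IdentityHashMap<>();

  void stage(UseSketch old, UseSketch replacement) {
    pending.put(old, replacement);
  }

  // One pass: each argument is looked up once, however many substitutions are staged.
  // Assumes no staged replacement is itself staged for substitution (no chains).
  void commit(List<UseSketch> instrs) {
    for (UseSketch instr : instrs) {
      instr.args.replaceAll(arg -> pending.getOrDefault(arg, arg));
    }
    pending.clear();
  }
}
```

Substituting immediately would instead scan all instructions once per substitution, which is where the `O(#instructions * #toSubst)` bound comes from.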

General collection data structures:

- The `CFG` class contains a `Map` of basic block ids to the blocks it contains, a `Map` of node ids to nodes the blocks contain, and a `Set` of basic block exits.
- The `BB` class contains an auto-sorted `SequencedSet` of predecessors, a `SequencedSet` of phis, and an `ArrayList` of statements. Successors are stored inside of the jump, which is `@Nullable Jump`.
- The `Phi` class (and its subclasses) contains an auto-sorted `SequencedSet` of inputs.
- Instruction data is stored in Java records. Node arguments and jump targets are memoized every time the instruction's data changes.

`BB` and `Node` store a pointer to their `CFG`, but nodes don't store a pointer to their `BB`, except jumps.
36 changes: 36 additions & 0 deletions doc/rtype.md
@@ -0,0 +1,36 @@
# `RType` design and debug representation

**TODO:** This is not ideal and maybe we should change it once we actually use types. It seems that the types we need to represent in instructions aren't represented well in this implementation.

`RType` represents the type of SEXPs in the IR. An `RType` consists of an `RValueType`, `RPromiseType`, and `missing: Troolean`. `RValueType` is the type of the value ignoring whether or not it's in a (potentially-lazy) promise or missing, `RPromiseType` determines whether said value may be or is inside a promise and promise details (laziness, known exact promise), and `missing: Troolean` determines whether the value may be or is missing.

Nothing (`⊥`, subtype of everything else) consists of a null `RValueType`, `RPromiseType`, and `missing: Troolean`. In every other `RType`, `RPromiseType` and `missing: Troolean` are both non-null, and the type of the missing value is the only other type where `RValueType` is null. Any (`⊤`, supertype of everything else) consists of an "any" `RGenericValueType`, which encompasses any value and is a supertype of every other `RValueType`, a "maybe lazy, maybe promise-wrapped" `RPromiseType` that is also a supertype of every other `RPromiseType`, and `missing = MAYBE`.
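
The following sketch illustrates how the top and bottom types could behave. It is deliberately simplified and assumption-heavy: `TrooleanSketch` and `RTypeSketch` are hypothetical, value types are plain strings, the `RPromiseType` component is left out, and the real `Troolean` API may look different.

```java
import java.util.Objects;

// Hypothetical three-valued logic; the real `Troolean` API may differ.
enum TrooleanSketch {
  YES,
  NO,
  MAYBE;

  // Least upper bound: agreeing answers stay, disagreeing ones widen to MAYBE.
  TrooleanSketch union(TrooleanSketch other) {
    return this == other ? this : MAYBE;
  }

  // True iff `this` is at least as precise as `other`.
  boolean isSubsetOf(TrooleanSketch other) {
    return this == other || other == MAYBE;
  }
}

// Simplified type: a nullable value type name plus the `missing` component.
// All-null components model Nothing (⊥), as described above.
record RTypeSketch(String value, TrooleanSketch missing) {
  static final RTypeSketch NOTHING = new RTypeSketch(null, null);
  static final RTypeSketch ANY = new RTypeSketch("any", TrooleanSketch.MAYBE);

  boolean isSubtypeOf(RTypeSketch other) {
    if (this.equals(NOTHING)) return true; // ⊥ is below everything
    if (other.equals(NOTHING)) return false; // nothing else is below ⊥
    boolean valueOk = "any".equals(other.value()) || Objects.equals(other.value(), value);
    return valueOk && missing.isSubsetOf(other.missing());
  }
}
```

For example, `new RTypeSketch("int", TrooleanSketch.NO)` is a subtype of `RTypeSketch.ANY` and a supertype of `RTypeSketch.NOTHING`.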

## `RGenericValueType` and `BaseRType`

Each `RValueType` subclass represents a specialized type, such as a primitive function or vector, *except* `RGenericValueType`, which represents an "any-typed" value as well as other types such as S4 objects. `BaseRType` distinguishes the former "any-typed" `RGenericValueType`s (which can be supertypes of specific `RValueType`s) from the latter "other-typed" ones (which cannot).

## Common `RType` properties

These are common properties in every `RValueType` which propagate to the `RType`.

- `base: BaseRType`: This is a simplified version of `RType` which only encodes some properties: the SEXPTYPE (or a subset of SEXPTYPEs for functions and non-promises); if the type is a vector, its element type (if static/known); and if the type is a promise, its inner type and whether it's maybe lazy.
- `exactValue: @Nullable SEXP`: Non-null iff this type has exactly one static, known instance, in which case it is that instance. For example, the type of an `LdConst` instruction has an exact value. Of course, any `RType` that is a union of two or more `RValueType`s can't have an exact value.
- `attributes: AttributesType`: Refines the SEXP's attributes (associated key-value pairs) in the same way `RType` refines the SEXP itself. Some types of SEXPs don't have attributes, in which case this is always `AttributesTypes.NONE`. Other, object SEXPs are guaranteed to have `class` set to the object type.
- `referenceCount: MaybeNat`: If all instances have a known reference count, this may be set. TODO on its usefulness because there's no case where this is known so far; `MaybeNat` may change to something more specific (e.g. to determine the reference count is "less than X" without knowing it exactly, currently `MaybeNat` can't represent that).

## Specific `RValueType`s

These can be accessed via methods like `function` and `primVec` in `RType`, which return the `RValueType` cast to the subclass iff it's an instance *and* the `RPromiseType` and `missing: Troolean` are `VALUE` and `NO` respectively (otherwise the projection methods return `null`).
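
A minimal sketch of this projection rule, with hypothetical types throughout (`ProjectedTypeSketch` and friends are not the real classes, and only the `VALUE` and `NO` constants are taken from the text above; the other enum constants are assumed):

```java
// Hypothetical, heavily simplified shapes; only the projection rule is the point.
sealed interface ValueTypeSketch permits FunctionTypeSketch, PrimVecTypeSketch {}

record FunctionTypeSketch(int arity) implements ValueTypeSketch {}

record PrimVecTypeSketch(String elementType) implements ValueTypeSketch {}

enum PromiseSketch { VALUE, MAYBE_PROMISE, PROMISE }

enum MissingSketch { YES, NO, MAYBE }

record ProjectedTypeSketch(ValueTypeSketch value, PromiseSketch promise, MissingSketch missing) {
  // Shared guard: a projection only succeeds for a definite, non-missing value
  // whose value type is an instance of the requested subclass.
  private <T extends ValueTypeSketch> T project(Class<T> cls) {
    boolean definiteValue = promise == PromiseSketch.VALUE && missing == MissingSketch.NO;
    return definiteValue && cls.isInstance(value) ? cls.cast(value) : null;
  }

  FunctionTypeSketch function() {
    return project(FunctionTypeSketch.class);
  }

  PrimVecTypeSketch primVec() {
    return project(PrimVecTypeSketch.class);
  }
}
```

A caller therefore has to null-check the result before using it: a function type that might still be wrapped in a promise, or that might be missing, projects to `null`.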

### Functions

`RFunctionType` is an `RValueType` which contains a set of overloads for the function. Each overload has known parameter types, effects, and a return type. Each parameter has a name, optionality (a parameter is optional iff it has a default value), and required type.

To permit "generic" functions (parametric polymorphism), the effects and return type are functions of the supplied argument types, e.g. a function can be defined that returns the type of its first argument. However, the parameter types can't be functions of earlier supplied arguments, so a function that takes two arguments of any type, as long as they are the same type, can't be represented.
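
One way to picture this, as a sketch under loud assumptions: types are plain strings, arguments are matched positionally, the first matching overload wins, and effects are omitted. None of `ParamSketch`, `OverloadSketch`, or `FunctionSketch` is the real `RFunctionType` API.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical shapes; types are plain strings to keep the sketch small.
record ParamSketch(String name, boolean optional, String requiredType) {}

// The return type is computed from the supplied argument types, which is what
// allows "generic" overloads. The parameter types themselves are fixed.
record OverloadSketch(List<ParamSketch> params, Function<List<String>, String> returnType) {
  boolean accepts(List<String> argTypes) {
    if (argTypes.size() > params.size()) return false;
    for (int i = 0; i < params.size(); i++) {
      ParamSketch p = params.get(i);
      if (i >= argTypes.size()) {
        if (!p.optional()) return false; // a required parameter was not supplied
      } else if (!p.requiredType().equals("any") && !p.requiredType().equals(argTypes.get(i))) {
        return false; // supplied type does not match the fixed parameter type
      }
    }
    return true;
  }
}

record FunctionSketch(List<OverloadSketch> overloads) {
  // Returns the return type of the first matching overload, or null if none match.
  String returnTypeFor(List<String> argTypes) {
    for (OverloadSketch o : overloads) {
      if (o.accepts(argTypes)) return o.returnType().apply(argTypes);
    }
    return null;
  }
}
```

An identity-like function is one overload with a single `"any"` parameter and the return type `argTypes -> argTypes.get(0)`. A "both arguments must have the same type" constraint has no place to go, because `requiredType` is a fixed value rather than a function of the earlier arguments.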

### Primitive vectors

`RPrimVectorType` encodes a "primitive vector", which is a vector of numbers, logicals, or strings (every vector except generic or expression vectors). It contains, each only if static/known, the vector's element type, its length, and (if numeric) whether the vector has any NA or NaN elements.

TODO