IR, bytecode->IR compiler, and optimizations #15
Merged
@wip init package, CFG, BB, Node, Value, ValueType, Instr
@wip refine
@wip add instructions list, initial value-type writeup
@wip CFG, instrs, ValueType
@wip draft RType implementation
@wip drafting Stmts, Jumps, ...
@wip more work...
@wip minor fixes to the types and replace function
@wip more BB and CFG work
@wip try putting inheritance in types
  It's cleaner to the user (less parallel representation), but a lot uglier to implement... I probably need to see if there's a better way to implement without as much duplication, or if the lack of parallel representation is worth it.
@wip new `RType` design
  `RType` is a union of `RSexpType`s, which are for particular types and special values (currently functions, primitive vectors, the missing value, and everything else).
@wip redo the type system
@wip make it compile + remove most GNU-R bytecodes
@wip type system improvements
@wip arbitrary providers + bugfixes
  TODO add providers for function, primitive vector, and generic value types which aren't exact, and fix the tests
@wip bugfixes
@wip bugfixes
  The main issue is that jqwik gives stack overflows trying to shrink the generated results. I still need to test non-trivial cases and ideally would like shrinking, but if I can't figure out why it doesn't work I'll have to disable it.
@wip fixed jqwik generation, the issue now seems to be with function types
@wip
@wip dominator tree
@wip simplify RType arbitrary
@wip update notes
@wip redo function types
  much better (untested)
  includes re-implementing `Rf_matchArgs_NR` (`match.args`?)
@wip add `desc`s to instructions and fix to build
  comment code + test fixes to get this to build
@wip fix function types and tests
@wip really fix function types and tests
  Property tests run in OK time (<1min) and haven't gotten a failure yet.
@wip document `RType` in notes
  How it works and explains some decisions. It may change a lot from here though.
@wip various fixups
@wip simplify RType based on feedback
  - `RType` is no longer a union, just has one `RValueType`. `RValueType`'s name is more accurate.
  - The missing type is better represented; it's orthogonal like `RPromiseType`, except that `isMissing = YES` implies `value = null`. This way, we can still represent "known type OR missing".
  - No more numeric primitive vector. I kept "numeric or logical" in case because binary operators support all of them, but not the non-numeric string and raw (comparison operators don't care).
  - Potential bugfixes
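The "known type OR missing" representation described above can be pictured with a small sketch. Everything here is hypothetical and simplified for illustration: `SketchRType`, the `Troolean` enum, and the use of a plain `String` for the value type are assumptions, not the PR's actual classes. Only the invariant (`isMissing = YES` implies `value = null`) comes from the commit message.

```java
import java.util.Objects;

/** Three-valued answer; a hypothetical stand-in for whatever the PR uses for YES/NO/MAYBE. */
enum Troolean { YES, NO, MAYBE }

/** Hypothetical, simplified type: a value type plus an orthogonal "missing" flag. */
record SketchRType(String value, Troolean isMissing) {
  SketchRType {
    Objects.requireNonNull(isMissing);
    // Invariant from the commit message: isMissing = YES implies value = null.
    if (isMissing == Troolean.YES && value != null) {
      throw new IllegalArgumentException("a definitely-missing type carries no value type");
    }
  }

  /** The type of a value that is definitely missing. */
  static SketchRType missing() {
    return new SketchRType(null, Troolean.YES);
  }

  /** The type of a value that is definitely present with the given value type. */
  static SketchRType known(String value) {
    return new SketchRType(value, Troolean.NO);
  }

  /** Widens this type to "this type OR missing". */
  SketchRType orMissing() {
    return isMissing == Troolean.YES ? this : new SketchRType(value, Troolean.MAYBE);
  }
}

final class MissingnessDemo {
  public static void main(String[] args) {
    // "known type OR missing": the value type is kept, the flag becomes MAYBE.
    System.out.println(SketchRType.known("int").orMissing());
    System.out.println(SketchRType.missing());
  }
}
```

Keeping the flag separate from the value type means a definite type, a definite missing value, and "either" are all expressible without turning the type back into a union.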
@wip CFG, BB, and Node...
@wip CFG, BB, and Node (particularly BB)...
@wip BB#inlineAt + bugfixes
@wip starting CFGEdit
@wip CFGEdit
@wip remove useless CFGCommand and CFGAction
@wip refactor
@wip try to refactor into something sensible
@wip `@TypeIs`, serialization and deserialization
@wip serialize and deserialize somewhat from PIR
@wip add more PIR instructions
  A lot of TODO design decisions, because idk how similar this will be to PIR
@wip add remaining PIR instructions
  Still TODO how similar this will be to PIR, also a lot of unimplemented computeType and computeEffects, and maybe some unresolved compile-time errors
@wip start CFG parser and printer with the new API
@wip draft closure and closure version
@wip use better terminology
  idk if `Scanner` could be considered a lexer, but it's similar to `java.util.Scanner`.
@wip more progress on parsing and printing CFGs
@wip implement CFG parser and printer, enough to start writing tests.
@wip wrote a test and started fixing bugs
@wip begin writing a parser and printer for CFGs and BBs which is not PIR
@wip improve typeclass map, parse exceptions, and CFG tests
@wip "default" CFG parser and printer + bugfixes
@wip parser and printer bugfixes
@wip parsing and printing
  symbol/language parsing and printing + call instruction printing
@wip parsing and printing - prints something reasonable
  The tests fail because right now I'm using IntelliJ's "click to see difference". There's a lot of lost information from the original, and stub data. There are also almost definitely a couple of things which are being deserialized from the original or serialized into the reprint incorrectly. But it's only testing the CFG/parser/printer (so some stubs are OK) and it "mostly" works. Instructions from the original map one-to-one to those in the reprint, and they line up too (BFS order is correct). Next I will start testing CFG recording, writing more tests in general, and checking `mvn verify`.
@wip most successfully parse and print, and the rest are infeasible for various reasons
  Now need to figure out edits, also other tests and stuff
@wip small further improvements
@wip fix `scanToEndOfLine` bug
@wip fix CFGEdit aliased mutation bug
@wip fix CFGEdit not storing NodeIds in InstrData and StmtData (+ another case)
  TODO fix global node IDs
@wip progress towards ensuring global nodes can be recovered from their IDs (for CFGEdit)
@wip fixed global nodes, testPirObserverCanRecreate passes 100%, except something fails to parse (different problem)
@wip fixed small scanner bug
  testObserverCanRecreate now passes 100%
@wip fix inverse, TODO true idempotency
@wip fix idempotency and tests
  all CFG observer tests pass now
@wip improve testPirIsParseableAndPrintableWithoutError
  the goal is to minimize failures and then simply ignore them, so we have a regression test to check that currently parseable PIR data stays parseable
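The "inverse" property mentioned for `CFGEdit` above can be illustrated with a minimal sketch, assuming each edit returns the edit that undoes it when applied. The names `SketchEdit`, `InsertAt`, and `RemoveAt` are hypothetical, and a plain list of instruction names stands in for a basic block; the real `CFGEdit` API may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical edit: applying it mutates the list and returns the edit that undoes it. */
interface SketchEdit {
  SketchEdit apply(List<String> instrs);
}

/** Inserts an instruction; its inverse removes the instruction at the same index. */
record InsertAt(int index, String instr) implements SketchEdit {
  public SketchEdit apply(List<String> instrs) {
    instrs.add(index, instr);
    return new RemoveAt(index);
  }
}

/** Removes an instruction; its inverse re-inserts exactly what was removed. */
record RemoveAt(int index) implements SketchEdit {
  public SketchEdit apply(List<String> instrs) {
    String removed = instrs.remove(index);
    return new InsertAt(index, removed);
  }
}

final class EditDemo {
  public static void main(String[] args) {
    List<String> instrs = new ArrayList<>(List.of("a", "c"));
    SketchEdit undo = new InsertAt(1, "b").apply(instrs);
    System.out.println(instrs); // state after the forward edit
    SketchEdit redo = undo.apply(instrs);
    System.out.println(instrs); // state after undoing it
    System.out.println(redo);   // the inverse of the inverse is the original edit
  }
}
```

An observer that records the returned inverses can replay them in reverse order to recover the original graph, which is roughly what a "can recreate" test would check.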
…ntableWithoutError
  RValue and Env merged because sometimes they can't be statically told apart. Also explicitly add `environment` `CallSafeBuiltin`.
  The PIR parser/printer is a mess with many TODOs inserted throughout the code, so I will rewrite them and maybe remove some functionality (causing more PIR code to fail to parse) when working on the next step: the R bytecode to IR compiler.
to include that it includes the BC compiler
TODO
- Fix CFG-edit bijectivity by adding phi nodes to the `InsertJump` edit
- Fix other IR issues
- Get R session to run on macOS and in the GitHub container
- Refactor?
- Start bytecode compiler
- ...
# Summary

- Add `BatchSubst` and `DefUseAnalysis`.
- Allow mutating instructions and phis directly, not via a method on the basic block.
- Clarify method names and docs, add new helpers.
- Fix phi nodes, at least better than before. In particular, phis' inputs must exactly match the block's predecessors on creation, and are automatically added and removed when predecessors change; stubs are added for new predecessors, and one changes the phi input's node via `setInput`.
- Fix some edits not being recorded as `CFGEdits`, or not being bijective.
- Change PIR parse/print tests and improve PIR parsing/printing, so that all of them are final PIR (valid CFGs that pass `verify`; `PrintPirAfterOpt` gives CFGs with single-input phi nodes).

All current tests pass except some `CompilerTest`s (of course not everything is tested)

# More details (specific commits)

@wip cleanup `ir` TODOs (features, bugfixes, and refactoring)
@wip `BatchSubst`, `DefUseAnalysis`, properly record `InstrOrPhi#replace`, and add labels to compound operations (part of "cleanup `ir` TODOs"; features, bugfixes, and refactoring)
@wip refactor to store `BB` in `ReplaceInArgs` edit
@wip fix (maybe) phis
@wip refactor instr mutation and substitution so it doesn't require BB
  This makes the API cleaner, since needing the BB seems "unnecessary" and I really doubt it helps time complexity. Also fix some bugs with predecessors/incoming BBs/jump targets not being updated properly. Need to fix DefUseAnalysis not catching all definitions...
@wip phi fixes and add test to verify CFG
  fixed phi node and verify issue + other small improvements
  fixed verification and PIR parsing/printing
  replaced PIR tests with ones that are all final PIR, so that we can check verification works. This also caused new PIR-parse/print failures, most of which were because of weird PIR prints that had to be special-cased (not useful), plus a couple of actual bugs in the parsing.
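The phi rule stated in the summary (inputs match the block's predecessors on creation, follow predecessor changes automatically, and are updated through `setInput`) can be sketched as follows. This is an illustrative model only: `SketchPhi`, `SketchBlock`, the `STUB` placeholder, and the use of strings for blocks and nodes are all assumptions; only the `setInput` name and the described behavior come from the PR text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical phi: holds exactly one input per predecessor of its block. */
final class SketchPhi {
  /** Placeholder input for a predecessor that was added after the phi was created. */
  static final String STUB = "<unset>";

  private final Map<String, String> inputs = new LinkedHashMap<>();

  SketchPhi(Set<String> predecessors, Map<String, String> initialInputs) {
    // On creation, the inputs must exactly match the block's predecessors.
    if (!initialInputs.keySet().equals(predecessors)) {
      throw new IllegalArgumentException("phi inputs must match predecessors " + predecessors);
    }
    inputs.putAll(initialInputs);
  }

  /** Changes the node that flows in from an existing predecessor. */
  void setInput(String predecessor, String node) {
    if (!inputs.containsKey(predecessor)) {
      throw new IllegalArgumentException("not a predecessor: " + predecessor);
    }
    inputs.put(predecessor, node);
  }

  String input(String predecessor) { return inputs.get(predecessor); }

  Set<String> predecessors() { return Set.copyOf(inputs.keySet()); }

  // The two hooks below are meant to be called only by the owning block.
  void predecessorAdded(String predecessor) { inputs.put(predecessor, STUB); }
  void predecessorRemoved(String predecessor) { inputs.remove(predecessor); }
}

/** Hypothetical block: keeps its phis in sync whenever its predecessors change. */
final class SketchBlock {
  private final Set<String> predecessors = new LinkedHashSet<>();
  private final List<SketchPhi> phis = new ArrayList<>();

  SketchPhi addPhi(Map<String, String> inputs) {
    SketchPhi phi = new SketchPhi(predecessors, inputs);
    phis.add(phi);
    return phi;
  }

  void addPredecessor(String predecessor) {
    if (predecessors.add(predecessor)) {
      for (SketchPhi phi : phis) phi.predecessorAdded(predecessor);
    }
  }

  void removePredecessor(String predecessor) {
    if (predecessors.remove(predecessor)) {
      for (SketchPhi phi : phis) phi.predecessorRemoved(predecessor);
    }
  }
}

final class PhiDemo {
  public static void main(String[] args) {
    SketchBlock bb = new SketchBlock();
    bb.addPredecessor("entry");
    SketchPhi phi = bb.addPhi(Map.of("entry", "x0"));
    bb.addPredecessor("loop");     // the phi gains a stub input for "loop"
    phi.setInput("loop", "x1");    // replace the stub with a real node
    bb.removePredecessor("entry"); // the phi drops its "entry" input
    System.out.println(phi.predecessors() + " -> " + phi.input("loop"));
  }
}
```

A verifier in this model would reject any phi that still holds `STUB`, which mirrors the "unset phi inputs" failure mentioned later in the commit log.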
+ explain why the remaining 2 tests are still disabled
18 compiler tests fail because they produce different bytecode; there are no crashes and other tests pass
fix rebase onto main
- Refactored the CFG API a bit more, in particular `builder`
@wip continue compiler work (phi stack) and refactor CFG API (always compute phi id from input nodes, improve id syntax and API)
  - `name` no longer includes disambiguator (more consistent)
  - `RenameInstr` and `MutateInstrArgs` have been separated
  TODO: need to make phi IDs remain the same doing forward/reverse edits, and debug more now-failing CFG tests.
@wip make edits store ids, cleanup edit API
@wip fix and refactor node IDs, symbol parsing, and other things
@wip finish fixing node IDs
  change SEXP builtins to use `BuiltinId`s instead of `String` names
  improve IR API and add some instructions
  compile trivial and some less-trivial bytecode instructions
  also figure out the issues and challenges in implementing the GNUR Bc->IR compile
@wip bc->IR compiler
  - improve bc->IR boilerplate API
  - compile a few more bytecode instructions, including for loops
  - fixes
  idk how to compile complex assignment and dispatch functions...
@wip fix CFG tests (bc->IR compiler)
add `VERBOSE` logging to tests because I can't see the GitHub Actions output; it also makes tests slightly slower
*maybe* fix LatticeTest rare failure
expose environment variable for GNU-R binary
fast fail compiler tests if the version is wrong
revert GitHub Actions to use the correct GNU-R version for tests
- (frame states and promises) + initial tests
@wip draft implementation of bc->PIR-IR
  some things are probably not correct though, also a lot of specialized GNU-R bytecodes get converted into CallBuiltin because we don't have PIR instructions with the same specialization
  some things are also still not implemented (e.g. `MakeClosure`)
@wip fix bugs and add documentation to the draft implementation
@wip closure and closure version overhaul and begin their compiler
  also brainstorm high-level (how compilation/evaluation will work)
@wip closure and closure version compiler + created `Module`
  WIP:
  - How promises will be compiled (and where they're needed).
  - Add new PIR instructions to implement missing functionality required to compile some CFG bytecodes.
  - (Maybe longer term) try to implement PushContext and PopContext because unless I'm mistaken they are created by GNU-R wherever there's a `next` or `break`, whereas RIR only needs them for niche complex cases.
@wip improve `CFGTests`
  Report test differently if we failed to parse, but in an acceptable way, so it will be apparent (unfortunately only to the human) if a large number of tests are failing this way (which is not acceptable).
@wip further improve `CFGPirTests` and further fix `CFGEdit`
@wip improve `CFGCompiler`
  - create a call stack instead of putting the function on the regular stack and having a call arguments stack (this may have fixed semantics)
  cleanup `Compiler` warnings + other small refactors
@wip finish draft closure compiler (frame-states and promises) + initial tests
Necessary to inspect the tests, and I suspect some are only failing because printing isn't supported (but necessary to inspect even the passing ones to see if the output resembles something actually successful)
@wip parse and print closures properly
  Draft attempt to properly parse and print the inner closures and promises after.
@wip parse and print closures properly
  Parse and print promises.
@wip parse and print closures properly
  Parse and print the inner closures and promises by forwarding the context in CFGs.
@wip parse and print SEXPs (draft impl)
+ RDS reader tests don't throw `Exception`
- mainly in parsing and printing
- also in BC->IR, implemented complex assignments
@wip parse and print SEXP bugfixes
@wip parse and print closure bugfixes
@wip refactored/fixed parsing and printing inner code objects
@wip fixed printing bytecode (code/const index formatting)
@wip bugfixes
  both in the BC->IR compiler and in parsing/printing
  fixed node IDs parsed and printed outside of the CFG they were defined in.
@wip further bugfixes + implemented complex assignment
  `inlineSlotAssign` prints something.
@wip further bugfixes + improve parsing and printing
- Resolve parse and print methods that take a superclass of the context class.
- Parse methods can be constructors
+ bugfixes + refactors
@wip no longer parse and print CFG next ID (internal detail that can be inferred well enough)
@wip improve `Tests` resource path methods
@wip fix PIR parse/print test to actually delete non-trivial files
@wip fix `RealSXP#equals`
@wip fix phi incoming BB and `SmallBinarySet` removal
@wip bugfixes: `ClosureCompilerTests#functionInlining` passes
@wip fixed a few more cases
@wip lots more bug-fixes
  ...then I need to fix `CFG#cleanup`, so almost all of the `ClosureCompilerTests` are failing. But without cleanup and verification, all of them pass up to the test that verifies all functions in a few packages.
@wip fix `DomTree`
@wip fix `BB#filter`
@wip fix `Scanner#read...EndOfLine`
@wip fix cleanup trying to merge looping entry
@wip fix interprocedural cleanup and verification
@wip fix dumb `BB` error
  `removeAllPhis` was untracking *all* its phis from the CFG, not just the ones actually removed.
@wip more bugfixes
@wip fix transitive substitution bug
@wip fix `CFGPirSerialize` providing valid `CFG`
  specifically for inner closures and promises, which are stubs when they come from deserialized PIR; they need to have valid CFGs.
Now they won't have a disambiguator (number prefix) if it's unnecessary, and will get lower ones. TODO add a step in cleanup which re-assigns instruction IDs with the lowest disambiguators possible. Otherwise this is hopefully the last BB and node ID refactor. Also, all `ClosureCompilerTests` before `switch` up to builtins pass now, and all other tests pass.
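The disambiguator rule described above (no number prefix when the name is already unique, otherwise the lowest free one) might look roughly like the sketch below. `SketchIdAllocator` and the `<n>_<name>` format are assumptions made for illustration; the PR does not show the actual ID syntax here.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical allocator that hands out IDs with the lowest free disambiguator. */
final class SketchIdAllocator {
  private final Set<String> used = new HashSet<>();

  /** Returns the bare name if it is free, otherwise the lowest-numbered free variant. */
  String allocate(String name) {
    if (used.add(name)) {
      return name; // no disambiguator needed
    }
    for (int n = 1; ; n++) {
      // Assumed format: the number prefix, an underscore, then the name.
      String candidate = n + "_" + name;
      if (used.add(candidate)) {
        return candidate;
      }
    }
  }

  /** Frees an ID so that a later allocation can reuse it. */
  void release(String id) {
    used.remove(id);
  }
}

final class IdDemo {
  public static void main(String[] args) {
    SketchIdAllocator ids = new SketchIdAllocator();
    System.out.println(ids.allocate("x")); // bare name, nothing to disambiguate
    System.out.println(ids.allocate("x")); // name taken, so a prefix is added
    ids.release("x");
    System.out.println(ids.allocate("x")); // bare name is free again
  }
}
```

The cleanup step listed as a TODO would amount to releasing every ID and re-allocating in a fixed order, so that gaps left by removed instructions are filled.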
`andThem` => `andThen`
+ allow setting static environment parents after initialization (necessary to implement this) Every BC->IR test passes except those with `switch` instructions. But is the IR produced correct?
+ bugfixes
All tests "pass" except there are unset phi inputs, so I have to figure out why...
now all BC->IR tests pass, except we don't support dots, and haven't tested the code. Also, I need to fix `CallBuiltin` so that there are "safe" and "dispatch" variants, and the dispatch variants (at least in some cases) `eval` the AST instead of using arguments.
+ refactor test class hierarchy in general
- Should pass `mvn verify` and therefore CI.
- had to slightly refactor `PirId.GlobalLogical` constructor due to a bug in the parser.
`Map...View`s are practically all duplicate and very straightforward. We could technically refactor to save a few lines by converting field references to protected instance methods, but it seems unnecessary
including `Optimizer`, tests, and "optimization passes" that call `cleanup`, `verify`, and `compute...Properties`.
It still passes if there are missing Javadocs (not an issue). It fails if there are Javadocs that are clearly broken (e.g. missing reference, unclosed or unknown HTML delimiter).
0 tests fail. But the optimization mutates 0 CFGs (almost certainly we can never disprove taint), so it's only testing that the analyses don't crash.
+ improve `CFGEdit`
`docs` is sparse because documentation is mostly in Javadocs anyways.
No description provided.