
Safe memoization during parse graph flattening #2136

Open
wants to merge 18 commits into
base: feat/error-recovery


PieterOlivier (Contributor) commented Jan 30, 2025

This PR improves the memoization strategy used during parse graph flattening. The original strategy proved both incorrect and too slow for the highly ambiguous and cyclic parse forests produced by error recovery.

Special thanks to @arnoldlankamp who came up with the following three heuristics specifying when nodes can be safely cached for memoization:

  1. Any non-zero length node that is part of a link with a non-zero length prefix.
  2. Any node that is part of a prefix for which the above holds true.
  3. Any nullable node that is part of a link for which all prefixes adhere to rule one at some point.

This PR implements a new caching strategy based on these heuristics.

We have gone to great lengths to test the correctness and performance of the code in this PR.
We already had a test suite that tested error recovery on every character in all Rascal source files in the rascal repo using
two tests: deleting the single character and deleting all characters until the end of the line.
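
As a rough illustration of the two input mutations described above, here is a minimal sketch. The class `RecoveryInputs` and its method names are hypothetical and are not taken from the Rascal test suite; treating `\n` as the only line terminator is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the Rascal code base): it only illustrates
// the two kinds of damaged input the tests feed to the error-recovering parser.
final class RecoveryInputs {
    private RecoveryInputs() {}

    // Returns the source with the single character at index pos removed.
    static String deleteSingleChar(String source, int pos) {
        return source.substring(0, pos) + source.substring(pos + 1);
    }

    // Returns the source with everything from pos up to (but not including)
    // the next '\n' removed. Assumption: '\n' is the only line terminator.
    static String deleteUntilEndOfLine(String source, int pos) {
        int eol = source.indexOf('\n', pos);
        if (eol < 0) {
            eol = source.length(); // last line has no terminator
        }
        return source.substring(0, pos) + source.substring(eol);
    }

    // Produces both mutations for every character position in the source.
    static List<String> allMutations(String source) {
        List<String> result = new ArrayList<>();
        for (int pos = 0; pos < source.length(); pos++) {
            result.add(deleteSingleChar(source, pos));
            result.add(deleteUntilEndOfLine(source, pos));
        }
        return result;
    }
}
```

Each string produced this way would then be handed to the parser with error recovery enabled.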

We modified these tests to:

  • First try the error recovery parse without memoization during flattening
  • If this succeeds within two seconds, repeat the parse with memoization
  • Compare the two results
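
The three steps above can be sketched roughly as follows. This is not the actual test code from the PR: `MemoDifferentialCheck`, `Outcome`, and the two parser functions are hypothetical stand-ins, and the sketch assumes that parse results can be compared with `equals`.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical differential check: the parser functions are placeholders for
// "flatten without memoization" and "flatten with memoization".
final class MemoDifferentialCheck {
    enum Outcome { SKIPPED, SAME, DIFFERENT }

    private MemoDifferentialCheck() {}

    static <T> Outcome check(String input,
                             Function<String, T> parseWithoutMemo,
                             Function<String, T> parseWithMemo,
                             Duration budget) {
        // Daemon thread so a parse that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        try {
            // Step 1: reference parse without memoization, bounded by the time budget.
            Callable<T> referenceTask = () -> parseWithoutMemo.apply(input);
            Future<T> reference = executor.submit(referenceTask);
            T expected;
            try {
                expected = reference.get(budget.toMillis(), TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                reference.cancel(true);
                return Outcome.SKIPPED; // too slow or failed: nothing to compare against
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Outcome.SKIPPED;
            }
            // Step 2: the same parse with memoization enabled.
            T actual = parseWithMemo.apply(input);
            // Step 3: compare the two results.
            return Objects.equals(expected, actual) ? Outcome.SAME : Outcome.DIFFERENT;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With the two-second limit mentioned above, a call might look like `MemoDifferentialCheck.check(source, plainParser, memoParser, Duration.ofSeconds(2))`, where any `DIFFERENT` outcome would indicate unsafe memoization.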

In all cases where the parse without memoization succeeded, the parse with memoization resulted in
exactly the same tree as the tree built without memoization. This is in contrast with the old memoization approach, where in the same test the result with memoization often differed from the result without memoization.

We have also set up a performance benchmark to compare the speed of "normal" parsing (no error recovery and no ambiguities)
of all Rascal source files in the rascal repo with the speed before this PR. We could not find any speed differences between
the two versions, so we are confident this PR does not degrade the performance of "normal" parses.
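
For readers who want to reproduce this kind of comparison, a very small timing harness could look like the sketch below. It is not the benchmark used for this PR: `ParseTiming` is a hypothetical name, the `Runnable` stands in for parsing one file, and a dedicated tool such as JMH would give more trustworthy numbers.

```java
import java.util.Arrays;

// Hypothetical timing helper: runs a parse action several times and reports
// the median wall-clock duration, which is less sensitive to outliers than the mean.
final class ParseTiming {
    private ParseTiming() {}

    static long medianNanos(Runnable parse, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        // Warm-up iterations give the JIT compiler a chance to optimize the code.
        for (int i = 0; i < warmups; i++) {
            parse.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            parse.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2]; // upper median for an even number of runs
    }
}
```

Running the same action against builds from before and after the change, and comparing the two medians per file, gives a first impression of whether "normal" parses got slower.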


codecov bot commented Jan 30, 2025

Codecov Report

Attention: Patch coverage is 47.84314% with 133 lines in your changes missing coverage. Please review.

Project coverage is 49%. Comparing base (8c405fc) to head (d647f70).
Report is 254 commits behind head on feat/error-recovery.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/org/rascalmpl/library/util/ErrorRecovery.java | 0% | 66 Missing ⚠️ |
| ...c/org/rascalmpl/parser/gtd/result/struct/Link.java | 71% | 11 Missing and 9 partials ⚠️ |
| ...rg/rascalmpl/parser/util/ParseStateVisualizer.java | 0% | 15 Missing ⚠️ |
| src/org/rascalmpl/parser/util/DebugUtil.java | 0% | 14 Missing ⚠️ |
| src/org/rascalmpl/util/visualize/dot/DotGraph.java | 0% | 9 Missing ⚠️ |
| ...pl/parser/gtd/result/out/DefaultNodeFlattener.java | 90% | 2 Missing and 1 partial ⚠️ |
| ...ascalmpl/parser/gtd/result/out/INodeFlattener.java | 50% | 1 Missing and 1 partial ⚠️ |
| ...ser/gtd/result/out/ListContainerNodeFlattener.java | 91% | 2 Missing ⚠️ |
| ...pl/parser/gtd/result/out/SkippedNodeFlattener.java | 0% | 0 Missing and 1 partial ⚠️ |
| ...ser/gtd/result/out/SortContainerNodeFlattener.java | 92% | 1 Missing ⚠️ |
Additional details and impacted files
@@                  Coverage Diff                   @@
##             feat/error-recovery   #2136    +/-   ##
======================================================
  Coverage                     49%     49%            
+ Complexity                  6619    6572    -47     
======================================================
  Files                        687     696     +9     
  Lines                      61218   61043   -175     
  Branches                    8874    8910    +36     
======================================================
+ Hits                       30369   30383    +14     
+ Misses                     28620   28410   -210     
- Partials                    2229    2250    +21     


DavyLandman (Member) commented

Hi @arnoldlankamp, your comments on #2100 really helped out. Do you have some time to take a look at this? Thanks.

ps. @PieterOlivier I think this means you should close #2100 ?

PieterOlivier (Contributor, Author) commented Jan 31, 2025

> ps. @PieterOlivier I think this means you should close #2100 ?

Definitely, I have done so now.
