
Improve trace reconciliation performance #15725

Open · wants to merge 3 commits into master
Conversation

bendikberg

Purpose

When loading a file with many bindings stored in CallSite.SingleRunTraceData, trace reconciliation spends a lot of time reading nested data because of repeated list allocations. More generally, it does unnecessary work copying lists and iterating through IEnumerables for .Contains tests and the like.

These changes have a large impact on the graph execution time for graphs with a lot of orphaned serializable data, as seen in these profiling results.

(Profiling screenshots: devenv_Hu6ocw92K8, devenv_qPyrZ0YqIQ)

Declarations

Check these if you believe they are true

  • The codebase is in a better state after this PR
  • Is documented according to the standards
  • The level of testing this PR includes is appropriate
  • User facing strings, if any, are extracted into *.resx files
  • All tests pass using the self-service CI.
  • Snapshot of UI changes, if any.
  • Changes to the API follow Semantic Versioning and are documented in the API Changes document.
  • This PR modifies some build requirements and the readme is updated
  • This PR contains no files larger than 50 MB

Release Notes

N/A

Reviewers

@mjkkirschner


FYIs

@dimven


- var currentSerializables = traceData.SelectMany(td => td.RecursiveGetNestedData());
+ var currentSerializables = traceData.SelectMany(td => td.RecursiveGetNestedData()).ToHashSet();
  result.AddRange(beforeFirstRunSerializables.Where(hs => !currentSerializables.Contains(hs)).ToList());
bendikberg (Author):

This line with the added .ToHashSet() call and the line below are the likeliest culprits of the bad performance, as they would previously keep reloading the TraceData from the CallSite object for every .Contains test.
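For illustration, a minimal C# sketch of the difference (the data and variable names are placeholders, not the actual Dynamo types; in the real code the re-enumeration additionally re-reads the trace data from the CallSite, making it even more expensive):

```csharp
// A deferred LINQ query is re-enumerated on every Contains() call, while ToHashSet()
// materializes the pipeline once and makes each membership test O(1) on average.
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredVsHashSet
{
    static void Main()
    {
        var traceData = Enumerable.Range(0, 1_000)
            .Select(i => new[] { $"serializable-{i}" })
            .ToList();
        var beforeFirstRunSerializables = new[] { "serializable-0", "orphan-1", "orphan-2" };

        // Deferred: every Contains() walks the whole SelectMany pipeline again.
        IEnumerable<string> deferred = traceData.SelectMany(td => td);
        var orphansSlow = beforeFirstRunSerializables.Where(s => !deferred.Contains(s)).ToList();

        // Materialized: the pipeline runs once; Contains() then hashes instead of scanning.
        var materialized = traceData.SelectMany(td => td).ToHashSet();
        var orphansFast = beforeFirstRunSerializables.Where(s => !materialized.Contains(s)).ToList();

        Console.WriteLine(string.Join(", ", orphansFast)); // orphan-1, orphan-2
    }
}
```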

Member:

if I'm understanding that correctly, then it could be solved by actualizing the currentSerializables outside the deferred call - right?

Is the cost to create the hashset worth the lookup cost?


github-actions bot commented Jan 2, 2025

UI Smoke Tests

Test: success. 11 passed, 0 failed.
TestComplete Test Result
Workflow Run: UI Smoke Tests
Check: UI Smoke Tests

@mjkkirschner self-assigned this Jan 2, 2025
- var orphanedSerializables = cs.GetOrphanedSerializables().ToList();
- if (callsiteToOrphanMap.ContainsKey(cs.CallSiteID))
+ var orphanedSerializables = cs.GetOrphanedSerializables();
+ if (callsiteToOrphanMap.TryGetValue(cs.CallSiteID, out var serializablesForCallsite))
Member:

how does this change improve performance?

bendikberg (Author):

Since the immediate action after the ContainsKey check is to mutate the value at that key, using TryGetValue avoids a redundant dictionary lookup.

See C# analyzer rule CA1854.
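A minimal sketch of the CA1854 pattern, using hypothetical types rather than the actual Dynamo ones:

```csharp
// ContainsKey followed by the indexer costs two hash lookups;
// TryGetValue tests for presence and retrieves the value in one.
using System;
using System.Collections.Generic;

class Ca1854Sketch
{
    static void Main()
    {
        var callsiteToOrphanMap = new Dictionary<Guid, List<string>>();
        var callSiteId = Guid.NewGuid();
        var orphanedSerializables = new List<string> { "a", "b" };

        // Before: two lookups, one for ContainsKey and one for the indexer.
        if (callsiteToOrphanMap.ContainsKey(callSiteId))
            callsiteToOrphanMap[callSiteId].AddRange(orphanedSerializables);

        // After: a single lookup both tests for the key and returns the value.
        if (callsiteToOrphanMap.TryGetValue(callSiteId, out var serializablesForCallsite))
            serializablesForCallsite.AddRange(orphanedSerializables);
        else
            callsiteToOrphanMap.Add(callSiteId, new List<string>(orphanedSerializables));
    }
}
```

The TryGetValue form also threads the retrieved value straight into the mutation, which matches the else branch further down in the diff.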

Member:

okay, seems like a pretty safe change, though under profiling does it make an actual difference here?

  }
  else
  {
-     callsiteToOrphanMap.Add(cs.CallSiteID, orphanedSerializables);
+     callsiteToOrphanMap.Add(cs.CallSiteID, orphanedSerializables.ToList());
Member:

so orphanedSerializables is already a List<string>:

var result = new List<string>();

though it's declared here as an IList. I assume that this calls the List constructor, passing in the existing list; have you verified there is any benefit to moving this ToList call around?

If it's really such a big performance impact, you could probably change this data structure to be a dict of IList.
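For reference, a small sketch of the copy semantics in question: ToList() and the List<T> copy constructor both allocate a new backing array and copy the elements, even when the IList<string> is already a List<string>, so moving the ToList call only changes where (and how often) that copy happens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ListCopySketch
{
    static void Main()
    {
        IList<string> source = new List<string> { "a", "b", "c" };

        var copy1 = source.ToList();          // copies the elements into a new list
        var copy2 = new List<string>(source); // same cost: copies the elements

        Console.WriteLine(ReferenceEquals(copy1, source)); // False: distinct list
        Console.WriteLine(ReferenceEquals(copy2, source)); // False: distinct list

        // Storing the reference avoids the copy, but callers then share one mutable list.
        IList<string> shared = source;
        Console.WriteLine(ReferenceEquals(shared, source)); // True
    }
}
```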

@@ -897,7 +897,7 @@ internal IList<string> GetOrphanedSerializablesAndClearHistoricalTraceData()

  if (Nodes.All(n => n.GUID != nodeGuid))
  {
-     orphans.AddRange(nodeData.Value.SelectMany(CallSite.GetAllSerializablesFromSingleRunTraceData).ToList());
+     orphans.AddRange(nodeData.Value.SelectMany(CallSite.GetAllSerializablesFromSingleRunTraceData));
Member:

does calling AddRange immediately execute the deferred SelectMany call?
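An illustrative check, under the assumption that the SelectMany source is an ordinary deferred LINQ query: List<T>.AddRange enumerates its argument immediately, so the pipeline still executes exactly once here and the removed .ToList() only added an intermediate copy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class AddRangeSketch
{
    static void Main()
    {
        int selectorCalls = 0;
        IEnumerable<string> deferred = Enumerable.Range(0, 3)
            .SelectMany(i => { selectorCalls++; return new[] { $"s{i}" }; });

        Console.WriteLine(selectorCalls); // 0 - nothing has run yet

        var orphans = new List<string>();
        orphans.AddRange(deferred);       // enumeration happens here

        Console.WriteLine(selectorCalls); // 3 - the pipeline ran once during AddRange
    }
}
```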


- foreach (var ws in Workspaces.OfType<HomeWorkspaceModel>())
+ foreach (var maybeWs in Workspaces)
Member:

why do this?

Member:

I don't think there is any value in iterating all the nodes of a custom node workspace... maybe you can say a bit more about why you need this other map to improve performance.

{
foreach (var node in maybeWs.Nodes)
Member:

please use braces.

Member:

also consider that this could be a very large number of nodes.

if (!nodeToWorkspaceMap.TryGetValue(nodeGuid, out var nodeSpace))
continue;

//var nodeSpace =
Member:

??

@mjkkirschner (Member) commented Jan 7, 2025:

hey @bendikberg I would like some more description in the PR about what your changes are intending and how you validated that they make a substantial improvement. Maybe not all of these changes are equally important.


if (!wsOrphans.Any())
continue;

if (!workspaceOrphanMap.ContainsKey(ws.Guid))
mjkkirschner (Member), Jan 7, 2025:

what's the benefit of this change, and of the downstream changes below?

-     ws =>
-         ws.Nodes.FirstOrDefault(n => n.GUID == nodeGuid)
-         != null);
+ if (!nodeToWorkspaceMap.TryGetValue(nodeGuid, out var nodeSpace))
mjkkirschner (Member), Jan 7, 2025:

hmm... on first glance memoizing this lookup makes sense, but I'm unclear exactly how many times we recreate the map vs how fast we bail on first finding the node in the existing code.

Is the map recreated only once per workspace open? Does it need to be updated at some point?


// Add the node's orphaned serializables to the workspace
// orphan map.
if (workspaceOrphanMap.ContainsKey(nodeSpace.Guid))
Member:

same question about this lookup change.


public void RecursiveFillWithNestedData(List<string> listToFill)
Member:

seems this can be private or internal.


public void RecursiveFillWithNestedData(List<string> listToFill)
Member:

what is the intention of your change here?


@dimven (Contributor) commented Jan 8, 2025:

@mjkkirschner To add some more context: this is more relevant to D4R because it writes huge amounts of trace data tying the Dynamo wrapper elements to the matching Revit elements for continuous updates between Dynamo/Revit sessions. In cases where the graph stored a lot of references that are no longer valid, the first execution would be very slow. Look at the attached trace data images: the GetOrphanedSerializables method was taking 42% of the graph execution time, and after the changes that dropped to an insignificant number.

I'm not familiar with Dynamo for Civil3d but if it tracks the elements in a similar way, then this would be relevant there as well.

Comment on lines -477 to +479
- if (!beforeFirstRunSerializables.Any())
-     return result;
+ if (beforeFirstRunSerializables.Count == 0)
+     return new List<string>();
Contributor:

Is this any more performant?

@bendikberg (Author) commented Jan 14, 2025:

@mjkkirschner

> hey @bendikberg I would like some more description in the PR about what your changes are intending and how you validated that they make a substantial improvement. Maybe not all of these changes are equally important.

I made this PR during the D4R performance profiling project. The code was originally written as more of a research and test exercise, so it is in a pretty raw state; I assumed someone on the Dynamo team would grab the parts of this PR they deemed relevant and implement those.

What I was trying to do was figure out where and why the CallSite TraceData was slowing down the first run of a graph, so I touched a few different files related to TraceData handling.

Most of the .ToList and ContainsKey/TryGetValue juggling can be left out of this PR or skipped entirely when compared to the performance gained from the algorithmic optimizations. Those changes don't cause the same kind of exponential slowdowns, but they still do unnecessary work. I made them because I figured that, as long as I was improving TraceData performance on the whole, I would also include simple (but tiny) optimizations.

I made some benchmarks for the most impactful changes:


This gets slower as the number of TraceData items on a node increases (this can easily get very large for a Revit file).

RecursiveFillWithNestedData change

Tests for 0 sublevels, 0 children per item per level (total: 1)
        RecursiveFillManyLists         :   100 runs : mean:         13 us
        RecursiveFillOneList           :   100 runs : mean:          4 us
        NonRecursiveFill               :   100 runs : mean:          4 us
Tests for 1 sublevels, 100 children per item per level (total: 101)
        RecursiveFillManyLists         :   100 runs : mean:         32 us
        RecursiveFillOneList           :   100 runs : mean:         21 us
        NonRecursiveFill               :   100 runs : mean:         22 us
Tests for 100 sublevels, 1 children per item per level (total: 101)
        RecursiveFillManyLists         :   100 runs : mean:         48 us
        RecursiveFillOneList           :   100 runs : mean:         22 us
        NonRecursiveFill               :   100 runs : mean:         19 us
Tests for 1 sublevels, 10000 children per item per level (total: 10001)
        RecursiveFillManyLists         :   100 runs : mean:        656 us
        RecursiveFillOneList           :   100 runs : mean:        364 us
        NonRecursiveFill               :   100 runs : mean:        450 us
Tests for 1 sublevels, 100000 children per item per level (total: 100001)
        RecursiveFillManyLists         :   100 runs : mean:      15818 us
        RecursiveFillOneList           :   100 runs : mean:       2122 us
        NonRecursiveFill               :   100 runs : mean:       3612 us
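A hedged sketch of the pattern those numbers correspond to (simplified types, not the actual Dynamo implementation): the "one list" variant passes a single result list down the recursion, while the "many lists" variant allocates, returns, and merges a list at every level.

```csharp
using System;
using System.Collections.Generic;

class TraceItem
{
    public string Data;
    public List<TraceItem> NestedData = new List<TraceItem>();

    // "RecursiveFillManyLists" shape: a new list per call, copied upward at every level.
    public List<string> RecursiveGetNestedData()
    {
        var result = new List<string> { Data };
        foreach (var child in NestedData)
            result.AddRange(child.RecursiveGetNestedData());
        return result;
    }

    // "RecursiveFillOneList" shape: the caller's list is filled in place, no copies.
    public void RecursiveFillWithNestedData(List<string> listToFill)
    {
        listToFill.Add(Data);
        foreach (var child in NestedData)
            child.RecursiveFillWithNestedData(listToFill);
    }
}

class Program
{
    static void Main()
    {
        var root = new TraceItem { Data = "root" };
        root.NestedData.Add(new TraceItem { Data = "child" });

        var flat = new List<string>();
        root.RecursiveFillWithNestedData(flat);
        Console.WriteLine(string.Join(", ", flat)); // root, child
    }
}
```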

This scales poorly with the number of nodes on a canvas.

Node memoization

Node lookup

Test input: Workspaces 1 with 10 nodes. Selected workspace: 1. Lookup: 1 node ids RandomFromWorkspace
        Lookup linq (current)          :   100 runs : mean:          2 us
        Lookup dict                    :   100 runs : mean:          3 us
Test input: Workspaces 1 with 10 nodes. Selected workspace: 1. Lookup: 5 node ids RandomFromWorkspace
        Lookup linq (current)          :   100 runs : mean:         12 us
        Lookup dict                    :   100 runs : mean:          7 us
Test input: Workspaces 1 with 100 nodes. Selected workspace: 1. Lookup: 100 node ids AllInWorkspace
        Lookup linq (current)          :   100 runs : mean:       1974 us
        Lookup dict                    :   100 runs : mean:         36 us
Test input: Workspaces 5 with 100 nodes. Selected workspace: 5. Lookup: 10 node ids RandomFromWorkspace
        Lookup linq (current)          :   100 runs : mean:        707 us
        Lookup dict                    :   100 runs : mean:         99 us
Test input: Workspaces 5 with 100 nodes. Selected workspace: 5. Lookup: 50 node ids RandomFromWorkspace
        Lookup linq (current)          :   100 runs : mean:       3382 us
        Lookup dict                    :   100 runs : mean:        108 us
Test input: Workspaces 5 with 100 nodes. Selected workspace: 5. Lookup: 1000 node ids NotInWorkspace
        Lookup linq (current)          :   100 runs : mean:      67577 us
        Lookup dict                    :   100 runs : mean:        143 us
Test input: Workspaces 1 with 1000 nodes. Selected workspace: 1. Lookup: 1000 node ids AllInWorkspace
        Lookup linq (current)          :   100 runs : mean:     136409 us
        Lookup dict                    :   100 runs : mean:        221 us
Test input: Workspaces 1 with 1000 nodes. Selected workspace: 1. Lookup: 500 node ids RandomFromWorkspace
        Lookup linq (current)          :   100 runs : mean:      67649 us
        Lookup dict                    :   100 runs : mean:        218 us
Test input: Workspaces 1 with 1000 nodes. Selected workspace: 1. Lookup: 1000 node ids NotInWorkspace
        Lookup linq (current)          :   100 runs : mean:     130951 us
        Lookup dict                    :   100 runs : mean:        250 us
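A simplified sketch of the two strategies being benchmarked (types reduced to the essentials, not the real Dynamo classes): the "linq" variant scans every node of every workspace per lookup, while the "dict" variant builds a node-GUID-to-workspace map once and answers each lookup with a single hash probe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Node { public Guid GUID; }
class Workspace { public Guid Guid; public List<Node> Nodes = new List<Node>(); }

class NodeLookupSketch
{
    // "Lookup linq (current)": scans workspaces and their nodes on every call.
    static Workspace FindByScan(IEnumerable<Workspace> workspaces, Guid nodeGuid) =>
        workspaces.FirstOrDefault(ws => ws.Nodes.Any(n => n.GUID == nodeGuid));

    // "Lookup dict": one pass over all nodes to build, then O(1) per lookup.
    static Dictionary<Guid, Workspace> BuildNodeToWorkspaceMap(IEnumerable<Workspace> workspaces)
    {
        var map = new Dictionary<Guid, Workspace>();
        foreach (var ws in workspaces)
            foreach (var node in ws.Nodes)
                map[node.GUID] = ws;
        return map;
    }

    static void Main()
    {
        var node = new Node { GUID = Guid.NewGuid() };
        var ws = new Workspace { Guid = Guid.NewGuid() };
        ws.Nodes.Add(node);
        var workspaces = new List<Workspace> { ws };

        var map = BuildNodeToWorkspaceMap(workspaces);
        Console.WriteLine(map[node.GUID] == FindByScan(workspaces, node.GUID)); // True
    }
}
```

As the reviewer notes above, the open question for the real code is when such a map would have to be rebuilt or updated, e.g. when nodes are added or removed.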

The main culprit

Tests for 10 callsites, each with 10 tracedata (datatype System.Guid) (Orphaned tracedata: 500)
        Lookup linq (current)          :    10 runs : mean:      26486 us
        Lookup dict (fill many lists)  :    50 runs : mean:        146 us
        Lookup dict (fill one list)    :    50 runs : mean:        157 us
Tests for 10 callsites, each with 1000 tracedata (datatype System.Guid) (Orphaned tracedata: 500)
        Lookup linq (current)          :    10 runs : mean:     225777 us
        Lookup dict (fill many lists)  :    50 runs : mean:       1642 us
        Lookup dict (fill one list)    :    50 runs : mean:       1227 us
Tests for 10 callsites, each with 10000 tracedata (datatype System.Guid) (Orphaned tracedata: 500)
        Lookup linq (current)          :     1 run        :    7132931 us
        Lookup dict (fill many lists)  :    50 runs : mean:      23625 us
        Lookup dict (fill one list)    :    50 runs : mean:      10033 us
Tests for 10 callsites, each with 10 tracedata (datatype System.Guid) (Orphaned tracedata: 50000)
        Lookup linq (current)          :    10 runs : mean:     262322 us
        Lookup dict (fill many lists)  :    50 runs : mean:       1413 us
        Lookup dict (fill one list)    :    50 runs : mean:       1303 us
Tests for 10 callsites, each with 1000 tracedata (datatype System.Guid) (Orphaned tracedata: 50000)
        Lookup linq (current)          :    10 runs : mean:   20235897 us
        Lookup dict (fill many lists)  :    50 runs : mean:       2595 us
        Lookup dict (fill one list)    :    50 runs : mean:       2028 us
Tests for 10 callsites, each with 10000 tracedata (datatype System.Guid) (Orphaned tracedata: 50000)
        Lookup linq (current)          :     1 run        :  526709143 us
        Lookup dict (fill many lists)  :    50 runs : mean:      24024 us
        Lookup dict (fill one list)    :    50 runs : mean:      11245 us

Benchmark source

For small input sizes this doesn't add up to many microseconds, but it scales very poorly with larger input. That is bad, since users might save arbitrary amounts of data in their files, which can make Dynamo hang for seconds or minutes.


TL;DR:

This PR can be reduced to only include the algorithmic improvements that I've shown in the benchmarks.
