-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
performance: parallelize diffgraph application #288
Conversation
thank you, will do 👍🏻 |
Ok, it wasn't so bad to restore the early cleanup logic (so that the GC can reclaim structures that are not needed anymore, even while the diffgraph application is still running). Main things to consider are:
|
import timing results: #286 (comment) |
Won't merge for now, because the performance gains currently don't justify the added complexity. But this might come in handy in future, so let's keep the PR open. Context: |
java.lang.Thread.currentThread() match { | ||
case fjt: concurrent.ForkJoinWorkerThread => fjt.getPool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not import the classes rather than the concurrent
namespace?
that would make this more readably IMO, especially given the mix of 'current and 'concurrent'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the current
, good catch, that looked ugly.
Otherwise, I guess that's a matter of personal coding style -- I prefer if names are at least somewhat qualified, such that the import lists are small and mostly the same across all files, and the qualified names are mostly the same across all files. Like the ubiquituous import scala.collection.mutable
, or import io.shiftleft.codepropertygraph.generated.nodes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal style is fine to a degree, but in general it's best to stick to community guidelines, so that more than just one person has a good chance of understanding the code. Scala's guidelines on that front is essentially "import the class unless there's a good reason not to".
import scala.collection.mutable
has a good reason, because it deviates from the norm, namely not being immutable.
Our import io.shiftleft.codepropertygraph.generated.nodes
is a historical decision that we changed some time ago - in joern it's not used any longer.
…hutdown in converter
@mpollmeier @ml86 ready to merge? For my test-CPG, I observe: 1100 ms load / 2200 ms store with this PR, compared to master 2600 load / 4100 store. If we decide against this, then the Zstd directBuffer thing must be backported in some way -- the long JNI critical section in zstd-ni was an inacceptable bug (and the only reason it didn't bite us so far was that the surrounding code was single-threaded and we already moved from G1 parallel garbage collector for many testsets). Using a custom replacement for |
will re-review in a bit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(note that I just pushed two minor code formatting changes)
Can you @mpollmeier test that on the same thing you use to evaluate ArrayList vs ArrayBuffer?
Can you test this on a big cpg creation?
(the issue with cpg creation is temp memory during diffgraph application. If we parallelize, then more stuff is alive at the same time. I also had to remove some early clearing of memory, because that would require more synchronization to figure out when an object is truly dead)