Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance: parallelize diffgraph application #288

Merged
merged 15 commits into from
Jan 27, 2025
Merged

Conversation

bbrehm
Copy link
Contributor

@bbrehm bbrehm commented Jan 9, 2025

Can you @mpollmeier test that on the same thing you use to evaluate ArrayList vs ArrayBuffer?
Can you test this on a big cpg creation?

(the issue with cpg creation is temp memory during diffgraph application. If we parallelize, then more stuff is alive at the same time. I also had to remove some early clearing of memory, because that would require more synchronization to figure out when an object is truly dead)

@mpollmeier
Copy link
Contributor

thank you, will do 👍🏻

@bbrehm
Copy link
Contributor Author

bbrehm commented Jan 9, 2025

Ok, it wasn't so bad to restore the early cleanup logic (so that the GC can reclaim structures that are not needed anymore, even while the diffgraph application is still running).

Main things to consider are:

  1. The atomic refcount rigamarole
  2. The Callable closure is kept alive by the java.util.concurrent.ForkJoinTask.AdaptedCallable after it finished, until we do submissions.clear(). So we need to be careful about what we capture in the closure!

@mpollmeier
Copy link
Contributor

import timing results: #286 (comment)

@mpollmeier mpollmeier self-requested a review January 10, 2025 12:36
@mpollmeier
Copy link
Contributor

Won't merge for now, because the performance gains currently don't justify the added complexity. But this might come in handy in future, so let's keep the PR open.

Context:
#286

README.md Show resolved Hide resolved
Comment on lines 17 to 18
java.lang.Thread.currentThread() match {
case fjt: concurrent.ForkJoinWorkerThread => fjt.getPool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not import the classes rather than the concurrent namespace?
that would make this more readably IMO, especially given the mix of 'current and 'concurrent'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the current, good catch, that looked ugly.

Otherwise, I guess that's a matter of personal coding style -- I prefer if names are at least somewhat qualified, such that the import lists are small and mostly the same across all files, and the qualified names are mostly the same across all files. Like the ubiquituous import scala.collection.mutable, or import io.shiftleft.codepropertygraph.generated.nodes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal style is fine to a degree, but in general it's best to stick to community guidelines, so that more than just one person has a good chance of understanding the code. Scala's guidelines on that front is essentially "import the class unless there's a good reason not to".

import scala.collection.mutable has a good reason, because it deviates from the norm, namely not being immutable.

Our import io.shiftleft.codepropertygraph.generated.nodes is a historical decision that we changed some time ago - in joern it's not used any longer.

core/src/main/scala/flatgraph/Misc.scala Outdated Show resolved Hide resolved
@mpollmeier mpollmeier self-requested a review January 22, 2025 07:59
core/src/main/scala/flatgraph/Misc.scala Outdated Show resolved Hide resolved
@bbrehm
Copy link
Contributor Author

bbrehm commented Jan 27, 2025

@mpollmeier @ml86 ready to merge?

For my test-CPG, I observe: 1100 ms load / 2200 ms store with this PR, compared to master 2600 load / 4100 store.

If we decide against this, then the Zstd directBuffer thing must be backported in some way -- the long JNI critical section in zstd-ni was an inacceptable bug (and the only reason it didn't bite us so far was that the surrounding code was single-threaded and we already moved from G1 parallel garbage collector for many testsets).

Using a custom replacement for mutable.LinkedHashMap for the stringpool / deduplication brings the store down to 1700 ms. I'll put that into a separate PR that we can then decide to merge or not to merge.

@ml86 ml86 self-requested a review January 27, 2025 11:33
@mpollmeier
Copy link
Contributor

will re-review in a bit

Copy link
Contributor

@mpollmeier mpollmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

(note that I just pushed two minor code formatting changes)

@bbrehm bbrehm merged commit edabc69 into master Jan 27, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants