Performance variability boxplots #12

Open: wants to merge 49 commits into base: develop

Conversation

jarusified (Contributor)

This PR migrates the performance variability boxplots from CallFlow into Hatchet's Jupyter notebook interface via Roundtrip. Merging of this PR is blocked by [todo]

The following have been added to Hatchet's codebase:

  1. Visualization code to render the boxplots. See hatchet/external/roundtrip/boxplot.js.
  2. D3 utility code to support rendering operations through an SVG element. These utilities generalize several rendering functions for the Jupyter output element. See hatchet/external/roundtrip/lib/d3_utils.js.
  3. A sample notebook showcasing the functionality. See docs/examples/tutorial/performance_boxplot.ipynb. For a 2-min demo, see video.
  4. Improvements to the Roundtrip interface that generalize the %loadVisualization and %fetchData APIs to any other visualization. See hatchet/external/roundtrip/roundtrip.py.

User interface

# Load roundtrip
%load_ext roundtrip  

# Boxplot data computation 
bp = BoxPlot(tgt_gf=gf, bkg_gf=None, callsites=callsites, metrics=["time"])

# Dump the data as JSON (named boxplot_data to match the variable passed to %loadVisualization below).
boxplot_data = bp.unpack()

# Load visualization for boxplots. 
%loadVisualization roundtrip_path "boxplot" boxplot_data

# Fetch data from Visualization.
%fetchData "boxplot" variance_df

# The data in the vis is dumped as CSV-like text (columns separated by `,` and rows separated by `;`).
# This data can be converted to a pandas DataFrame using the code below.
import pandas as pd

columns = variance_df.split(';')[0].split(',')
data = [x.split(',') for x in variance_df.split(';')[1:]]
df = pd.DataFrame(data, columns=columns).set_index('name')
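
The conversion above assumes the text has no trailing `;` and that every row has as many cells as the header. A slightly more defensive version of the same split could look like the sketch below. The helper name `parse_vis_csv` and the sample payload are hypothetical, invented here for illustration; the real column names depend on what the visualization exports.

```python
def parse_vis_csv(text):
    """Split `;`-separated rows of `,`-separated cells into (columns, rows)."""
    # Drop empty records, e.g. the one produced by a trailing ';'.
    records = [r.strip() for r in text.split(";") if r.strip()]
    if not records:
        return [], []
    columns = [c.strip() for c in records[0].split(",")]
    rows = []
    for record in records[1:]:
        cells = [c.strip() for c in record.split(",")]
        if len(cells) != len(columns):
            raise ValueError(
                f"expected {len(columns)} cells, got {len(cells)}: {record!r}"
            )
        rows.append(cells)
    return columns, rows


# Hypothetical payload, not real output from the visualization.
sample = "name,min,max;main,0.5,2.0;solve,1.0,4.5;"
columns, rows = parse_vis_csv(sample)
```

The result can be passed to `pd.DataFrame(rows, columns=columns).set_index('name')` as in the snippet above. Every cell is still a string at that point, so a follow-up `df.astype(float)` is likely needed if the remaining columns are all numeric.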

Schema for boxplot_data:

{
  "callsite1": {
      "tgt": {
          "metric1": {
              "min": number,
              "max": number,
              "mean": number,
              "imb": number,
              "kurt": number,
              "skew": number,
              "q": [q0, q1, q2, q3, q4],
              "outliers": {
                  "values": array,
                  "keys": array
              }
          },
          "metric2": {
              ...
          }
      },
      "bkg": {
          // Same structure as "tgt".
      }
  },
  "callsite2": {
      ...
  }
}
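
To make the schema concrete, here is a minimal sketch of a checker that walks a `boxplot_data` dictionary and reports entries that do not match it. This is not part of the PR: the function names are hypothetical, the example values are placeholders rather than real measurements, and two points are assumptions not stated in the schema ("bkg" may be absent or empty when `bkg_gf=None`, and `outliers.values` and `outliers.keys` are parallel arrays of equal length).

```python
from numbers import Number

SCALAR_KEYS = ("min", "max", "mean", "imb", "kurt", "skew")


def validate_metric(stats):
    """Return a list of problems found in one metric entry (empty if none)."""
    problems = []
    for key in SCALAR_KEYS:
        if not isinstance(stats.get(key), Number):
            problems.append(f"'{key}' must be a number")
    q = stats.get("q")
    if not (isinstance(q, list) and len(q) == 5):
        problems.append("'q' must be a list of five values")
    outliers = stats.get("outliers")
    if not isinstance(outliers, dict):
        problems.append("'outliers' must be an object")
    else:
        values, keys = outliers.get("values"), outliers.get("keys")
        if not isinstance(values, list) or not isinstance(keys, list):
            problems.append("'outliers' needs 'values' and 'keys' arrays")
        elif len(values) != len(keys):  # assumption: the arrays are parallel
            problems.append("'outliers.values' and 'outliers.keys' differ in length")
    return problems


def validate_boxplot_data(data):
    """Map 'callsite/group/metric' paths to the problems found there."""
    report = {}
    for callsite, groups in data.items():
        if "tgt" not in groups:
            report[callsite] = ["missing 'tgt'"]
            continue
        for group in ("tgt", "bkg"):
            # Assumption: "bkg" is optional (no background GraphFrame given).
            for metric, stats in (groups.get(group) or {}).items():
                problems = validate_metric(stats)
                if problems:
                    report[f"{callsite}/{group}/{metric}"] = problems
    return report


# Placeholder values for illustration only.
example = {
    "main": {
        "tgt": {
            "time": {
                "min": 1.0, "max": 9.0, "mean": 3.0,
                "imb": 2.0, "kurt": 0.0, "skew": 0.0,
                "q": [1.0, 2.0, 3.0, 4.0, 9.0],
                "outliers": {"values": [9.0], "keys": [3]},
            }
        }
    }
}
report = validate_boxplot_data(example)
```

An empty `report` means every metric entry has the six scalar statistics, a five-element `q` list, and an `outliers` object with both arrays. Running such a check before `%loadVisualization` may help surface malformed data in Python rather than in the browser console.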

@jarusified added the labels area-visualization (Issues and PRs involving any of the Hatchet provided visualizations), priority-urgent (Urgent priority issues and PRs), and status-revisions-needed (Revisions have been requested from a reviewer for this PR) on Feb 3, 2022
@slabasan force-pushed the develop branch 17 times, most recently from b461833 to 48d44ce, on August 9, 2022 05:03
@ilumsden removed the priority-urgent (Urgent priority issues and PRs) label on May 24, 2024
Labels
area-visualization Issues and PRs involving any of the Hatchet provided visualizations status-revisions-needed Revisions have been requested from a reviewer for this PR
2 participants