Fix performance tracking action (#1296)
* Fix performance tracking action

Signed-off-by: elronbandel <[email protected]>

* Fix

Signed-off-by: elronbandel <[email protected]>

* Fix

Signed-off-by: elronbandel <[email protected]>

* Update

Signed-off-by: elronbandel <[email protected]>

* try

Signed-off-by: elronbandel <[email protected]>

* try

Signed-off-by: elronbandel <[email protected]>

---------

Signed-off-by: elronbandel <[email protected]>
elronbandel authored Oct 20, 2024
1 parent 12c2293 commit 1e51279
Showing 10 changed files with 136 additions and 194 deletions.
67 changes: 19 additions & 48 deletions .github/workflows/performance.yml
@@ -25,70 +25,41 @@ jobs:
         with:
           python-version: '3.9'
 
-      - run: curl -LsSf https://astral.sh/uv/install.sh | sh
-      - run: uv pip install --system -e ".[tests]"
-      - run: pip install coverage[toml]
+      - name: Install Requirements
+        run: |
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          uv pip install --system -e ".[tests]"
-      - name: Save card_profiler python script
-        uses: actions/upload-artifact@v4
-        with:
-          name: card_profiler
-          path: performance_profile/card_profiler.py
-          compression-level: 0
-          overwrite: true
+      - name: Prepare the dirs for performance evaluation in main
+        run: |
+          mkdir -p performance_action
+          mkdir -p performance_action/logs
+          echo "" > performance_action/__init__.py
+          echo " " > performance_action/logs/cards_benchmark.prof
+          echo " " > performance_action/logs/cards_benchmark.json
+          cp performance/card_profiler.py performance_action/card_profiler.py
+          cp performance/compare_performance_results.py performance_action/compare_performance_results.py
       - name: Checkout main branch
         uses: actions/checkout@v4
         with:
           ref: main
-
-      - name: Prepare the dirs for performance evaluation in main
-        run: |
-          mkdir -p performance_profile
-          mkdir -p performance_profile/logs
-          echo "" > performance_profile/__init__.py
-          echo " " > performance_profile/logs/cards_benchmark.prof
-          echo " " > performance_profile/logs/cards_benchmark.json
-      - name: Download card_profiler python script
-        uses: actions/download-artifact@v4
-        with:
-          name: card_profiler
-          path: performance_profile/
+          clean: false
 
       - name: Run performance on main branch
         run: |
-          python -m performance_profile.card_profiler
-          cp performance_profile/logs/cards_benchmark.json performance_profile/logs/main_cards_benchmark.json
-      - name: Save main performance json
-        uses: actions/upload-artifact@v4
-        with:
-          name: main_performance_json
-          path: performance_profile/logs/main_cards_benchmark.json
-          compression-level: 0
-          overwrite: true
+          python performance_action/card_profiler.py --output_file performance_action/main_results.json
       - name: Checkout PR branch
         uses: actions/checkout@v4
         with:
           ref: ${{ github.head_ref }}
-
-      - name: Create performance_profile/logs dir
-        run: |
-          mkdir -p performance_profile/logs
-          echo " " > performance_profile/logs/cards_benchmark.prof
+          clean: false
 
       - name: Run performance on PR branch
         run: |
-          python -m performance_profile.card_profiler
-          cp performance_profile/logs/cards_benchmark.json performance_profile/logs/pr_cards_benchmark.json
-      - name: Download main performance result
-        uses: actions/download-artifact@v4
-        with:
-          name: main_performance_json
-          path: performance_profile/logs/
+          python performance_action/card_profiler.py --output_file performance_action/pr_results.json
       - name: Compare main and PR performance results
-        run: python -m performance_profile.compare_performance_results
+        run: |
+          python performance_action/compare_performance_results.py performance_action/main_results.json performance_action/pr_results.json >> $GITHUB_STEP_SUMMARY
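
In sum, the reworked workflow copies the profiling scripts into the untracked performance_action/ directory so the same code survives the checkouts of both branches (the added clean: false keeps untracked files in place), profiles each branch once, and appends the comparison table to the job summary. Below is a rough local replay of that flow as a Python driver; the branch switching, file names, and use of subprocess are illustrative assumptions, not part of the workflow itself.

# Hypothetical local replay of the CI flow above; assumes a unitxt checkout
# whose current branch contains the renamed performance/ scripts.
import subprocess

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# Stash the scripts outside the tracked tree, as the workflow does.
run("mkdir -p performance_action")
run("cp performance/card_profiler.py performance_action/card_profiler.py")
run("cp performance/compare_performance_results.py performance_action/compare_performance_results.py")

# Profile the baseline on main, then the candidate branch.
run("git checkout main")
run("python performance_action/card_profiler.py --output_file performance_action/main_results.json")
run("git checkout -")
run("python performance_action/card_profiler.py --output_file performance_action/pr_results.json")

# Exits non-zero if the PR branch is more than 5% slower than main.
run("python performance_action/compare_performance_results.py performance_action/main_results.json performance_action/pr_results.json")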
110 changes: 67 additions & 43 deletions performance_profile/card_profiler.py → performance/card_profiler.py
@@ -1,6 +1,9 @@
+import argparse
 import cProfile
 import json
+import os
 import pstats
+import tempfile
 from io import StringIO
 
 from unitxt.api import load_recipe
@@ -15,35 +18,35 @@
 settings = get_settings()
 settings.allow_unverified_code = True
 
-"""Profiles the execution-time of api.load_dataset(), over a benchmark of cards.
-
-Usage: set values for variables cards (the benchmark)
+class CardProfiler:
+    """Profiles the execution-time of api.load_dataset(), over a benchmark of cards.
-from unitxt root dir, run the following linux commands:
+    Usage: set values for variables cards (the benchmark)
-python performance_profile/card_profiler.py
+    from unitxt root dir, run the following linux commands:
-The script computes the total runtime of the benchmark, and the time spent in loading the dataset,
-accumulated across the cards in the benchmark, and wraps both results into a json file:
-performance_profile/logs/cards_benchmark.json
+    python performance/card_profiler.py
-In addition, the script generates a binary file named performance_profile/logs/cards_benchmark.prof,
-which can be nicely and interactively visualized via snakeviz:
+    The script computes the total runtime of the benchmark, and the time spent in loading the dataset,
+    accumulated across the cards in the benchmark, and wraps both results into a json file:
+    performance/logs/cards_benchmark.json
-(pip install snakeviz)
-snakeviz performance_profile/logs/cards_benchmark.prof
+    In addition, the script generates a binary file named performance/logs/cards_benchmark.prof,
+    which can be nicely and interactively visualized via snakeviz:
-snakeviz opens an interactive internet browser window allowing to explore all time-details.
-See exploring options here: https://jiffyclub.github.io/snakeviz/
-(can also use the -s flag for snakeviz which will only set up a server and print out the url
-to use from another computer in order to view results shown by that server)
+    (pip install snakeviz)
+    snakeviz performance/logs/cards_benchmark.prof
-In the browser window, look (ctrl-F) for methods named profiler_... to read profiling data for the major steps in the process.
-You will find the total time of each step, accumulated along all cards in the benchmark.
-"""
+    snakeviz opens an interactive internet browser window allowing to explore all time-details.
+    See exploring options here: https://jiffyclub.github.io/snakeviz/
+    (can also use the -s flag for snakeviz which will only set up a server and print out the url
+    to use from another computer in order to view results shown by that server)
+    In the browser window, look (ctrl-F) for methods named profiler_... to read profiling data for the major steps in the process.
+    You will find the total time of each step, accumulated along all cards in the benchmark.
+    """
 
-class CardProfiler:
     def profiler_instantiate_recipe(self, **kwargs) -> StandardRecipe:
         return load_recipe(**kwargs)
@@ -112,32 +115,53 @@ def profile_from_cards():
 
 
 def main():
+    # Parse command-line arguments
+    parser = argparse.ArgumentParser(description="Card Profiler")
+    parser.add_argument(
+        "--output_file",
+        type=str,
+        required=True,
+        help="Path to save output files (without extension)",
+    )
+    args = parser.parse_args()
+
+    # Ensure the directory for the output file exists
+    output_dir = os.path.dirname(args.output_file)
+    if output_dir:
+        os.makedirs(output_dir, exist_ok=True)
+
     logger.info(f"benchmark cards are: {cards}")
 
-    cProfile.run(
-        "profile_from_cards()", "performance_profile/logs/cards_benchmark.prof"
-    )
-    f = StringIO()
-    pst = pstats.Stats("performance_profile/logs/cards_benchmark.prof", stream=f)
-    pst.strip_dirs()
-    pst.sort_stats("name")  # sort by function name
-    pst.print_stats("profiler_do_the_profiling|profiler_load_by_recipe")
-    s = f.getvalue()
-    assert s.split("\n")[7].split()[3] == "cumtime"
-    assert "profiler_do_the_profiling" in s.split("\n")[8]
-    tot_time = round(float(s.split("\n")[8].split()[3]), 3)
-    assert "profiler_load_by_recipe" in s.split("\n")[9]
-    load_time = round(float(s.split("\n")[9].split()[3]), 3)
-    diff = round(tot_time - load_time, 3)
-
-    # Data to be written
-    dictionary = {
-        "total_time": tot_time,
-        "load_time": load_time,
-        "net_time": diff,
-    }
-    with open("performance_profile/logs/cards_benchmark.json", "w") as outfile:
-        json.dump(dictionary, outfile)
+    # Create a temporary .prof file
+    with tempfile.NamedTemporaryFile(suffix=".prof", delete=False) as temp_prof_file:
+        temp_prof_file_path = temp_prof_file.name
+    cProfile.run("profile_from_cards()", temp_prof_file_path)
+
+    f = StringIO()
+    pst = pstats.Stats(temp_prof_file_path, stream=f)
+    pst.strip_dirs()
+    pst.sort_stats("name")  # sort by function name
+    pst.print_stats("profiler_do_the_profiling|profiler_load_by_recipe")
+    s = f.getvalue()
+    assert s.split("\n")[7].split()[3] == "cumtime"
+    assert "profiler_do_the_profiling" in s.split("\n")[8]
+    tot_time = round(float(s.split("\n")[8].split()[3]), 3)
+    assert "profiler_load_by_recipe" in s.split("\n")[9]
+    load_time = round(float(s.split("\n")[9].split()[3]), 3)
+    diff = round(tot_time - load_time, 3)
+
+    # Data to be written
+    dictionary = {
+        "total_time": tot_time,
+        "load_time": load_time,
+        "net_time": diff,
+    }
+
+    # Write the profiling results to the JSON file (user-specified)
+    with open(args.output_file, "w+") as outfile:
+        json.dump(dictionary, outfile)
+
+    logger.info(f"JSON output saved to: {args.output_file}")
 
 
 if __name__ == "__main__":
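
For reference, the profiler now writes a three-field JSON document to the path given by --output_file; a minimal sketch of its shape follows, with made-up timing values.

# Illustrative shape of the JSON emitted by card_profiler.py; the numbers
# are invented, only the keys and their relationship come from the code above.
example = {
    "total_time": 45.321,  # cumtime of profiler_do_the_profiling, in seconds
    "load_time": 30.108,   # cumtime of profiler_load_by_recipe, in seconds
    "net_time": 15.213,    # total_time - load_time, the value the CI compares
}
assert round(example["total_time"] - example["load_time"], 3) == example["net_time"]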
49 changes: 49 additions & 0 deletions performance/compare_performance_results.py
@@ -0,0 +1,49 @@
import argparse
import json
import sys

# Argument parser to get file paths from the command line
parser = argparse.ArgumentParser(description="Compare performance profiles.")
parser.add_argument(
    "main_perf_file", type=str, help="Path to main performance profile JSON file"
)
parser.add_argument(
    "pr_perf_file", type=str, help="Path to PR performance profile JSON file"
)
args = parser.parse_args()

# Reading both performance JSON files:
with open(args.main_perf_file) as openfile:
    main_perf = json.load(openfile)

with open(args.pr_perf_file) as openfile:
    pr_perf = json.load(openfile)

# Check for valid net_time in the main performance profile
if main_perf["net_time"] == 0:
    print("Net run time on main is 0, can't calculate ratio of times.")
    sys.exit(1)

# Calculate the ratio between PR and main branch net times
ratio = pr_perf["net_time"] / main_perf["net_time"]

# Markdown table formatting
table_header = "| Branch | Net Time (seconds) | Performance Ratio |\n"
table_divider = "|--------------|--------------------|-------------------|\n"
table_main = f"| Main Branch | {main_perf['net_time']:<18} | - |\n"
table_pr = f"| PR Branch | {pr_perf['net_time']:<18} | {ratio:.2f} |\n"

# Print markdown table
print("### Performance Comparison Results\n")
print(table_header + table_divider + table_main + table_pr)

# Performance degradation check (5% threshold)
if ratio > 1.05:
    print("\n**Warning**: Performance degradation exceeds 5%!")
    print(
        "Explore branch performance via 'python performance/card_profiler.py',"
        " followed by 'snakeviz performance/logs/cards_benchmark.prof'."
    )
    sys.exit(1)

print("\nPerformance of the PR branch is within acceptable limits.")
File renamed without changes.
File renamed without changes.
1 change: 0 additions & 1 deletion performance_profile/__init__.py
This file was deleted.

63 changes: 0 additions & 63 deletions performance_profile/compare_branches.sh
This file was deleted.

37 changes: 0 additions & 37 deletions performance_profile/compare_performance_results.py
This file was deleted.

1 change: 0 additions & 1 deletion performance_profile/logs/cards_benchmark.json
This file was deleted.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -55,7 +55,7 @@ target-version = "py38"
 "utils/hf/prepare_dataset.py" = ["T201"]
 "utils/hf/prepare_metric.py" = ["T201"]
 "utils/compare_unitxt_datasets_between_versions.py" = ["C901"]
-"performance_profile/run_profile.py" = ["T201"]
+"performance/*.py" = ["T201"]
 
 [tool.ruff.lint]
 # Enable Pyflakes (`F`) and a subset of the pycodestyle (`E`) codes by default.
