Fix performance tracking action #1296

Merged: 6 commits, Oct 20, 2024
67 changes: 19 additions & 48 deletions .github/workflows/performance.yml
@@ -25,70 +25,41 @@ jobs:
with:
python-version: '3.9'

- run: curl -LsSf https://astral.sh/uv/install.sh | sh
- run: uv pip install --system -e ".[tests]"
- run: pip install coverage[toml]
- name: Install Requirements
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
uv pip install --system -e ".[tests]"

- name: Save card_profiler python script
uses: actions/upload-artifact@v4
with:
name: card_profiler
path: performance_profile/card_profiler.py
compression-level: 0
overwrite: true
- name: Prepare the dirs for performance evaluation in main
run: |
mkdir -p performance_action
mkdir -p performance_action/logs
echo "" > performance_action/__init__.py
echo " " > performance_action/logs/cards_benchmark.prof
echo " " > performance_action/logs/cards_benchmark.json
cp performance/card_profiler.py performance_action/card_profiler.py
cp performance/compare_performance_results.py performance_action/compare_performance_results.py

- name: Checkout main branch
uses: actions/checkout@v4
with:
ref: main

- name: Prepare the dirs for performance evaluation in main
run: |
mkdir -p performance_profile
mkdir -p performance_profile/logs
echo "" > performance_profile/__init__.py
echo " " > performance_profile/logs/cards_benchmark.prof
echo " " > performance_profile/logs/cards_benchmark.json

- name: Download card_profiler python script
uses: actions/download-artifact@v4
with:
name: card_profiler
path: performance_profile/
clean: false

- name: Run performance on main branch
run: |
python -m performance_profile.card_profiler
cp performance_profile/logs/cards_benchmark.json performance_profile/logs/main_cards_benchmark.json

- name: Save main performance json
uses: actions/upload-artifact@v4
with:
name: main_performance_json
path: performance_profile/logs/main_cards_benchmark.json
compression-level: 0
overwrite: true
python performance_action/card_profiler.py --output_file performance_action/main_results.json

- name: Checkout PR branch
uses: actions/checkout@v4
with:
ref: ${{ github.head_ref }}

- name: Create performance_profile/logs dir
run: |
mkdir -p performance_profile/logs
echo " " > performance_profile/logs/cards_benchmark.prof
clean: false

- name: Run performance on PR branch
run: |
python -m performance_profile.card_profiler
cp performance_profile/logs/cards_benchmark.json performance_profile/logs/pr_cards_benchmark.json

- name: Download main performance result
uses: actions/download-artifact@v4
with:
name: main_performance_json
path: performance_profile/logs/
python performance_action/card_profiler.py --output_file performance_action/pr_results.json

- name: Compare main and PR performance results
run: python -m performance_profile.compare_performance_results
run: |
python performance_action/compare_performance_results.py performance_action/main_results.json performance_action/pr_results.json >> $GITHUB_STEP_SUMMARY
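For context, the comparison step consumes two small JSON result files that card_profiler.py writes, one per branch. The sketch below, with made-up timing values, shows the shape of such a file; the keys mirror the total_time, load_time and net_time fields the profiler emits.

import json
import os

# Hypothetical example of a result file like performance_action/main_results.json;
# the numbers are illustrative only.
sample_results = {
    "total_time": 42.5,  # total benchmark runtime, in seconds
    "load_time": 30.1,   # time spent loading datasets
    "net_time": 12.4,    # total_time minus load_time
}

os.makedirs("performance_action", exist_ok=True)
with open("performance_action/main_results.json", "w") as outfile:
    json.dump(sample_results, outfile)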
110 changes: 67 additions & 43 deletions performance_profile/card_profiler.py → performance/card_profiler.py
@@ -1,6 +1,9 @@
import argparse
import cProfile
import json
import os
import pstats
import tempfile
from io import StringIO

from unitxt.api import load_recipe
@@ -15,35 +18,35 @@
settings = get_settings()
settings.allow_unverified_code = True

"""Profiles the execution-time of api.load_dataset(), over a benchmark of cards.

Usage: set values for the cards variable (the benchmark)
class CardProfiler:
"""Profiles the execution-time of api.load_dataset(), over a benchmark of cards.

from unitxt root dir, run the following linux commands:
Usage: set values for the cards variable (the benchmark)

python performance_profile/card_profiler.py
from unitxt root dir, run the following linux commands:

The script computes the total runtime of the benchmark, and the time spent in loading the dataset,
accumulated across the cards in the benchmark, and wraps both results into a json file:
performance_profile/logs/cards_benchmark.json
python performance/card_profiler.py

In addition, the script generates a binary file named performance_profile/logs/cards_benchmark.prof,
which can be nicely and interactively visualized via snakeviz:
The script computes the total runtime of the benchmark, and the time spent in loading the dataset,
accumulated across the cards in the benchmark, and wraps both results into a json file:
performance/logs/cards_benchmark.json

(pip install snakeviz)
snakeviz performance_profile/logs/cards_benchmark.prof
In addition, the script generates a binary file named performance/logs/cards_benchmark.prof,
which can be nicely and interactively visualized via snakeviz:

snakeviz opens an interactive browser window that lets you explore all the timing details.
See the exploration options here: https://jiffyclub.github.io/snakeviz/
(can also use the -s flag for snakeviz which will only set up a server and print out the url
to use from another computer in order to view results shown by that server)
(pip install snakeviz)
snakeviz performance/logs/cards_benchmark.prof

In the browser window, look (ctrl-F) for methods named profiler_... to read profiling data for the major steps in the process.
You will find the total time of each step, accumulated along all cards in the benchmark.
"""
snakeviz opens an interactive browser window that lets you explore all the timing details.
See the exploration options here: https://jiffyclub.github.io/snakeviz/
(can also use the -s flag for snakeviz which will only set up a server and print out the url
to use from another computer in order to view results shown by that server)

In the browser window, look (ctrl-F) for methods named profiler_... to read profiling data for the major steps in the process.
You will find the total time of each step, accumulated along all cards in the benchmark.
"""

class CardProfiler:
def profiler_instantiate_recipe(self, **kwargs) -> StandardRecipe:
return load_recipe(**kwargs)

@@ -112,32 +115,53 @@ def profile_from_cards():


def main():
# Parse command-line arguments
parser = argparse.ArgumentParser(description="Card Profiler")
parser.add_argument(
"--output_file",
type=str,
required=True,
help="Path to save output files (without extension)",
)
args = parser.parse_args()

# Ensure the directory for the output file exists
output_dir = os.path.dirname(args.output_file)
if output_dir:
os.makedirs(output_dir, exist_ok=True)

logger.info(f"benchmark cards are: {cards}")

cProfile.run(
"profile_from_cards()", "performance_profile/logs/cards_benchmark.prof"
)
f = StringIO()
pst = pstats.Stats("performance_profile/logs/cards_benchmark.prof", stream=f)
pst.strip_dirs()
pst.sort_stats("name") # sort by function name
pst.print_stats("profiler_do_the_profiling|profiler_load_by_recipe")
s = f.getvalue()
assert s.split("\n")[7].split()[3] == "cumtime"
assert "profiler_do_the_profiling" in s.split("\n")[8]
tot_time = round(float(s.split("\n")[8].split()[3]), 3)
assert "profiler_load_by_recipe" in s.split("\n")[9]
load_time = round(float(s.split("\n")[9].split()[3]), 3)
diff = round(tot_time - load_time, 3)

# Data to be written
dictionary = {
"total_time": tot_time,
"load_time": load_time,
"net_time": diff,
}
with open("performance_profile/logs/cards_benchmark.json", "w") as outfile:
json.dump(dictionary, outfile)
# Create a temporary .prof file
with tempfile.NamedTemporaryFile(suffix=".prof", delete=False) as temp_prof_file:
temp_prof_file_path = temp_prof_file.name
cProfile.run("profile_from_cards()", temp_prof_file_path)

f = StringIO()
pst = pstats.Stats(temp_prof_file_path, stream=f)
pst.strip_dirs()
pst.sort_stats("name") # sort by function name
pst.print_stats("profiler_do_the_profiling|profiler_load_by_recipe")
s = f.getvalue()
assert s.split("\n")[7].split()[3] == "cumtime"
assert "profiler_do_the_profiling" in s.split("\n")[8]
tot_time = round(float(s.split("\n")[8].split()[3]), 3)
assert "profiler_load_by_recipe" in s.split("\n")[9]
load_time = round(float(s.split("\n")[9].split()[3]), 3)
diff = round(tot_time - load_time, 3)

# Data to be written
dictionary = {
"total_time": tot_time,
"load_time": load_time,
"net_time": diff,
}

# Write the profiling results to the JSON file (user-specified)
with open(args.output_file, "w+") as outfile:
json.dump(dictionary, outfile)

logger.info(f"JSON output saved to: {args.output_file}")


if __name__ == "__main__":
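As background for the parsing above: card_profiler.py extracts cumulative times by asserting on the fixed column layout of the text that pstats prints. The sketch below, using a hypothetical workload function and example.prof path (the real script profiles profile_from_cards() into a temporary .prof file), reads the same cumulative-time figure directly from the pstats.Stats.stats mapping.

import cProfile
import pstats


def workload():
    # Stand-in for profile_from_cards(); purely illustrative.
    return sum(i * i for i in range(100_000))


cProfile.run("workload()", "example.prof")

stats = pstats.Stats("example.prof")
stats.strip_dirs()
# Each key is (filename, lineno, funcname); each value is
# (call count, primitive calls, total time, cumulative time, callers).
for (filename, lineno, funcname), (cc, nc, tt, ct, callers) in stats.stats.items():
    if funcname == "workload":
        print(funcname, round(ct, 3))  # ct corresponds to the cumtime column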
49 changes: 49 additions & 0 deletions performance/compare_performance_results.py
@@ -0,0 +1,49 @@
import argparse
import json
import sys

# Argument parser to get file paths from the command line
parser = argparse.ArgumentParser(description="Compare performance profiles.")
parser.add_argument(
"main_perf_file", type=str, help="Path to main performance profile JSON file"
)
parser.add_argument(
"pr_perf_file", type=str, help="Path to PR performance profile JSON file"
)
args = parser.parse_args()

# Reading both performance JSON files:
with open(args.main_perf_file) as openfile:
main_perf = json.load(openfile)

with open(args.pr_perf_file) as openfile:
pr_perf = json.load(openfile)

# Check for valid net_time in the main performance profile
if main_perf["net_time"] == 0:
print("Net run time on main is 0, can't calculate ratio of times.")
sys.exit(1)

# Calculate the ratio between PR and main branch net times
ratio = pr_perf["net_time"] / main_perf["net_time"]

# Markdown table formatting
table_header = "| Branch | Net Time (seconds) | Performance Ratio |\n"
table_divider = "|--------------|--------------------|-------------------|\n"
table_main = f"| Main Branch | {main_perf['net_time']:<18} | - |\n"
table_pr = f"| PR Branch | {pr_perf['net_time']:<18} | {ratio:.2f} |\n"

# Print markdown table
print("### Performance Comparison Results\n")
print(table_header + table_divider + table_main + table_pr)

# Performance degradation check (5% threshold)
if ratio > 1.05:
print("\n**Warning**: Performance degradation exceeds 5%!")
print(
"Explore branch performance via 'python performance_profile/card_profiler.py',"
" followed by 'snakeviz performance_profile/logs/cards_benchmark.prof'."
)
sys.exit(1)

print("\nPerformance of the PR branch is within acceptable limits.")
File renamed without changes.
File renamed without changes.
1 change: 0 additions & 1 deletion performance_profile/__init__.py

This file was deleted.

63 changes: 0 additions & 63 deletions performance_profile/compare_branches.sh

This file was deleted.

37 changes: 0 additions & 37 deletions performance_profile/compare_performance_results.py

This file was deleted.

1 change: 0 additions & 1 deletion performance_profile/logs/cards_benchmark.json

This file was deleted.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -55,7 +55,7 @@ target-version = "py38"
"utils/hf/prepare_dataset.py" = ["T201"]
"utils/hf/prepare_metric.py" = ["T201"]
"utils/compare_unitxt_datasets_between_versions.py" = ["C901"]
"performance_profile/run_profile.py" = ["T201"]
"performance/*.py" = ["T201"]

[tool.ruff.lint]
# Enable Pyflakes (`F`) and a subset of the pycodestyle (`E`) codes by default.