Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaner report on completed stages #68

Merged
merged 40 commits into from
Jan 10, 2024
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
c7f3238
Remove the deprecated quantization tool (#53)
jeremyfowers Dec 4, 2023
d5390fb
Stop saving labels files in the cache (#54)
jeremyfowers Dec 4, 2023
f8f8093
Better naming for Stats methods and members (#56)
jeremyfowers Dec 5, 2023
47248e1
Refactor build_model() out of benchmark_model() (#48)
jeremyfowers Dec 5, 2023
6214ab3
Remove the SetSuccess stage (and the need for it) (#59)
jeremyfowers Dec 5, 2023
4d612eb
Show stages completed following all stages order
danielholanda Dec 5, 2023
aaf08a2
Fix CI testing numbering of `cli.py` (#60)
danielholanda Dec 5, 2023
d749c5c
Add NOT STARTED
danielholanda Dec 5, 2023
acae5bb
Add not started and sort alphabetically
danielholanda Dec 5, 2023
4f891d8
Fix CI
danielholanda Dec 5, 2023
a550bed
Merge branch 'canary' into stage_order
danielholanda Dec 5, 2023
9ea9537
Fix CI
danielholanda Dec 5, 2023
f3f0584
Split into multiple columns for clarity
danielholanda Dec 6, 2023
d85c9de
Look for correct column
danielholanda Dec 6, 2023
00d96f1
stages_selected vs selected_stages
danielholanda Dec 6, 2023
eaa1358
Fix CI
danielholanda Dec 6, 2023
607a7b0
suggested changes
danielholanda Dec 7, 2023
57bed7d
merge main into branch
danielholanda Dec 7, 2023
7e7401a
lint
danielholanda Dec 8, 2023
4b8c2a7
Merge branch 'main' into stage_order
danielholanda Dec 18, 2023
38fd493
Update branch
danielholanda Dec 18, 2023
f443f68
Unified terminology
danielholanda Jan 7, 2024
a7e86ae
Suggested implementation
danielholanda Jan 7, 2024
2d2d8fa
Release notes
danielholanda Jan 7, 2024
95bd5de
Unify all build/benchmark/stage status to one enum
jeremyfowers Jan 9, 2024
1d802ff
cleanup
jeremyfowers Jan 9, 2024
45e666d
fix bugs
jeremyfowers Jan 9, 2024
e4d5f70
Improved patch notes
jeremyfowers Jan 9, 2024
a6b957d
Merge branch 'main' into stage_order
jeremyfowers Jan 10, 2024
cfd8ce0
Add release note for #78
jeremyfowers Jan 10, 2024
f49821c
fix test
jeremyfowers Jan 10, 2024
8441018
zzzz
jeremyfowers Jan 10, 2024
01fe625
print info on test fail
jeremyfowers Jan 10, 2024
ce7ae52
fix typo
jeremyfowers Jan 10, 2024
25bce69
fix case
jeremyfowers Jan 10, 2024
5bee2db
Fix test bug. Properly initialize build and benchmark status
jeremyfowers Jan 10, 2024
32098b1
wordsmithing
jeremyfowers Jan 10, 2024
83be9bb
merge with main
jeremyfowers Jan 10, 2024
a5d3545
Report incomplete stages as killed
jeremyfowers Jan 10, 2024
d6a878a
one more release note
jeremyfowers Jan 10, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions docs/release_notes.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,31 @@ We are tracking two types of major changes:

If you are creating the release notes for a new version, please see the [template](#template-version-majorminorpatch). Release notes should capture all of the significant changes since the last numbered package release.

# Version 1.1.0

This version focuses on improving the clarity of the telemetry reported.

## Users

### User Improvements

- Report splits `stages_completed` into stage status and duration.

## User Breaking Changes

None.

## Developers

### Developer Improvements

None

### Developer Breaking Changes

- `build.Status.COMPLETED_BUILD` is now called `build.Status.COMPLETED_BUILD`
- `COMPLETED_BUILD_STAGES` column in the report was removed.

# Version 1.0.0

This version focuses on cleaning up technical debts and most of the changes are not visible to users. It removes cumbersome requirements for developers, removes unused features to streamline the codebase, and also clarifying some API naming schemes.
Expand Down
4 changes: 2 additions & 2 deletions src/turnkeyml/analyze/script.py
Original file line number Diff line number Diff line change
Expand Up @@ -272,7 +272,7 @@ def explore_invocation(
)

stats.save_model_eval_stat(
fs.Keys.BUILD_STATUS, fs.FunctionStatus.SUCCESSFUL
fs.Keys.BUILD_STATUS, fs.FunctionStatus.COMPLETED
)

model_to_benchmark = build_state.results[0]
Expand Down Expand Up @@ -317,7 +317,7 @@ def explore_invocation(
)

stats.save_model_eval_stat(
fs.Keys.BENCHMARK_STATUS, fs.FunctionStatus.SUCCESSFUL
fs.Keys.BENCHMARK_STATUS, fs.FunctionStatus.COMPLETED
)

invocation_info.status_message = "Model successfully benchmarked!"
Expand Down
2 changes: 1 addition & 1 deletion src/turnkeyml/build/ignition.py
Original file line number Diff line number Diff line change
Expand Up @@ -324,7 +324,7 @@ def load_or_make_state(

if (
model_type == build.ModelType.UNKNOWN
and state.build_status == build.Status.SUCCESSFUL_BUILD
and state.build_status == build.Status.COMPLETED_BUILD
):
msg = (
"Model caching is disabled for successful builds against custom Sequences. "
Expand Down
40 changes: 29 additions & 11 deletions src/turnkeyml/build/stage.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import time
import os
import copy
import enum
from typing import List, Tuple
from multiprocessing import Process
import psutil
Expand Down Expand Up @@ -50,6 +51,11 @@ def _name_is_file_safe(name: str):


class Stage(abc.ABC):
class Status(enum.Enum):
NOT_STARTED = "not_started"
COMPLETED = "completed"
INCOMPLETE = "incomplete"

def status_line(self, successful, verbosity):
"""
Print a line of status information for this Stage into the monitor.
Expand Down Expand Up @@ -83,6 +89,8 @@ def __init__(
_name_is_file_safe(unique_name)

self.unique_name = unique_name
self.status_key = f"stage_status:{unique_name}"
self.duration_key = f"stage_duration:{unique_name}"
self.monitor_message = monitor_message
self.progress = None
self.logfile_path = None
Expand Down Expand Up @@ -137,12 +145,12 @@ def fire_helper(self, state: build.State) -> Tuple[build.State, int]:
else:
self.status_line(successful=True, verbosity=state.monitor)

# Stages should not set build.Status.SUCCESSFUL_BUILD, as that is
# Stages should not set build.Status.COMPLETED_BUILD, as that is
# reserved for Sequence.launch()
if state.build_status == build.Status.SUCCESSFUL_BUILD:
if state.build_status == build.Status.COMPLETED_BUILD:
raise exp.StageError(
"TurnkeyML Stages are not allowed to set "
"`state.build_status == build.Status.SUCCESSFUL_BUILD`, "
"`state.build_status == build.Status.COMPLETED_BUILD`, "
"however that has happened. If you are a plugin developer, "
"do not do this. If you are a user, please file an issue at "
"https://github.com/onnx/turnkeyml/issues."
Expand Down Expand Up @@ -268,7 +276,7 @@ def launch(self, state: build.State) -> build.State:

if state.build_status == build.Status.NOT_STARTED:
state.build_status = build.Status.PARTIAL_BUILD
elif state.build_status == build.Status.SUCCESSFUL_BUILD:
elif state.build_status == build.Status.COMPLETED_BUILD:
msg = """
build_model() is running a build on a model that already built successfully, which
should not happen because the build should have loaded from cache or rebuilt from scratch.
Expand All @@ -280,13 +288,23 @@ def launch(self, state: build.State) -> build.State:
# Collect telemetry for the build
stats = fs.Stats(state.cache_dir, state.config.build_name, state.evaluation_id)
stats.save_model_eval_stat(
fs.Keys.ALL_BUILD_STAGES,
fs.Keys.SELECTED_SEQUENCE_OF_STAGES,
self.get_names(),
)

# At the beginning of a sequence no stage has started
for stage in self.stages:
stats.save_model_eval_stat(stage.status_key, Stage.Status.NOT_STARTED.value)
stats.save_model_eval_stat(stage.duration_key, "-")

# Run the build
try:
for stage in self.stages:
# Set status as incomplete, since stage just started
stats.save_model_eval_stat(
stage.status_key, Stage.Status.INCOMPLETE.value
)

# Collect telemetry about the stage
state.current_build_stage = stage.unique_name
start_time = time.time()
Expand All @@ -297,17 +315,17 @@ def launch(self, state: build.State) -> build.State:
# Collect telemetry about the stage
execution_time = time.time() - start_time

stats.save_model_eval_sub_stat(
parent_key=fs.Keys.COMPLETED_BUILD_STAGES,
key=stage.unique_name,
value=execution_time,
# Set status as completed
stats.save_model_eval_stat(
stage.status_key, Stage.Status.COMPLETED.value
)
stats.save_model_eval_stat(stage.duration_key, execution_time)

except exp.StageError as e:
# Advance the cursor below the monitor so
# we can print an error message
stage_depth_in_sequence = self.get_depth() - self.get_names().index(
stage.unique_name
stage.unique_name # pylint: disable=undefined-loop-variable
)
stdout_lines_to_advance = stage_depth_in_sequence - 2
cursor_down = "\n" * stdout_lines_to_advance
Expand All @@ -319,7 +337,7 @@ def launch(self, state: build.State) -> build.State:

else:
state.current_build_stage = None
state.build_status = build.Status.SUCCESSFUL_BUILD
state.build_status = build.Status.COMPLETED_BUILD

# We use a deepcopy here because the Stage framework supports
# intermediate_results of any type, including model objects in memory.
Expand Down
2 changes: 1 addition & 1 deletion src/turnkeyml/build_api.py
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ def build_model(

# Return a cached build if possible, otherwise prepare the model State for
# a build
if state.build_status == build.Status.SUCCESSFUL_BUILD:
if state.build_status == build.Status.COMPLETED_BUILD:
# Successful builds can be loaded from cache and returned with
# no additional steps
additional_msg = " (build_name auto-selected)" if config.auto_name else ""
Expand Down
12 changes: 5 additions & 7 deletions src/turnkeyml/cli/report.py
Original file line number Diff line number Diff line change
Expand Up @@ -78,12 +78,6 @@ def summary_spreadsheets(args) -> None:

# Copy the build-specific stats
for key, value in build.items():
# Break each value in "completed build stages" into its own column
# to make analysis easier
if key == fs.Keys.COMPLETED_BUILD_STAGES:
for subkey, subvalue in value.items():
evaluation_stats[subkey] = subvalue

# If a build or benchmark is still marked as "running" at
# reporting time, it
# must have been killed by a time out, out-of-memory (OOM), or some
Expand All @@ -93,7 +87,8 @@ def summary_spreadsheets(args) -> None:
) and value == fs.FunctionStatus.RUNNING:
value = fs.FunctionStatus.KILLED

evaluation_stats[key] = value
# Add stats ensuring that those are all in lower case
evaluation_stats[key.lower()] = value

all_evaluation_stats.append(evaluation_stats)
except yaml.scanner.ScannerError:
Expand All @@ -108,6 +103,9 @@ def summary_spreadsheets(args) -> None:
if header not in column_headers:
column_headers.append(header)

# Sort all columns alphabetically
column_headers = sorted(column_headers)

# Add each build to the report
for evaluation_stats in all_evaluation_stats:
# Start with a dictionary where all of the values are "-". If a build
Expand Down
2 changes: 1 addition & 1 deletion src/turnkeyml/common/build.py
Original file line number Diff line number Diff line change
Expand Up @@ -141,7 +141,7 @@ class Status(enum.Enum):
NOT_STARTED = "not_started"
PARTIAL_BUILD = "partial_build"
BUILD_RUNNING = "build_running"
SUCCESSFUL_BUILD = "successful_build"
COMPLETED_BUILD = "completed_build"
FAILED_BUILD = "failed_build"


Expand Down
8 changes: 2 additions & 6 deletions src/turnkeyml/common/filesystem.py
Original file line number Diff line number Diff line change
Expand Up @@ -322,11 +322,7 @@ class Keys:
# ONNX model input tensor dimensions
ONNX_INPUT_DIMENSIONS = "onnx_input_dimensions"
# List of all build stages in the Sequence
ALL_BUILD_STAGES = "all_build_stages"
# Map of build stages that completed successfully to the
# execution time for that stage. We can figure out if any build
# stages failed if all_build_stages != completed_build_stages.keys().
COMPLETED_BUILD_STAGES = "completed_build_stages"
SELECTED_SEQUENCE_OF_STAGES = "selected_sequence_of_stages"
# Location of the most up-to-date ONNX file for this build. If the
# build completed successfully, this is the final ONNX file.
ONNX_FILE = "onnx_file"
Expand Down Expand Up @@ -362,7 +358,7 @@ class Keys:

class FunctionStatus:
RUNNING = "running"
SUCCESSFUL = "successful"
COMPLETED = "completed"
FAILED = "failed"
KILLED = "killed"

Expand Down
2 changes: 1 addition & 1 deletion src/turnkeyml/version.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
__version__ = "1.0.0"
__version__ = "1.1.0"
26 changes: 13 additions & 13 deletions test/build_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ def full_compilation_pytorch_model():
monitor=False,
cache_dir=cache_location,
)
return state.build_status == build.Status.SUCCESSFUL_BUILD
return state.build_status == build.Status.COMPLETED_BUILD


def full_compilation_keras_subclass_model():
Expand All @@ -118,7 +118,7 @@ def full_compilation_keras_subclass_model():
monitor=False,
cache_dir=cache_location,
)
return state.build_status == build.Status.SUCCESSFUL_BUILD
return state.build_status == build.Status.COMPLETED_BUILD


def full_compilation_keras_sequential_model():
Expand All @@ -131,7 +131,7 @@ def full_compilation_keras_sequential_model():
monitor=False,
cache_dir=cache_location,
)
return state.build_status == build.Status.SUCCESSFUL_BUILD
return state.build_status == build.Status.COMPLETED_BUILD


def full_compilation_onnx_model():
Expand All @@ -152,7 +152,7 @@ def full_compilation_onnx_model():
monitor=False,
cache_dir=cache_location,
)
return state.build_status == build.Status.SUCCESSFUL_BUILD
return state.build_status == build.Status.COMPLETED_BUILD


def full_compilation_hummingbird_rf():
Expand All @@ -167,7 +167,7 @@ def full_compilation_hummingbird_rf():
monitor=False,
cache_dir=cache_location,
)
return state.build_status == build.Status.SUCCESSFUL_BUILD
return state.build_status == build.Status.COMPLETED_BUILD


def full_compilation_hummingbird_xgb():
Expand All @@ -182,7 +182,7 @@ def full_compilation_hummingbird_xgb():
monitor=False,
cache_dir=cache_location,
)
return state.build_status == build.Status.SUCCESSFUL_BUILD
return state.build_status == build.Status.COMPLETED_BUILD


def full_compilation_hummingbird_lgbm():
Expand All @@ -197,7 +197,7 @@ def full_compilation_hummingbird_lgbm():
monitor=False,
cache_dir=cache_location,
)
return state.build_status == build.Status.SUCCESSFUL_BUILD
return state.build_status == build.Status.COMPLETED_BUILD


def full_compilation_hummingbird_kn():
Expand All @@ -212,7 +212,7 @@ def full_compilation_hummingbird_kn():
monitor=False,
cache_dir=cache_location,
)
return state.build_status == build.Status.SUCCESSFUL_BUILD
return state.build_status == build.Status.COMPLETED_BUILD


def scriptmodule_functional_check():
Expand All @@ -229,7 +229,7 @@ def scriptmodule_functional_check():
monitor=False,
cache_dir=cache_location,
)
return state.build_status == build.Status.SUCCESSFUL_BUILD
return state.build_status == build.Status.COMPLETED_BUILD


def custom_stage():
Expand Down Expand Up @@ -280,7 +280,7 @@ def fire(self, state):
cache_dir=cache_location,
)

return state.build_status == build.Status.SUCCESSFUL_BUILD
return state.build_status == build.Status.COMPLETED_BUILD


class FullyCustomStage(stage.Stage):
Expand Down Expand Up @@ -580,7 +580,7 @@ def test_013_set_onnx_opset(self):
sequence=sequences.optimize_fp16,
)

assert state.build_status == build.Status.SUCCESSFUL_BUILD
assert state.build_status == build.Status.COMPLETED_BUILD

onnx_model = onnx.load(state.results[0])
model_opset = getattr(onnx_model.opset_import[0], "version", None)
Expand All @@ -599,7 +599,7 @@ def test_014_export_only(self):
sequence=sequences.onnx_fp32,
)

assert state.build_status == build.Status.SUCCESSFUL_BUILD
assert state.build_status == build.Status.COMPLETED_BUILD
assert os.path.exists(export.base_onnx_file(state))
assert not os.path.exists(export.opt_onnx_file(state))

Expand Down Expand Up @@ -635,7 +635,7 @@ def test_015_receive_onnx(self):
)

# Make sure the build was successful
assert state.build_status == build.Status.SUCCESSFUL_BUILD
assert state.build_status == build.Status.COMPLETED_BUILD

# Get ONNX file's opset
onnx_model = onnx.load(onnx_file)
Expand Down
Loading
Loading