Use the `-h` or `--help` flag to view a description of the Model Analyzer's command-line interface.
$ model-analyzer -h
Options like `-q, --quiet` and `-v, --verbose` are global and apply to all Model Analyzer subcommands.
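For example, a verbose profile run might look like the following (the repository path and model name are placeholders, and note that the global flag precedes the subcommand):

$ model-analyzer -v profile -m /home/model_repo --profile-models resnet50_libtorch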
The `-m` or `--mode` flag is global and is accessible to all subcommands. It tells the Model Analyzer the context in which it is being run. Currently, Model Analyzer supports two modes.
The online mode (`--mode=online`) is the default. When in this mode, Model Analyzer will operate to find the optimal model configuration for an online inference scenario. By default in online mode, the best model configuration will be the one that maximizes throughput. If a latency budget is specified to the `analyze` subcommand via `--latency-budget`, then the best model configuration will be the one with the highest throughput within the given budget. In online mode the `analyze` and `report` subcommands will generate summaries specific to online inference. See the example online summary and online detailed report.
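For example, a minimal sketch of an analyze run with a latency budget (the model name, checkpoint directory, and the value 10 are illustrative; check `model-analyzer analyze -h` for the budget's units):

$ model-analyzer analyze --analysis-models resnet50_libtorch --checkpoint-directory=checkpoints --latency-budget 10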
The offline mode (`--mode=offline`) tells Model Analyzer to operate to find the optimal model configuration for an offline inference scenario. By default in offline mode, the best model configuration will be the one that maximizes throughput. A minimum throughput can be specified to the `analyze` subcommand via `--min-throughput` to ignore any configuration that does not exceed a minimum number of inferences per second. In offline mode the `analyze` and `report` subcommands will generate reports specific to offline inference. See the example offline summary and offline detailed report.
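For example, a sketch that runs analyze in offline mode and ignores any configuration below 500 inferences per second (the threshold, model name, and checkpoint directory are illustrative):

$ model-analyzer --mode=offline analyze --analysis-models resnet50_libtorch --checkpoint-directory=checkpoints --min-throughput 500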
The Model Analyzer's functionality is split across three separate subcommands. Each subcommand has its own CLI and config options. Some options are required for more than one subcommand (e.g. `--export-path`). See the Configuring Model Analyzer section for more details on configuring each of these subcommands.
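For example, `--export-path` (used with its short form `-e` in the examples below) is accepted by both the `analyze` and `report` subcommands, so a single export directory can collect the output of both; the model and config names here are the ones used later in this document:

$ model-analyzer analyze --analysis-models resnet50_libtorch -e export_directory --checkpoint-directory=checkpoints
$ model-analyzer report --report-model-configs resnet50_libtorch_config_1 -e export_directory --checkpoint-directory=checkpoints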
The `profile` subcommand allows the user to run model inferences using perf analyzer, and collect metrics like throughput, latency and memory usage. Use the following command to see the usage and argument descriptions for the subcommand.
$ model-analyzer profile -h
Depending on the command line or YAML config options provided, the `profile` subcommand will perform either a manual or automatic search over perf analyzer and model config file parameters. For each combination of model config parameters (e.g. max batch size, dynamic batching, and instance count), it will run tritonserver and perf analyzer instances with all the specified run parameters (client request concurrency and static batch size). It will also save the protobuf (`.pbtxt`) model config files corresponding to each combination in the output model repository. Model Analyzer collects various metrics at fixed time intervals during these perf analyzer runs. Each perf analyzer run generates a single measurement, which corresponds to a row in the output tables. After completing the runs for all configurations for each model, Model Analyzer will save the measurements it has collected into the checkpoint directory as a pickle file. See the Checkpointing section for more details on checkpoints.
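As an illustration, after profiling a model the output model repository might contain one directory per generated config variant, along the lines of the sketch below (the exact layout and the `_config_N` naming are assumptions based on the config names used later in this document):

output_model_repository/
    resnet50_libtorch_config_1/
        config.pbtxt
    resnet50_libtorch_config_2/
        config.pbtxt
    ...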
Some example profile commands are shown here. For a full example see the quick start section.
- Run auto config search on a model called `resnet50_libtorch` located in `/home/model_repo`:
$ model-analyzer profile -m /home/model_repo --profile-models resnet50_libtorch
- Run auto config search on two models called `resnet50_libtorch` and `vgg16_graphdef` located in `/home/model_repo`, and save checkpoints to `checkpoints`:
$ model-analyzer profile -m /home/model_repo --profile-models resnet50_libtorch,vgg16_graphdef --checkpoint-directory=checkpoints
- Run auto config search on a model called `resnet50_libtorch` located in `/home/model_repo`, but change the repository where model config variants are stored to `/home/output_repo`:
$ model-analyzer profile -m /home/model_repo --output-model-repository-path=/home/output_repo --profile-models resnet50_libtorch
- Run profile over manually defined configurations for models `classification_malaria_v1` and `classification_chestxray_v1` located in `/home/model_repo`, using the YAML config file:
$ model-analyzer profile -f /path/to/config.yaml
The contents of `config.yaml` are shown below.
model_repository: /home/model_repo
run_config_search_disable: True
concurrency: [2,4,8,16,32]
batch_sizes: [8,16,64]
profile_models:
  classification_malaria_v1:
    model_config_parameters:
      instance_group:
        -
          kind: KIND_GPU
          count: [1,2]
      dynamic_batching:
        max_queue_delay_microseconds: [100]
  classification_chestxray_v1:
    model_config_parameters:
      instance_group:
        -
          kind: KIND_GPU
          count: [1,2]
      dynamic_batching:
        max_queue_delay_microseconds: [100]
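With `run_config_search_disable: True`, Model Analyzer sweeps only the values listed: for each model, `count: [1,2]` yields two model config variants, and each variant is run at every combination of the five concurrency values and three batch sizes, i.e. 30 measurements per model.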
Note: The checkpoint directory should be removed between consecutive runs of the `model-analyzer profile` command.
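For comparison with the manual-search config above, a minimal automatic-search config could be as small as the sketch below. The two `run_config_search_max_*` option names are assumptions (modeled on `run_config_search_disable`); verify them against `model-analyzer profile -h` for your version.

model_repository: /home/model_repo
profile_models:
  - resnet50_libtorch
# Assumed search-bound options; verify with `model-analyzer profile -h`
run_config_search_max_concurrency: 64
run_config_search_max_instance_count: 3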
The `analyze` subcommand allows the user to create summaries and data tables from the measurements taken using the `profile` subcommand. The YAML config file can be used to set constraints and objectives used to sort and filter the measurements, and order the model configs and models according to the metrics collected. Use the following command to see the usage and argument descriptions for the subcommand.
$ model-analyzer analyze -h
The `analyze` subcommand begins by loading the "latest" checkpoint available in the checkpoint directory. Next, it sorts the models specified in the CLI or config YAML, provided they contain measurements in the checkpoint, using the objectives specified in the config YAML. Finally, it constructs summary PDFs using the top model configs for each model, as well as across models, if requested (see the Reports section for more details). The `analyze` subcommand can be run multiple times with different configurations if the user would like to sort and filter the results using different objectives or under different constraints.
- Create a summary and results for the model `resnet50_libtorch` from the latest checkpoint in the directory `checkpoints`:
$ model-analyzer analyze --analysis-models resnet50_libtorch --checkpoint-directory=checkpoints
- Create summaries and results for the models `resnet50_libtorch` and `vgg16_graphdef` from the same checkpoint as above, and export them to a directory called `export_directory`:
$ model-analyzer analyze --analysis-models resnet50_libtorch,vgg16_graphdef -e export_directory --checkpoint-directory=checkpoints
- Apply objectives and constraints to sort and filter results in summary plots and tables using a YAML config file:
$ model-analyzer analyze -f /path/to/config.yaml
The contents of `config.yaml` are shown below.
checkpoint_directory: ./checkpoints/
export_path: ./export_directory/
analysis_models:
  resnet50_libtorch:
    objectives:
      - perf_throughput
    constraints:
      perf_latency_p99:
        max: 15
  vgg16_graphdef:
    objectives:
      - gpu_used_memory
    constraints:
      perf_latency_p99:
        max: 15
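Because `analyze` only reads from the checkpoint, re-running it with different settings is cheap. For example, a variant of the config above that sorts `resnet50_libtorch` by latency instead of throughput (the relaxed `max: 50` value is illustrative, and using `perf_latency_p99` as an objective is an assumption based on its use as a constraint above):

checkpoint_directory: ./checkpoints/
export_path: ./export_directory/
analysis_models:
  resnet50_libtorch:
    objectives:
      - perf_latency_p99
    constraints:
      perf_latency_p99:
        max: 50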
The `report` subcommand allows the user to create detailed reports on one or more of the model configs that were profiled.
$ model-analyzer report -h
Instead of showing only the top measurements from each config as in the summary reports, Model Analyzer compiles and displays all the measurements for a given config in the detailed report (see the Reports section for more details).
- Generate detailed reports for two model configs of `resnet50_libtorch`, called `resnet50_libtorch_config_1` and `resnet50_libtorch_config_2`. Read from `checkpoints` and write to `export_directory`:
$ model-analyzer report --report-model-configs resnet50_libtorch_config_1,resnet50_libtorch_config_2 --checkpoint-directory checkpoints -e export_directory
- Generate a detailed report for `resnet50_libtorch_config_2` with a custom plot using a YAML config file:
$ model-analyzer report -f /path/to/config.yaml
The contents of the `config.yaml` are shown below.
checkpoint_directory: ./checkpoints/
export_path: './export_directory'
report_model_configs:
  resnet50_libtorch_config_2:
    plots:
      throughput_v_memory:
        title: Throughput vs GPU Memory
        x_axis: gpu_used_memory
        y_axis: perf_throughput
        monotonic: True
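Each entry under `plots` defines one chart in the detailed report, with `x_axis` and `y_axis` naming the metrics to plot. As a sketch, a second chart could be added under the same `plots:` key (the `throughput_v_latency` name is arbitrary, and using `perf_latency_p99` as an axis metric is an assumption based on its use in the analyze constraints above):

      throughput_v_latency:
        title: Throughput vs p99 Latency
        x_axis: perf_latency_p99
        y_axis: perf_throughput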