README, LICENSE, and another model

rdma-from-gpu · Aug 15, 2024 · f08292c
1 parent ae57ed6
commit f08292c

Showing 10 changed files with 753 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,6 @@
+*.so
+*.onnx
+mod.json
+mod.params
+model.tar
+tuning*.json
diff --git a/LICENSE b/LICENSE
diff --git a/README.md b/README.md
@@ -0,0 +1,16 @@
+This folder contains a set of scripts (and eventually models) used for evaluating the inference serving prototype.
+
+These scripts require a standard TVM installation that is visible to your Python interpreter, either via `PYTHONPATH` or via a virtual environment.
+
+The `tune.sh` scripts optimize the models for your specific GPU, while the `compile.sh` scripts compile the models (tuned or not) into a `.so` library that TVM can load later.
+
+
+
+
+# LICENSE
+
+The models have been sourced from public repositories.
+
+The scripts in this folder are released under the GNU GPL v3 license. See [LICENSE](LICENSE).
+
+(C) 2024 Massimo Girondi [email protected] GNU GPL v3
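The README above describes a tune-then-compile workflow without spelling out the order of the scripts. The following is a minimal sketch of one way to chain them, not part of this commit. The helper name `run_steps` is hypothetical, and the order shown in the usage note (download, tune, compile with tuning records) is an assumption inferred from the script names in `a100_squeezenet_tuned/`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the commit): run the given scripts, in
# order, from inside a model directory, and stop at the first failure.
run_steps() {
    local dir="$1"
    shift
    local step
    for step in "$@"; do
        echo "running $step in $dir"
        # Run in a subshell so the caller's working directory is unchanged.
        (cd "$dir" && bash "./$step") || {
            echo "$step failed" >&2
            return 1
        }
    done
}
```

Under those assumptions, a tuned build could be started with `run_steps a100_squeezenet_tuned download.sh tune.sh compile_tuned.sh`, with TVM already available to `python3`.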
diff --git a/a100_squeezenet_tuned/README.md b/a100_squeezenet_tuned/README.md
@@ -0,0 +1,8 @@
+This is a model from
+
+https://github.com/onnx/models/tree/main/vision/classification/squeezenet
+
+
+SqueezeNet is designed to run on embedded devices, but it is a good benchmark for a "fast" GPU, where we want to see fast inference with reasonably sized inputs. The weights are small, so it is probably not a good candidate when load/unload times are the critical factor.
+
+Compilation with TVM 0.6 throws an out-of-resources error.
diff --git a/a100_squeezenet_tuned/compile.sh b/a100_squeezenet_tuned/compile.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+# This just compiles the ONNX model without any tuning
+source ../../scripts/activate_venv.sh
+python3 -m tvm.driver.tvmc compile \
+        --target "cuda" \
+        --output model.tar \
+        squeezenet1.1-7.onnx
+tar -xvf model.tar
diff --git a/a100_squeezenet_tuned/compile_tuned.sh b/a100_squeezenet_tuned/compile_tuned.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+# This compiles the ONNX model using the most recent tuning records
+
+TUNING=$(ls tuning* --sort=time -1 | head -n1)
+python3 -m tvm.driver.tvmc compile \
+        --target "cuda" \
+        --output model.tar \
+        --tuning-records "${TUNING}" \
+        squeezenet1.1-7.onnx
+tar -xvf model.tar
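The newest-records lookup in `compile_tuned.sh` leaves `TUNING` empty when no tuning file exists yet. The following is a minimal sketch of a more defensive lookup, not part of this commit. The function name `latest_tuning` is hypothetical, and the sketch assumes tuning files match the `tuning*.json` pattern listed in `.gitignore` and written by `tune.sh`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the commit): print the most recently
# modified tuning*.json file in the given directory (default: current one).
latest_tuning() {
    local dir="${1:-.}"
    local newest
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir"/tuning*.json 2>/dev/null | head -n1)
    if [ -z "$newest" ]; then
        echo "no tuning records found in $dir" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}
```

A script could then use `TUNING=$(latest_tuning) || exit 1` and stop before calling `tvmc compile` if no tuning run has been done.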
diff --git a/a100_squeezenet_tuned/download.sh b/a100_squeezenet_tuned/download.sh
@@ -0,0 +1,7 @@
+#!/bin/bash
+#wget https://github.com/onnx/models/blob/main/vision/classification/squeezenet/model/squeezenet1.0-12-int8.onnx
+wget https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.onnx
+#wget https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.0-12-int8.tar.gz
+#wget https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.tar.gz
+#tar xvzf squeezenet1.0-12-int8.tar.gz
+#tar xvzf squeezenet1.1-7.tar.gz
diff --git a/a100_squeezenet_tuned/metadata.json b/a100_squeezenet_tuned/metadata.json
@@ -0,0 +1,16 @@
+{
+        "load_time": 1000000000,
+        "exec_time": [1000000,2000000,4000000,8000000],
+        "weights_size": 100000000,
+        "workspace_size": 500000000,
+        "input_name" : "input",
+        "output_name" : "output",
+        "input_shape": [1,3,224,224],
+        "output_shape": [1,1000],
+        "input_type": "FP32",
+        "output_type": "FP32",
+        "device": "CUDA",
+        "model" : "a100",
+        "architecture" : "CUDA_80"
+
+}
diff --git a/a100_squeezenet_tuned/model2.tar b/a100_squeezenet_tuned/model2.tar
diff --git a/a100_squeezenet_tuned/tune.sh b/a100_squeezenet_tuned/tune.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+python3 -m tvm.driver.tvmc tune \
+        --target "cuda" \
+        --output tuning.$(date --iso-8601=minutes).json \
+        squeezenet1.1-7.onnx