From d3c8aa65beece279daf3201824ed7ccf4c8f1f68 Mon Sep 17 00:00:00 2001
From: siqi
Date: Mon, 28 Aug 2023 19:21:41 +0800
Subject: [PATCH] Add customized implementation

---
 report/customization.md | 79 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 report/customization.md

diff --git a/report/customization.md b/report/customization.md
new file mode 100644
index 0000000..96944e9
--- /dev/null
+++ b/report/customization.md
@@ -0,0 +1,79 @@
+
+# ⚡️ Customized Implementation
+
+You can build and customize your cluster from scratch according to your needs. This section covers: (1) system prerequisites, (2) AI features, (3) OpenMLDB evaluation, and (4) Flink evaluation.
+
+## Prerequisites
+
+Before running the benchmarking scripts, make sure your environment meets the following version requirements. This assumes you have already configured the target FE system correctly.
+
+- Java JDK: Version 1.8.0 or higher
+- Maven: 3.8.0 (recommended)
+
+## AI Features
+
+The *features* folder lists the features used in each of the 6 AI tasks. They are generated by the commercial automated ML tool [HCML](https://en.4paradigm.com/product/hypercycle_ml.html) (a simplified version is available at *https://github.com/4paradigm/AutoX*).
+
+## OpenMLDB Evaluation
+
+**Step 1:** Clone the repository.
+
+**Step 2:** Download the data files and move them to the `dataset` directory of the repository.
+
+**Step 3:** [Start the OpenMLDB cluster](https://github.com/4paradigm/OpenMLDB/blob/main/docs/en/deploy/install_deploy.md#install-and-deploy). For a quick start, you can use the [Docker image](https://github.com/4paradigm/OpenMLDB/blob/main/docs/en/quickstart/openmldb_quickstart.md#pulls-the-image), but note that performance may not be optimal since all components are deployed on a single physical machine.
+
+> Please be aware that the default values for `spark.driver.memory` and `spark.executor.memory` may not be enough for your needs. If you encounter a `java.lang.OutOfMemoryError: Java heap space` error, increase them either by setting `spark.default.conf` in `conf/taskmanager.properties` and restarting the TaskManager, or by setting Spark parameters through the CLI. See [Spark Client Configuration](https://github.com/4paradigm/OpenMLDB/blob/main/docs/en/reference/client_config/client_spark_config.md#spark-client-configuration) for details.
+>```
+>spark.default.conf=spark.driver.memory=32g;spark.executor.memory=32g
+>```
+
+
+**Step 4:** Create your own `conf.properties` file in the `./OpenMLDB/conf` directory from `conf.properties.template`, then update its settings, including the OpenMLDB cluster address and the locations of the data and queries.
+
+4.1 Set the locations of the data and queries:
+
+```sh
+export FEBENCH_ROOT=`pwd`
+# better to add file://
+sed s#\#file://$FEBENCH_ROOT# ./OpenMLDB/conf/conf.properties.template > ./OpenMLDB/conf/conf.properties
+sed s#\#$FEBENCH_ROOT# ./flink/conf/conf.properties.template > ./flink/conf/conf.properties
+```
+
+4.2 Point the cluster settings in `conf.properties` at your own OpenMLDB cluster:
+
+```sh
+# ./OpenMLDB/conf/conf.properties
+ZK_CLUSTER=127.0.0.1:7181
+ZK_PATH=/openmldb
+```
+
+**Step 5:** Compile and run the test.
+
+```bash
+cd OpenMLDB
+./compile_test.sh
+./test.sh
+```
+
+An example test result looks as follows:
+![image](../imgs/openmldb-jmh.png)
+
+
+## Flink Evaluation
+
+Repeat Steps 1-5 of [*OpenMLDB Evaluation*](#openmldb-evaluation), with the following additions:
+
+1. In Step 3, additionally start a disk-based storage engine (e.g., RocksDB in MySQL) to persist the Flink table data. Note that (1) the listening port is set to 3306 by default and (2) you need to preload all the secondary tables into the storage engine.
+
+2. In Step 5, supply the task number when running the `compile_test.sh` script, and no parameter when running `test.sh`, e.g.,
+
+```bash
+./compile_test.sh 3 # compile and run the test of task3
+./test.sh # rerun the test of task3
+```
+
+3. You will need to rerun `compile_test.sh` if you modify `conf.properties`. This is not required for *OpenMLDB Evaluation*.
+
+![image](../imgs/flink-jmh.png)
+
+
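Because the Flink `compile_test.sh` handles one task per invocation, covering all six AI tasks means calling it once per task. The following is a minimal sketch of such a loop, not part of the repository: `run_all_tasks` is a hypothetical helper, and it assumes that the tasks are numbered 1 through 6 and that it is called from the directory containing `compile_test.sh`.

```bash
# Sketch only: run the per-task benchmark command for every task in turn.
# Assumptions (not confirmed by the repository): tasks are numbered 1..6,
# and the runner accepts the task number as its only argument.
run_all_tasks() {
  # Command that compiles and runs one task; defaults to the Flink script.
  runner="${1:-./compile_test.sh}"
  status=0
  for task in 1 2 3 4 5 6; do
    echo "=== task${task} ==="
    if ! "$runner" "$task"; then
      # Keep going so one failing task does not hide the others.
      echo "task${task} failed" >&2
      status=1
    fi
  done
  return "$status"
}
```

Calling `run_all_tasks` with no argument uses `./compile_test.sh`; passing another command name as the first argument substitutes a different runner, for example a dry-run wrapper. The function returns a non-zero status if any task failed.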