feat: add doc of spark once job on k8s (#756)
* feat: add doc of spark once job on k8s

* fix: hide ip information

* doc: add supplemental instructions

* Update spark.md

---------

Co-authored-by: peacewong <[email protected]>
lenoxzhao and peacewong authored Jan 3, 2024
1 parent 941fa6c commit f213d81
Showing 6 changed files with 229 additions and 1 deletion.
Binary file added docs/engine-usage/images/k8s-config.png
Binary file added docs/engine-usage/images/k8s-ecm-label.png
116 changes: 115 additions & 1 deletion docs/engine-usage/spark.md
@@ -202,6 +202,120 @@ Execute the test case
sh ./bin/linkis-cli -engineType spark-3.2.1 -codeType sql -labelMap engingeConnRuntimeMode=yarnCluster -submitUser hadoop -proxyUser hadoop -code "select 123"
```

### 3.6 Submitting Spark K8S cluster tasks via `Linkis-cli`

Before submitting a task, please install the Kubernetes Metrics Server on the cluster, as its APIs are invoked during resource validation.
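
If the Metrics Server is not yet deployed, a common way to install it is from the upstream release manifest; this is a minimal sketch (the manifest URL comes from the metrics-server project, not from the Linkis docs):

```shell
# Install the Kubernetes Metrics Server from the upstream release manifest (assumed URL).
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify that resource metrics are being served before submitting Linkis tasks.
kubectl top nodes
```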

#### 3.6.1 External Resource Provider Configuration

To submit tasks to a Kubernetes cluster, you need to add the cluster configuration under `Linkis Control Panel->Basic Data Management->External Resource Provider Manage`, as shown in the figure. The `Resource Type` must be set to `Kubernetes`, while the `Name` can be customized.

![k8s](./images/k8s-config.png)

The parameters to be set in the `Config` are shown in the following table:

| Configuration | Description |
| ----------------- | ------------------------------------------------------------ |
| k8sMasterUrl | Full URL of the API Server, such as `https://xxx.xxx.xxx.xxx:6443`. This parameter must be configured. |
| k8sConfig | Location of the kubeconfig file, such as `/home/hadoop/.kube/config`. If this parameter is configured, the following three parameters do not need to be configured. |
| k8sCaCertData | CA certificate of the cluster in kubeconfig, corresponding to `certificate-authority-data`. Required if `k8sConfig` is not configured. |
| k8sClientCertData | Client certificate in kubeconfig, corresponding to `client-certificate-data`. Required if `k8sConfig` is not configured. |
| k8sClientKeyData | Client private key in kubeconfig, corresponding to `client-key-data`. Required if `k8sConfig` is not configured. |
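
If you prefer to fill in the three certificate fields instead of pointing to a kubeconfig file, the base64-encoded values can usually be read from an existing kubeconfig. A minimal sketch, assuming `kubectl` access and that the first cluster/user entry is the one you want:

```shell
# Read the base64-encoded credentials from the current kubeconfig; paste the outputs
# into k8sCaCertData, k8sClientCertData and k8sClientKeyData respectively.
kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}'; echo
kubectl config view --raw -o jsonpath='{.users[0].user.client-certificate-data}'; echo
kubectl config view --raw -o jsonpath='{.users[0].user.client-key-data}'; echo
```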

#### 3.6.2 Label Configuration for ECM

After configuring the external resource provider, you need to configure the corresponding cluster label information under `ECM Management`, as shown in the figure. Select `yarnCluster` as the label type and `K8S-<cluster name>` as the label value, where the cluster name is the name specified in the `External Resource Provider Configuration`; for example, use `K8S-default` if the name was set to `default` in the previous step.

> Due to compatibility issues with `ClusterLabel`, its key value has not been changed yet (it remains `yarnCluster`).

![k8s-ecm-label](./images/k8s-ecm-label.png)

#### 3.6.3 Description of parameters

When using `linkis-cli` to submit a task, the following parameters need to be set:

* Specify the cluster that executes the task. If the cluster name was `default` when configuring the external resource provider, you need to set the value of `k8sCluster` to `'K8S-default'` when submitting the task;
* To distinguish it from the `k8s-operator` submission method, you need to set the `spark.master` parameter to `k8s-native`;
* Currently, Spark once job tasks on K8S only support the `cluster` deploy mode, so you need to set `spark.submit.deployMode` to `cluster`.

The Linkis parameters and their corresponding Spark parameters are as follows:

| Linkis Parameters | Spark Parameters | Default Value |
| --------------------------------------- | ------------------------------------------------------- | ------------------- |
| linkis.spark.k8s.master.url | --master | empty string |
| linkis.spark.k8s.serviceAccount | spark.kubernetes.authenticate.driver.serviceAccountName | empty string |
| linkis.spark.k8s.image | spark.kubernetes.container.image | apache/spark:v3.2.1 |
| linkis.spark.k8s.imagePullPolicy | spark.kubernetes.container.image.pullPolicy | Always |
| linkis.spark.k8s.namespace | spark.kubernetes.namespace | default |
| linkis.spark.k8s.ui.port | spark.ui.port | 4040 |
| linkis.spark.k8s.executor.request.cores | spark.kubernetes.executor.request.cores | 1 |
| linkis.spark.k8s.driver.request.cores | spark.kubernetes.driver.request.cores | 1 |

#### 3.6.4 Example of commands for submission

Submitting a task with a jar:

```shell
linkis-cli --mode once \
-engineType spark-3.2.1 \
-labelMap engineConnMode=once \
-k8sCluster 'K8S-default' \
-jobContentMap runType='jar' \
-jobContentMap spark.app.main.class='org.apache.spark.examples.SparkPi' \
-confMap spark.master='k8s-native' \
-confMap spark.app.name='spark-submit-jar-k8s' \
-confMap spark.app.resource='local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar' \
-confMap spark.submit.deployMode='cluster' \
-confMap linkis.spark.k8s.serviceAccount='spark' \
-confMap linkis.spark.k8s.master.url='k8s://https://xxx.xxx.xxx.xxx:6443' \
-confMap linkis.spark.k8s.config.file='/home/hadoop/.kube/config' \
-confMap linkis.spark.k8s.imagePullPolicy='IfNotPresent' \
-confMap linkis.spark.k8s.namespace='default'
```

Submitting a task with a Python script:

```shell
linkis-cli --mode once \
-engineType spark-3.2.1 \
-labelMap engineConnMode=once \
-k8sCluster 'K8S-default' \
-jobContentMap runType='py' \
-confMap spark.master='k8s-native' \
-confMap spark.app.name='spark-submit-py-k8s' \
-confMap spark.app.resource='local:///opt/spark/examples/src/main/python/pi.py' \
-confMap spark.submit.deployMode='cluster' \
-confMap spark.submit.pyFiles='local:///opt/spark/examples/src/main/python/wordcount.py' \
-confMap linkis.spark.k8s.serviceAccount='spark' \
-confMap linkis.spark.k8s.master.url='k8s://https://xxx.xxx.xxx.xxx:6443' \
-confMap linkis.spark.k8s.config.file='/home/hadoop/.kube/config' \
-confMap linkis.spark.k8s.imagePullPolicy='IfNotPresent' \
-confMap linkis.spark.k8s.namespace='default' \
-confMap linkis.spark.k8s.image="apache/spark-py:v3.2.1"
```
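
After submission, the driver and executor pods can be observed directly on the cluster. A minimal sketch, assuming the task runs in the `default` namespace and that the standard Spark-on-Kubernetes pod labels are in place:

```shell
# List the driver pods created by spark-submit in the target namespace.
kubectl get pods -n default -l spark-role=driver

# Follow the driver log of a specific run (replace the placeholder with the actual pod name).
kubectl logs -f -n default <driver-pod-name>
```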

#### 3.6.5 Supplemental instructions

**Upgrade instructions for older versions**

* You need to run `linkis-dist/package/db/upgrade/1.5.0_schema/mysql/linkis_ddl.sql` to upgrade the database schema. Specifically, the length of the `label_key` field of `linkis_cg_manager_label` is increased from 32 to 50.

```sql
ALTER TABLE `linkis_cg_manager_label` MODIFY COLUMN label_key varchar(50);
```

* Prior to version 1.5.0, ClusterLabel was not included when building CombineLabel. To maintain compatibility with older versions, when the submitted ClusterLabel value is `Yarn-default`, ClusterLabel is still excluded when building CombineLabel. You can disable this behavior by setting `linkis.combined.without.yarn.default` to false (the default is true); see the sketch after this note.

> The specific reason is that if tasks related to that ClusterLabel were submitted in old versions, corresponding resource records would exist in the database. After upgrading to the new version, since CombineLabel includes ClusterLabel, conflicts would occur in the database's resource records when submitting tasks of this type. Therefore, to maintain compatibility with older versions, the construction of CombineLabel for Yarn-default (the default value of ClusterLabel) still does not include ClusterLabel.
> If the latest version is installed directly, this issue does not need to be considered because there are no conflicting records in the database. You can set `linkis.combined.without.yarn.default` to false to improve readability.
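
For a fresh installation, the compatibility switch can simply be turned off. A minimal sketch, assuming the property is picked up from `linkis.properties` under the Linkis installation directory (the exact configuration file may differ in your deployment):

```shell
# Hypothetical example: disable the Yarn-default compatibility behavior on a fresh install.
echo "linkis.combined.without.yarn.default=false" >> $LINKIS_HOME/conf/linkis.properties
```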

**Validation of task submission**

Submitting a Spark Once Job to K8S involves two levels of resource validation, and the task will only be submitted to the K8S cluster after passing both levels of validation:

1. First, the user's resource quota is validated. For the detailed process, refer to [**ResourceManager Architecture**](architecture/feature/computation-governance-services/linkis-manager/resource-manager.md).
2. Then, the resources of the K8S cluster are validated. If a resourceQuota is configured in the current namespace, it is used for the validation; otherwise, the available resources of the cluster are calculated directly via the metrics server (see the sketch below).
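
A minimal sketch of how to inspect what the cluster-side validation will see, assuming the task namespace is `default`:

```shell
# If a ResourceQuota exists in the namespace, it is used for the K8S-side validation.
kubectl get resourcequota -n default

# Otherwise the available cluster resources are derived from the metrics server.
kubectl top nodes
```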

## 4. Engine configuration instructions

### 4.1 Default Configuration Description
@@ -310,4 +424,4 @@ insert into `linkis_ps_configuration_config_value` (`config_key_id`, `config_val
(select `relation`.`config_key_id` AS `config_key_id`, '' AS `config_value`, `relation`.`engine_type_label_id` AS `config_label_id` FROM linkis_ps_configuration_key_engine_relation relation
INNER JOIN linkis_cg_manager_label label ON relation.engine_type_label_id = label.id AND label.label_value = @SPARK_ALL);

```
@@ -200,6 +200,120 @@ hdfs dfs -put hive-site.xml hdfs:///spark/cluster
sh ./bin/linkis-cli -engineType spark-3.2.1 -codeType sql -labelMap engingeConnRuntimeMode=yarnCluster -submitUser hadoop -proxyUser hadoop -code "select 123"
```

### 3.6 Submitting Spark K8S cluster tasks via `Linkis-cli`

Before submitting a task, please install the Metrics Server on Kubernetes, as its APIs are invoked during resource validation.

#### 3.6.1 External Resource Provider Configuration

You first need to configure the external resource information in the Linkis console. Add a Kubernetes cluster configuration under **Linkis Console -> Basic Data Management -> External Resource Provider Manage**, as shown in the figure below. The **Resource Type** must be set to `Kubernetes`, and the **Name** can be customized.

![k8s](./images/k8s-config.png)

The parameters to be set in **Config** are shown in the following table:

| Configuration | Description |
| ----------------- | ------------------------------------------------------------ |
| k8sMasterUrl | Full URL of the API Server, such as `https://xxx.xxx.xxx.xxx:6443`. This parameter must be configured. |
| k8sConfig | Location of the kubeconfig file, such as `/home/hadoop/.kube/config`. If this parameter is configured, the following three parameters do not need to be configured. |
| k8sCaCertData | CA certificate of the cluster in kubeconfig, corresponding to `certificate-authority-data`. Required if `k8sConfig` is not configured. |
| k8sClientCertData | Client certificate in kubeconfig, corresponding to `client-certificate-data`. Required if `k8sConfig` is not configured. |
| k8sClientKeyData | Client private key in kubeconfig, corresponding to `client-key-data`. Required if `k8sConfig` is not configured. |

#### 3.6.2 Label Configuration for ECM

After configuring the external resource provider, you need to configure the cluster label of the corresponding ECM under **ECM Management**, as shown in the figure. Select `yarnCluster` as the label type and set the label value to `K8S-<cluster name>`, where the cluster name refers to the name specified in the previous step; for example, if the name is configured as `default`, the label value should be `K8S-default`.

> Due to compatibility issues with `ClusterLabel`, its key value has not been changed yet (it remains `yarnCluster`).

![k8s-ecm-label](./images/k8s-ecm-label.png)

#### 3.6.3 Description of submission parameters

Taking `linkis-cli` as an example, the parameters that need to be set when submitting a task are:

* Specify the cluster that executes the task. If the cluster name was `default` when configuring the cluster, you need to set the value of the `k8sCluster` parameter to `'K8S-default'` when submitting the task;
* To distinguish it from the operator submission method, you need to set the `spark.master` parameter to `k8s-native`;
* Currently, once job tasks on K8S only support the `cluster` deploy mode, so you need to set `spark.submit.deployMode` to `cluster`.

The other Linkis parameters and their corresponding Spark parameters are as follows:

| Linkis Parameters | Spark Parameters | Default Value |
| --------------------------------------- | ------------------------------------------------------- | ------------------- |
| linkis.spark.k8s.master.url | --master | empty string |
| linkis.spark.k8s.serviceAccount | spark.kubernetes.authenticate.driver.serviceAccountName | empty string |
| linkis.spark.k8s.image | spark.kubernetes.container.image | apache/spark:v3.2.1 |
| linkis.spark.k8s.imagePullPolicy | spark.kubernetes.container.image.pullPolicy | Always |
| linkis.spark.k8s.namespace | spark.kubernetes.namespace | default |
| linkis.spark.k8s.ui.port | spark.ui.port | 4040 |
| linkis.spark.k8s.executor.request.cores | spark.kubernetes.executor.request.cores | 1 |
| linkis.spark.k8s.driver.request.cores | spark.kubernetes.driver.request.cores | 1 |

#### 3.6.4 Example of commands for submission

Submitting a task with a jar:

```shell
linkis-cli --mode once \
-engineType spark-3.2.1 \
-labelMap engineConnMode=once \
-k8sCluster 'K8S-default' \
-jobContentMap runType='jar' \
-jobContentMap spark.app.main.class='org.apache.spark.examples.SparkPi' \
-confMap spark.master='k8s-native' \
-confMap spark.app.name='spark-submit-jar-k8s' \
-confMap spark.app.resource='local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar' \
-confMap spark.submit.deployMode='cluster' \
-confMap linkis.spark.k8s.serviceAccount='spark' \
-confMap linkis.spark.k8s.master.url='k8s://https://xxx.xxx.xxx.xxx:6443' \
-confMap linkis.spark.k8s.config.file='/home/hadoop/.kube/config' \
-confMap linkis.spark.k8s.imagePullPolicy='IfNotPresent' \
-confMap linkis.spark.k8s.namespace='default'
```

Submitting a task with a Python script:

```shell
linkis-cli --mode once \
-engineType spark-3.2.1 \
-labelMap engineConnMode=once \
-k8sCluster 'K8S-default' \
-jobContentMap runType='py' \
-confMap spark.master='k8s-native' \
-confMap spark.app.name='spark-submit-py-k8s' \
-confMap spark.app.resource='local:///opt/spark/examples/src/main/python/pi.py' \
-confMap spark.submit.deployMode='cluster' \
-confMap spark.submit.pyFiles='local:///opt/spark/examples/src/main/python/wordcount.py' \
-confMap linkis.spark.k8s.serviceAccount='spark' \
-confMap linkis.spark.k8s.master.url='k8s://https://xxx.xxx.xxx.xxx:6443' \
-confMap linkis.spark.k8s.config.file='/home/hadoop/.kube/config' \
-confMap linkis.spark.k8s.imagePullPolicy='IfNotPresent' \
-confMap linkis.spark.k8s.namespace='default' \
-confMap linkis.spark.k8s.image="apache/spark-py:v3.2.1"
```

#### 3.6.5 Supplemental instructions

**Upgrade instructions for older versions**

* You need to run `linkis-dist/package/db/upgrade/1.5.0_schema/mysql/linkis_ddl.sql` to upgrade the database schema. Specifically, the length of the `label_key` field of `linkis_cg_manager_label` is increased from 32 to 50.

```sql
ALTER TABLE `linkis_cg_manager_label` MODIFY COLUMN label_key varchar(50);
```

* Prior to version 1.5.0, ClusterLabel was not included when building CombineLabel. To maintain compatibility with older versions, when the submitted ClusterLabel value is `Yarn-default`, ClusterLabel is still excluded when building CombineLabel. You can disable this behavior by setting `linkis.combined.without.yarn.default` to false (the default is true).

> The specific reason is that if tasks associated with that ClusterLabel were submitted in an old version, corresponding resource records exist in the database. After upgrading to the new version, since CombineLabel now includes ClusterLabel, the resource records in the database would conflict when tasks of this type are submitted. Therefore, for compatibility with older versions, the CombineLabel built for `Yarn-default` (the default ClusterLabel value) still excludes ClusterLabel.
> If the latest version is installed directly, this issue does not need to be considered because there are no conflicting records in the database; you can set `linkis.combined.without.yarn.default` to false to improve readability.

**Validation of task submission**

Submitting a Spark Once Job to K8S involves two levels of resource validation, and the task is only submitted to the K8S cluster after both pass:

1. First, the user's resource quota is validated. For the detailed process, refer to [**ResourceManager Architecture**](architecture/feature/computation-governance-services/linkis-manager/resource-manager.md).
2. Then, the resources of the K8S cluster are validated. If a resourceQuota is configured in the current namespace, it is used for the validation; otherwise, the available resources of the cluster are calculated directly via the metrics server.

## 4. Engine configuration instructions

### 4.1 Default Configuration Description
