docs: move-from-reference-folder-to-openmldbsql-dml-folder #3573

Merged
# ALTER

OpenMLDB supports the `ALTER` command to update a table's attributes.

## Syntax

```
ALTER TABLE TableName (AlterAction) 'offline_path' FilePath

TableName ::=
    Identifier ('.' Identifier)?

AlterAction ::=
    'add' | 'drop'

FilePath ::=
    URI

URI ::=
    'file://FilePathPattern'
    | 'hdfs://FilePathPattern'
    | 'hive://[db.]table'
    | 'FilePathPattern'

FilePathPattern ::=
    string_literal
```

**Description**

- Currently, `ALTER` only supports updating the symbolic paths of offline tables.

## Examples

```SQL
-- Add one offline path
ALTER TABLE t1 ADD offline_path 'hdfs://foo/bar';

-- Drop one offline path
ALTER TABLE t1 DROP offline_path 'hdfs://foo/bar';

-- Add one offline path and drop another offline path
ALTER TABLE t1 ADD offline_path 'hdfs://foo/bar', DROP offline_path 'hdfs://foo/bar2';
```
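
To confirm the change took effect, a minimal sketch (assuming the `desc` output of your OpenMLDB version includes the table's offline path information):

```SQL
-- Inspect t1; the offline path should reflect the ADD/DROP statements above
desc t1;
```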

# DELETE

## Syntax

```
DELETE FROM TableName WHERE where_expression

TableName ::=
    Identifier ('.' Identifier)?
```

**Description**

- The `DELETE` statement deletes the data that satisfies the given conditions from an online table; it does not delete all data under the index. Only the index entries related to the `WHERE` condition are deleted. For more examples, please check [function_boundary](../../quickstart/function_boundary.md#delete).
- The filter columns specified by `WHERE` must be index columns. If a filter column is a key column, only `=` can be used.

## Examples

```SQL
DELETE FROM t1 WHERE col1 = 'aaaa' and ts_col = 1687145994000;
DELETE FROM t1 WHERE col1 = 'aaaa' and ts_col > 1687059594000 and ts_col < 1687145994000;

DELETE FROM t1 WHERE ts_col > 1687059594000 and ts_col < 1687145994000;
```
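
The statements above assume that `col1` is an index key column and `ts_col` is the corresponding index timestamp column. For context, a minimal sketch of such a schema (illustrative only, not part of the original page):

```SQL
-- Hypothetical schema for t1: col1 is the index key, ts_col its timestamp column
CREATE TABLE t1 (col1 string, ts_col bigint, INDEX(KEY=col1, TS=ts_col));
```
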
# LOAD DATA INFILE

The `LOAD DATA INFILE` statement loads data efficiently from a file into a table. `LOAD DATA INFILE` is complementary to `SELECT INTO OUTFILE`. To export data from a table to a file, use [SELECT INTO OUTFILE](../dql/SELECT_INTO_STATEMENT.md); to import data into a table, use `LOAD DATA INFILE`.

```{note}
The file path of `INFILE` can be a single file name or a directory, and the wildcard `*` can also be used. If files of multiple formats exist in the directory, only those matching the `FORMAT` specified in `LoadDataInfileOptionsList` will be loaded. The path semantics are the same as `DataFrameReader.read.load(String)`; you can read the directory with spark-shell to confirm whether the read succeeds.
```

## Syntax

Loading data from Hive is supported but needs extra settings; see [Hive Support](#hive-support).

The following table introduces the parameters of `LOAD DATA INFILE`.

| Parameter  | Type    | Default Value     | Note |
| ---------- | ------- | ----------------- | ---- |
| delimiter  | String  | ,                 | It defines the column separator; the default value is `,`. |
| header     | Boolean | true              | It indicates whether the file to import has a header. If the value is `true`, the file has a header. |
| null_value | String  | null              | It defines the string that will be used to replace `NULL` values when loading data. |
| format     | String  | csv               | It defines the format of the input file.<br />`csv` is the default format.<br />`parquet` format is supported in the cluster version. |
| quote      | String  | "                 | It defines the string surrounding the input data. The string length should be <= 1.<br />load_mode='cluster': the default is `"`; the content surrounded by a pair of quote characters is parsed as a whole. For example, if the surrounding string is `"#"`, the original data `1, 1.0, #This is a string, with comma#, normal_string` will be converted to four fields: the integer 1, the float 1.0, the string "This is a string, with comma", and "normal_string", even though the last is not quoted.<br />load_mode='local': the default is `\0`, which means the string surrounding the input data is empty. |
| mode       | String  | "error_if_exists" | It defines the input mode.<br />`error_if_exists` is the default mode: an error will be thrown if the offline table already has data. This input mode is only supported by the offline execution mode.<br />`overwrite` indicates that if the file already exists, the data will overwrite the contents of the original file. This input mode is only supported by the offline execution mode.<br />`append` indicates that if the table already exists, the data will be appended to the original table. Both offline and online execution modes support this input mode. |
| deep_copy  | Boolean | true              | It defines whether `deep_copy` is used. Only offline load supports `deep_copy=false`; you can specify the `INFILE` path as the offline storage address of the table to avoid a hard copy. |
| load_mode  | String  | cluster           | `load_mode='local'` only supports loading local `csv` files into `online` storage; it loads the data synchronously via the client process.<br />`load_mode='cluster'` only supports the cluster version; it loads the data via Spark, synchronously or asynchronously. |
| thread     | Integer | 1                 | It only works for local data loading, i.e., `load_mode='local'` or the standalone version; it defines the number of threads used for data loading. The maximum value is `50`. |
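
For instance, a hedged sketch combining several of these options (the file path and table name are assumptions):

```sql
-- Load a '#'-quoted CSV whose missing values are written as 'NA', overwriting the offline data
LOAD DATA INFILE 'file:///tmp/input.csv' INTO TABLE t1
OPTIONS(delimiter=',', header=true, null_value='NA', quote='#', mode='overwrite');
```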

```{note}
- In the cluster version, the specified execution mode (defined by `execute_mode`) determines whether to import data to online or offline storage when the `LOAD DATA INFILE` statement is executed. For the standalone version, there is no difference in storage mode and the `deep_copy` option is not supported.
- Therefore, you are advised to use absolute paths.
```
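
As an illustration of the soft-copy behavior described above, a hedged sketch (the HDFS path is an assumption):

```sql
-- Offline load without a deep copy: the table's offline address points at the source path
LOAD DATA INFILE 'hdfs://foo/bar' INTO TABLE t1 OPTIONS(deep_copy=false);
```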
## SQL Statement Template

```sql
LOAD DATA INFILE 'file_path' INTO TABLE 'table_name' OPTIONS (key = value, ...);
```

### Example
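
A minimal illustrative load (the directory path and table name are assumptions; per the note above, `*` matches every file in the directory):

```sql
-- Load all CSV files under /tmp/data into t1, appending to the existing data
LOAD DATA INFILE 'file:///tmp/data/*' INTO TABLE t1 OPTIONS(format='csv', mode='append');
```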