diff --git a/docs/en/reference/sql/dml/ALTER_STATEMENT.md b/docs/en/openmldb_sql/dml/ALTER_STATEMENT.md
similarity index 86%
rename from docs/en/reference/sql/dml/ALTER_STATEMENT.md
rename to docs/en/openmldb_sql/dml/ALTER_STATEMENT.md
index dc52380fdb5..022e3a8a011 100644
--- a/docs/en/reference/sql/dml/ALTER_STATEMENT.md
+++ b/docs/en/openmldb_sql/dml/ALTER_STATEMENT.md
@@ -1,44 +1,44 @@
-# ALTER
-
-OpenMLDB supports ALERT command to update table's attributes.
-
-## Syntax
-
-```
-ALTER TABLE TableName (AlterAction) 'offline_path' FilePath
-
-TableName ::=
-    Identifier ('.' Identifier)?
-
-AlterAction:
-    'add' | 'drop'
-
-FilePath
-  ::= URI
-
-URI
-  ::= 'file://FilePathPattern'
-      |'hdfs://FilePathPattern'
-      |'hive://[db.]table'
-      |'FilePathPattern'
-
-FilePathPattern
-  ::= string_literal
-```
-
-**Description**
-
-- `ALTER` only supports updateing offline table's symbolic paths currently.
-
-## Examples
-
-```SQL
--- Add one offline path
-ALTER TABLE t1 ADD offline_path 'hdfs://foo/bar';
-
--- Drop one offline path
-ALTER TABLE t1 DROP offline_path 'hdfs://foo/bar';
-
--- Add one offline path and drop anthor offline path
-ALTER TABLE t1 ADD offline_path 'hdfs://foo/bar', DROP offline_path 'hdfs://foo/bar2';
-```
\ No newline at end of file
+# ALTER
+
+OpenMLDB supports the `ALTER` command to update a table's attributes.
+
+## Syntax
+
+```
+ALTER TABLE TableName (AlterAction) 'offline_path' FilePath
+
+TableName ::=
+    Identifier ('.' Identifier)?
+
+AlterAction:
+    'add' | 'drop'
+
+FilePath
+  ::= URI
+
+URI
+  ::= 'file://FilePathPattern'
+      |'hdfs://FilePathPattern'
+      |'hive://[db.]table'
+      |'FilePathPattern'
+
+FilePathPattern
+  ::= string_literal
+```
+
+**Description**
+
+- Currently, `ALTER` only supports updating an offline table's symbolic paths.
+
+## Examples
+
+```SQL
+-- Add one offline path
+ALTER TABLE t1 ADD offline_path 'hdfs://foo/bar';
+
+-- Drop one offline path
+ALTER TABLE t1 DROP offline_path 'hdfs://foo/bar';
+
+-- Add one offline path and drop another offline path
+ALTER TABLE t1 ADD offline_path 'hdfs://foo/bar', DROP offline_path 'hdfs://foo/bar2';
+```
diff --git a/docs/en/reference/sql/dml/DELETE_STATEMENT.md b/docs/en/openmldb_sql/dml/DELETE_STATEMENT.md
similarity index 59%
rename from docs/en/reference/sql/dml/DELETE_STATEMENT.md
rename to docs/en/openmldb_sql/dml/DELETE_STATEMENT.md
index e2488494497..60914052bfd 100644
--- a/docs/en/reference/sql/dml/DELETE_STATEMENT.md
+++ b/docs/en/openmldb_sql/dml/DELETE_STATEMENT.md
@@ -11,9 +11,8 @@ TableName ::=
```

**Description**
-
-- `DELETE` statement will delete all data from the index of specific column value of online table.
-- The filter columns sepcified by `WHERE` must be an index column. if it is a key column, only `=` can be used.
+- The `DELETE` statement deletes the data that fulfills the specified conditions from an online table, rather than all data under the index. Only the index entries related to the `WHERE` condition are deleted. For more examples, please check [function_boundary](../../quickstart/function_boundary.md#delete).
+- The filter columns specified by `WHERE` must be index columns. If a filter column is a key column, only `=` can be used.
## Examples

@@ -25,4 +24,4 @@
DELETE FROM t1 WHERE col1 = 'aaaa' and ts_col = 1687145994000;

DELETE FROM t1 WHERE col1 = 'aaaa' and ts_col > 1687059594000 and ts_col < 1687145994000;

DELETE FROM t1 WHERE ts_col > 1687059594000 and ts_col < 1687145994000;
-```
\ No newline at end of file
+```
diff --git a/docs/en/reference/sql/dml/INSERT_STATEMENT.md b/docs/en/openmldb_sql/dml/INSERT_STATEMENT.md
similarity index 100%
rename from docs/en/reference/sql/dml/INSERT_STATEMENT.md
rename to docs/en/openmldb_sql/dml/INSERT_STATEMENT.md
diff --git a/docs/en/reference/sql/dml/LOAD_DATA_STATEMENT.md b/docs/en/openmldb_sql/dml/LOAD_DATA_STATEMENT.md
similarity index 64%
rename from docs/en/reference/sql/dml/LOAD_DATA_STATEMENT.md
rename to docs/en/openmldb_sql/dml/LOAD_DATA_STATEMENT.md
index 8210b92217c..e10a91e98be 100644
--- a/docs/en/reference/sql/dml/LOAD_DATA_STATEMENT.md
+++ b/docs/en/openmldb_sql/dml/LOAD_DATA_STATEMENT.md
@@ -1,6 +1,10 @@
# LOAD DATA INFILE

-The `LOAD DATA INFILE` statement load data efficiently from a file to a table. `LOAD DATA INFILE` is complementary to `SELECT ... INTO OUTFILE`. To export data from a table to a file, use [SELECT...INTO OUTFILE](../dql/SELECT_INTO_STATEMENT.md).
+The `LOAD DATA INFILE` statement loads data efficiently from a file to a table. `LOAD DATA INFILE` is complementary to `SELECT INTO OUTFILE`. To export data from a table to a file, use [SELECT INTO OUTFILE](../dql/SELECT_INTO_STATEMENT.md). To import data into a table, please use `LOAD DATA INFILE`.
+
+```{note}
+The filepath of INFILE can be a single file name or a directory, and `*` can also be used. If multiple file formats exist in the directory, only files in the FORMAT specified in LoadDataInfileOptionsList will be read. The format is the same as `DataFrameReader.read.load(String)`. You can read the directory with spark-shell to confirm whether the read is successful.
+``` ## Syntax @@ -43,18 +47,17 @@ Supports loading data from Hive, but needs extra settings, see [Hive Support](#H The following table introduces the parameters of `LOAD DATA INFILE`. -| Parameter | Type | Default Value | Note | -| ----------- | ------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| delimiter | String | , | It defines the column separator, the default value is `,`. | -| header | Boolean | true | It indicates that whether the table to import has a header. If the value is `true`, the table has a header. | -| null_value | String | null | It defines the string that will be used to replace the `NULL` value when loading data. | -| format | String | csv | It defines the format of the input file.
`csv` is the default format.
`parquet` format is supported in the cluster version. | -| quote | String | " | It defines the string surrounding the input data. The string length should be <= 1.
load_mode='cluster': default is `"`, the content surrounded by a pair of the quote characters will be parsed as a whole. For example, if the surrounding string is `"#"` then the original data like `1, 1.0, #This is a string, with comma#, normal_string` will be converted to four fields. The first field is an integer 1, the second is a float 1.0, the third field is a string "This is a string, with comma" and the 4th is "normal_string" even it's no quote.
load_mode='local': default is `\0`, which means that the string surrounding the input data is empty. | -| mode | String | "error_if_exists" | It defines the input mode.
`error_if_exists` is the default mode which indicates that an error will be thrown out if the offline table already has data. This input mode is only supported by the offline execution mode.
`overwrite` indicates that if the file already exists, the data will overwrite the contents of the original file. This input mode is only supported by the offline execution mode.
`append` indicates that if the table already exists, the data will be appended to the original table. Both offline and online execution modes support this input mode. | -| deep_copy | Boolean | true | It defines whether `deep_copy` is used. Only offline load supports `deep_copy=false`, you can specify the `INFILE` path as the offline storage address of the table to avoid hard copy. | -| load_mode | String | cluster | `load_mode='local'` only supports loading the `csv` local files into the `online` storage; It loads the data synchronously by the client process.
`load_mode='cluster'` only supports the cluster version. It loads the data via Spark synchronously or asynchronously. |
-| thread      | Integer | 1                 | It only works for data loading locally, i.e., `load_mode='local'` or in the standalone version; It defines the number of threads used for data loading. The max value is `50`. |
-| writer_type | String  | single            | The writer type for online loading in cluster mode, `single` or `batch`, default is `single`。`single` means write to cluster when reading, cost less memory. `batch` will read the whole rdd partition, check all data is right to pack, then write to cluster, it needs more memory. In some cases, `batch` is better to get the unwritten data, and retry the unwritten part |
+| Parameter  | Type    | Default Value     | Note |
+| ---------- | ------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| delimiter  | String  | ,                 | It defines the column separator, the default value is `,`. |
+| header     | Boolean | true              | It indicates whether the table to import has a header. If the value is `true`, the table has a header. |
+| null_value | String  | null              | It defines the string that will be used to replace the `NULL` value when loading data. |
+| format     | String  | csv               | It defines the format of the input file.<br />
`csv` is the default format.
`parquet` format is supported in the cluster version. | +| quote | String | " | It defines the string surrounding the input data. The string length should be <= 1.
load_mode='cluster': default is `"`, the content surrounded by a pair of quote characters will be parsed as a whole. For example, if the surrounding string is `"#"` then the original data like `1, 1.0, #This is a string, with comma#, normal_string` will be converted to four fields. The first field is an integer 1, the second is a float 1.0, the third field is a string "This is a string, with comma", and the fourth is "normal_string" even though it has no quotes.<br />
load_mode='local': default is `\0`, which means that the string surrounding the input data is empty. | +| mode | String | "error_if_exists" | It defines the input mode.
`error_if_exists` is the default mode, which indicates that an error will be thrown if the offline table already has data. This input mode is only supported by the offline execution mode.<br />
`overwrite` indicates that if the offline table already has data, the new data will overwrite the original contents. This input mode is only supported by the offline execution mode.<br />
`append` indicates that if the table already exists, the data will be appended to the original table. Both offline and online execution modes support this input mode. |
+| deep_copy  | Boolean | true              | It defines whether `deep_copy` is used. Only offline load supports `deep_copy=false`; you can specify the `INFILE` path as the offline storage address of the table to avoid a hard copy. |
+| load_mode  | String  | cluster           | `load_mode='local'` only supports loading local `csv` files into the `online` storage; the client process loads the data synchronously.<br />
`load_mode='cluster'` only supports the cluster version. It loads the data via Spark synchronously or asynchronously. | +| thread | Integer | 1 | It only works for data loading locally, i.e., `load_mode='local'` or in the standalone version; It defines the number of threads used for data loading. The max value is `50`. | ```{note} - In the cluster version, the specified execution mode (defined by `execute_mode`) determines whether to import data to online or offline storage when the `LOAD DATA INFILE` statement is executed. For the standalone version, there is no difference in storage mode and the `deep_copy` option is not supported. @@ -76,7 +79,7 @@ Therefore, you are suggested to use absolute paths. In the stand-alone version, ## SQL Statement Template ```sql -LOAD DATA INFILE 'file_name' INTO TABLE 'table_name' OPTIONS (key = value, ...); +LOAD DATA INFILE 'file_path' INTO TABLE 'table_name' OPTIONS (key = value, ...); ``` ### Example diff --git a/docs/en/reference/sql/dml/index.rst b/docs/en/openmldb_sql/dml/index.rst similarity index 100% rename from docs/en/reference/sql/dml/index.rst rename to docs/en/openmldb_sql/dml/index.rst