From 69bae9a401c20849af41696023169ad4426fdd44 Mon Sep 17 00:00:00 2001 From: Manfred Moser Date: Wed, 10 Jan 2024 15:36:15 -0800 Subject: [PATCH] Add docs for json_table Co-authored-by: Michael Eby --- docs/src/main/sphinx/functions/json.md | 234 +++++++++++++++++++++++++ docs/src/main/sphinx/sql/select.md | 7 + 2 files changed, 241 insertions(+) diff --git a/docs/src/main/sphinx/functions/json.md b/docs/src/main/sphinx/functions/json.md index 2066d57998dd..ff7f5a458e18 100644 --- a/docs/src/main/sphinx/functions/json.md +++ b/docs/src/main/sphinx/functions/json.md @@ -1138,6 +1138,240 @@ FROM customers | 102 | 'missing' | | 103 | 'missing' | + +(json-table)= +## json_table + +The `json_table` clause extracts a table from a JSON value. Use this clause to +transform JSON data into a relational format, making it easier to query and +analyze. Use `json_table` in the `FROM` clause of a +[`SELECT`](select-json-table) statement to create a table from JSON data. + +```text +JSON_TABLE( + json_input, + json_path [ AS path_name ] + [ PASSING value AS parameter_name [, ...] ] + COLUMNS ( + column_definition [, ...] ) + [ PLAN ( json_table_specific_plan ) + | PLAN DEFAULT ( json_table_default_plan ) ] + [ { ERROR | EMPTY } ON ERROR ] +) +``` + +The `COLUMNS` clause supports the following `column_definition` arguments: + +```text +column_name FOR ORDINALITY +| column_name type + [ FORMAT JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ] ] + [ PATH json_path ] + [ { WITHOUT | WITH { CONDITIONAL | UNCONDITIONAL } } [ ARRAY ] WRAPPER ] + [ { KEEP | OMIT } QUOTES [ ON SCALAR STRING ] ] + [ { ERROR | NULL | EMPTY { [ARRAY] | OBJECT } | DEFAULT expression } ON EMPTY ] + [ { ERROR | NULL | DEFAULT expression } ON ERROR ] +| NESTED [ PATH ] json_path [ AS path_name ] COLUMNS ( column_definition [, ...] ) +``` + +`json_input` is a character string or a binary string. It must contain a single +JSON item. + +`json_path` is a string literal containing the path mode specification and the +path expression. It follows the syntax rules described in +[](json-path-syntax-and-semantics). + +```text +'strict ($.price + $.tax)?(@ > 99.9)' +'lax $[0 to 1].floor()?(@ > 10)' +``` + +In the `PASSING` clause, pass values as named parameters that the `json_path` +expression can reference. + +```text +PASSING orders.totalprice AS o_price, + orders.tax % 10 AS o_tax +``` + +Use named parameters to reference the values in the path expression. Prefix +named parameters with `$`. + +```text +'lax $?(@.price > $o_price || @.tax > $o_tax)' +``` + +You can also pass JSON values in the `PASSING` clause. Use `FORMAT JSON` to +specify the format and `ENCODING` to specify the encoding: + +```text +PASSING orders.json_desc FORMAT JSON AS o_desc, + orders.binary_record FORMAT JSON ENCODING UTF16 AS o_rec +``` + +The `json_path` value is case-sensitive. The SQL identifiers are uppercase. Use +quoted identifiers in the `PASSING` clause: + +```text +'lax $.$KeyName' PASSING nation.name AS KeyName --> ERROR; no passed value found +'lax $.$KeyName' PASSING nation.name AS "KeyName" --> correct +``` + +The `PLAN` clause specifies how to join columns from different paths. Use +`OUTER` or `INNER` to define how to join parent paths with their child paths. +Use `CROSS` or `UNION` to join siblings. + +`COLUMNS` defines the schema of your table. Each `column_definition` specifies +how to extract and format your `json_input` value into a relational column. + +`PLAN` is an optional clause to control how to process and join nested JSON +data. + +`ON ERROR` specifies how to handle processing errors. `ERROR ON ERROR` throws an +error. `EMPTY ON ERROR` returns an empty result set. + +`column_name` specifies a column name. + +`FOR ORDINALITY` adds a row number column to the output table, starting at `1`. +Specify the column name in the column definition: + +```text +row_num FOR ORDINALITY +``` + +`NESTED PATH` extracts data from nested levels of a `json_input` value. Each +`NESTED PATH` clause can contain `column_definition` values. + +The `json_table` function returns a result set that you can use like any other +table in your queries. You can join the result set with other tables or +combine multiple arrays from your JSON data. + +You can also process nested JSON objects without parsing the data multiple +times. + +Use `json_table` as a lateral join to process JSON data from another table. + +### Examples + +The following query uses `json_table` to extract values from a JSON array and +return them as rows in a table with three columns: + +```sql +SELECT + * +FROM + json_table( + '[ + {"id":1,"name":"Africa","wikiDataId":"Q15"}, + {"id":2,"name":"Americas","wikiDataId":"Q828"}, + {"id":3,"name":"Asia","wikiDataId":"Q48"}, + {"id":4,"name":"Europe","wikiDataId":"Q51"} + ]', + 'strict $' COLUMNS ( + NESTED PATH 'strict $[*]' COLUMNS ( + id integer PATH 'strict $.id', + name varchar PATH 'strict $.name', + wiki_data_id varchar PATH 'strict $."wikiDataId"' + ) + ) + ); +``` + +| id | child | wiki_data_id | +| -- | --------- | ------------- | +| 1 | Africa | Q1 | +| 2 | Americas | Q828 | +| 3 | Asia | Q48 | +| 4 | Europe | Q51 | + +The following query uses `json_table` to extract values from an array of nested +JSON objects. It flattens the nested JSON data into a single table. The example +query processes an array of continent names, where each continent contains an +array of countries and their populations. + +The `NESTED PATH 'lax $[*]'` clause iterates through the continent objects, +while the `NESTED PATH 'lax $.countries[*]'` iterates through each country +within each continent. This creates a flat table structure with four rows +combining each continent with each of its countries. Continent values repeat for +each of their countries. + +```sql +SELECT + * +FROM + json_table( + '[ + {"continent": "Asia", "countries": [ + {"name": "Japan", "population": 125.7}, + {"name": "Thailand", "population": 71.6} + ]}, + {"continent": "Europe", "countries": [ + {"name": "France", "population": 67.4}, + {"name": "Germany", "population": 83.2} + ]} + ]', + 'lax $' COLUMNS ( + NESTED PATH 'lax $[*]' COLUMNS ( + continent varchar PATH 'lax $.continent', + NESTED PATH 'lax $.countries[*]' COLUMNS ( + country varchar PATH 'lax $.name', + population double PATH 'lax $.population' + ) + ) + )); +``` + +| continent | country | population | +| ---------- | --------- | ------------- | +| Asia | Japan | 125.7 | +| Asia | Thailand | 71.6 | +| Europe | France | 67.4 | +| Europe | Germany | 83.2 | + +The following query uses `PLAN` to specify an `OUTER` join between a parent path +and a child path: + +```sql +SELECT + * +FROM + JSON_TABLE( + '[]', + 'lax $' AS "root_path" + COLUMNS( + a varchar(1) PATH 'lax "A"', + NESTED PATH 'lax $[*]' AS "nested_path" + COLUMNS (b varchar(1) PATH 'lax "B"')) + PLAN ("root_path" OUTER "nested_path") + ); +``` + +| a | b | +| ---- | ---- | +| A | null | + +The following query uses `PLAN` to specify an `INNER` join between a parent path +and a child path: + +```sql +SELECT + * +FROM + JSON_TABLE( + '[]', + 'lax $' AS "root_path" + COLUMNS( + a varchar(1) PATH 'lax "A"', + NESTED PATH 'lax $[*]' AS "nested_path" + COLUMNS (b varchar(1) PATH 'lax "B"')) + PLAN ("root_path" INNER "nested_path") + ); +``` + +| a | b | +| ---- | ---- | +| null | null | + (json-array)= ## json_array diff --git a/docs/src/main/sphinx/sql/select.md b/docs/src/main/sphinx/sql/select.md index 82938f816871..59d96267e0f0 100644 --- a/docs/src/main/sphinx/sql/select.md +++ b/docs/src/main/sphinx/sql/select.md @@ -1277,6 +1277,13 @@ LEFT JOIN UNNEST(checkpoints) AS t(checkpoint) ON TRUE; Note that in case of using `LEFT JOIN` the only condition supported by the current implementation is `ON TRUE`. +(select-json-table)= +## JSON_TABLE + +`JSON_TABLE` transforms JSON data into a relational table format. Like `UNNEST` +and `LATERAL`, use `JSON_TABLE` in the `FROM` clause of a `SELECT` statement. +For more information, see [`JSON_TABLE`](json-table). + ## Joins Joins allow you to combine data from multiple relations.