Translate PPL LOOKUP Command (opensearch-project#686)
* initial commit of lookup

Signed-off-by: Lantao Jin <[email protected]>

* fix bug

Signed-off-by: Lantao Jin <[email protected]>

* fix scalafmt

Signed-off-by: Lantao Jin <[email protected]>

* add docs

Signed-off-by: Lantao Jin <[email protected]>

---------

Signed-off-by: Lantao Jin <[email protected]>
LantaoJin authored Sep 25, 2024
1 parent 919ba5c commit 0b248ec
Showing 15 changed files with 966 additions and 2 deletions.
69 changes: 69 additions & 0 deletions docs/PPL-Lookup-command.md
@@ -0,0 +1,69 @@
## PPL Lookup Command

## Overview
The lookup command enriches your search data by adding or replacing data from a lookup index (dimension table).
You can extend the fields of an index with values from a dimension table, and append or replace values when the lookup condition is matched.
As an alternative to the [Join command](../docs/PPL-Join-command.md), the lookup command is better suited to enriching the source data with a static dataset.


### Syntax of Lookup Command

```sql
SEARCH source=<sourceIndex>
| <other piped command>
| LOOKUP <lookupIndex> (<lookupMappingField> [AS <sourceMappingField>])...
[(REPLACE | APPEND) (<inputField> [AS <outputField>])...]
| <other piped command>
```
**lookupIndex**
- Required
- Description: The name of the lookup index (dimension table).

**lookupMappingField**
- Required
- Description: A mapping key in \<lookupIndex\>, analogous to a join key from the right table. You can specify multiple \<lookupMappingField\> values as a comma-delimited list.

**sourceMappingField**
- Optional
- Default: \<lookupMappingField\>
- Description: A mapping key from the source **query**, analogous to a join key from the left side. If you don't specify a \<sourceMappingField\>, it defaults to \<lookupMappingField\>.

**inputField**
- Optional
- Default: All fields of \<lookupIndex\>.
- Description: A field in \<lookupIndex\> whose matched values are applied to the result output. You can specify multiple \<inputField\> values as a comma-delimited list. If you don't specify any \<inputField\>, the matched values of all fields of \<lookupIndex\> are applied to the result output.

**outputField**
- Optional
- Default: \<inputField\>
- Description: A field in the output. You can specify multiple \<outputField\> values. If \<outputField\> names an existing field in the source query, its values are replaced or appended by the matched values from \<inputField\>. If \<outputField\> names a new field, that field is added to the results.

**REPLACE | APPEND**
- Optional
- Default: REPLACE
- Description: If you specify REPLACE, matched values from the \<lookupIndex\> field overwrite the values in the result. If you specify APPEND, matched values from the \<lookupIndex\> field fill in only the missing values in the result.
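
The difference between REPLACE and APPEND can be illustrated for a single matched row. The following is a minimal sketch in plain Scala, not the code this commit adds: `LookupStrategySketch`, `Strategy`, and `resolve` are hypothetical names, `None` stands for a missing (null) value, and the sketch deliberately says nothing about rows that have no match in the lookup index.

```scala
// Hypothetical model of the REPLACE | APPEND rule for ONE matched row.
// This is an illustration only; it is not the translation implemented in the commit.
object LookupStrategySketch {
  sealed trait Strategy
  case object Replace extends Strategy
  case object Append extends Strategy

  // current: the value already present in the source row (None = missing).
  // matched: the value found in the lookup index for the matching key.
  def resolve[A](strategy: Strategy, current: Option[A], matched: A): Option[A] =
    strategy match {
      case Replace => Some(matched)                // always take the lookup value
      case Append  => current.orElse(Some(matched)) // take it only if the source value is missing
    }

  def main(args: Array[String]): Unit = {
    println(resolve(Replace, Some("Doctor"), "Scientist")) // Some(Scientist)
    println(resolve(Append, Some("Doctor"), "Scientist"))  // Some(Doctor)
    println(resolve(Append, None, "Doctor"))               // Some(Doctor)
  }
}
```

In this model, APPEND never changes a value that is already present, which is why it suits filling gaps, while REPLACE treats the lookup index as the authoritative source.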

### Usage
> LOOKUP \<lookupIndex\> id AS cid REPLACE mail AS email<br>
> LOOKUP \<lookupIndex\> name REPLACE mail AS email<br>
> LOOKUP \<lookupIndex\> id AS cid, name APPEND address, mail AS email<br>
> LOOKUP \<lookupIndex\> id<br>
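
To make the defaulting of the optional `AS` clause concrete, here is a small sketch that reads a comma-delimited list such as `id AS cid, name` into explicit pairs. It is an assumption-laden illustration in plain Scala: `LookupMappingSketch`, `FieldPair`, and `parsePairs` are hypothetical names and do not correspond to the grammar-based parser used by the project.

```scala
// Hypothetical helper for illustration; the real command is parsed by the PPL grammar.
object LookupMappingSketch {
  // One "<field> [AS <alias>]" item. For mapping keys, field is the
  // lookupMappingField and alias the sourceMappingField; for REPLACE/APPEND
  // lists, field is the inputField and alias the outputField.
  final case class FieldPair(field: String, alias: String)

  // Case-insensitive " AS " separator surrounded by whitespace.
  private val AsKeyword = "(?i)\\s+AS\\s+"

  def parsePairs(list: String): List[FieldPair] =
    list.split(",").toList.map(_.trim).filter(_.nonEmpty).map { item =>
      item.split(AsKeyword) match {
        case Array(field, alias) => FieldPair(field.trim, alias.trim)
        case Array(field)        => FieldPair(field, field) // no AS: alias defaults to the field
        case _ =>
          throw new IllegalArgumentException(s"Malformed item: $item")
      }
    }

  def main(args: Array[String]): Unit = {
    // Mapping keys from the third usage line: id AS cid, name
    println(parsePairs("id AS cid, name"))
    // APPEND list from the same line: address, mail AS email
    println(parsePairs("address, mail AS email"))
  }
}
```

Read this way, `name` with no alias means the same field name is used on both sides, matching the documented defaults for \<sourceMappingField\> and \<outputField\>.
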
### Example
```sql
SEARCH source=<sourceIndex>
| WHERE orderType = 'Cancelled'
| LOOKUP account_list mkt_id AS mkt_code REPLACE amount, account_name AS name
| STATS count(mkt_code), avg(amount) BY name
```
```sql
SEARCH source=<sourceIndex>
| DEDUP market_id
| EVAL category=replace(category, "-", ".")
| EVAL category=ltrim(category, "dvp.")
| LOOKUP bounce_category category AS category APPEND classification
```
```sql
SEARCH source=<sourceIndex>
| LOOKUP bounce_category category
```
@@ -277,6 +277,54 @@ trait FlintSparkSuite extends QueryTest with FlintSuite with OpenSearchSuite wit
| """.stripMargin)
}

protected def createPeopleTable(testTable: String): Unit = {
  sql(s"""
    | CREATE TABLE $testTable
    | (
    |   id INT,
    |   name STRING,
    |   occupation STRING,
    |   country STRING,
    |   salary INT
    | )
    | USING $tableType $tableOptions
    |""".stripMargin)

  // Insert data into the new table
  sql(s"""
    | INSERT INTO $testTable
    | VALUES (1000, 'Jake', 'Engineer', 'England', 100000),
    |        (1001, 'Hello', 'Artist', 'USA', 70000),
    |        (1002, 'John', 'Doctor', 'Canada', 120000),
    |        (1003, 'David', 'Doctor', null, 120000),
    |        (1004, 'David', null, 'Canada', 0),
    |        (1005, 'Jane', 'Scientist', 'Canada', 90000)
    | """.stripMargin)
}

protected def createWorkInformationTable(testTable: String): Unit = {
  sql(s"""
    | CREATE TABLE $testTable
    | (
    |   uid INT,
    |   name STRING,
    |   department STRING,
    |   occupation STRING
    | )
    | USING $tableType $tableOptions
    |""".stripMargin)

  // Insert data into the new table
  sql(s"""
    | INSERT INTO $testTable
    | VALUES (1000, 'Jake', 'IT', 'Engineer'),
    |        (1002, 'John', 'DATA', 'Scientist'),
    |        (1003, 'David', 'HR', 'Doctor'),
    |        (1005, 'Jane', 'DATA', 'Engineer'),
    |        (1006, 'Tom', 'SALES', 'Artist')
    | """.stripMargin)
}

protected def createOccupationTopRareTable(testTable: String): Unit = {
sql(s"""
| CREATE TABLE $testTable